
Chapter 28. Parallel Processing

The FLUENT solver allows for parallel processing and provides tools for checking and modifying the parallel configuration. You can use a dedicated parallel machine (for example, a multiprocessor workstation), or you can run your solver on a network of workstations. The following sections describe the parallel-processing features of FLUENT.

• Section 28.1: Introduction to Parallel Processing

• Section 28.2: Starting the Parallel Version of the Solver

• Section 28.3: Using a Parallel Network of Workstations

• Section 28.4: Partitioning the Grid

• Section 28.5: Checking and Improving Parallel Performance

28.1 Introduction to Parallel Processing

The parallel version of FLUENT is a different version of the solver that simultaneously computes the solution using multiple compute nodes (processes). To use it, you will partition your grid into multiple subdomains such that the number of partitions is an integral multiple of the number of compute nodes available to you (e.g., 8 partitions for 1, 2, 4, or 8 compute nodes). Each partition (or agglomerated group of partitions) will then “reside” on a different compute node. These processes can be compute nodes on a massively parallel computer, processes on a multiple-CPU workstation, or processes on a network of heterogeneous workstations running UNIX, or on a network of workstations running Windows. In general, as the number of compute nodes increases, turnaround time for the solution will decrease. However, parallel efficiency decreases as the ratio of communication to computation increases, so you should be careful to choose a large enough problem for the parallel machine.
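The partition-count rule above can be checked with a few lines of code. The following is a minimal sketch for illustration only; the function name valid_node_counts is hypothetical and is not part of FLUENT.

```python
def valid_node_counts(num_partitions):
    """Return every compute-node count for which num_partitions is an
    integral multiple of the node count."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be a positive integer")
    return [n for n in range(1, num_partitions + 1)
            if num_partitions % n == 0]


if __name__ == "__main__":
    # 8 partitions can be distributed over 1, 2, 4, or 8 compute nodes.
    print(valid_node_counts(8))
```

For 8 partitions the sketch lists 1, 2, 4, and 8 compute nodes, which matches the example in the text.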

© Fluent Inc. November 28, 2001


The recommended procedure for using parallel FLUENT is as follows:

1. Start up the parallel solver and spawn additional compute nodes (if necessary). See Sections 28.2 and 28.3 for details.

2. Read your case file and have FLUENT partition the grid automatically upon loading it. It is best to partition after the problem is set up, since partitioning has some model dependencies (e.g., adaption on non-conformal interfaces, sliding-mesh and shell-conduction encapsulation). If your case file contains sliding meshes, or non-conformal interfaces on which you plan to perform adaption during the calculation, you will have to partition it in the serial solver.

Note that there are other approaches for partitioning, including manual partitioning in either the serial or the parallel solver. See Section 28.4 for details.

3. Review the partitions and perform partitioning again, if necessary. See Section 28.4.5 for details on checking your partitions.

4. Calculate a solution. See Section 28.5 for information on checking and improving the parallel performance.

28.2 Starting the Parallel Version of the Solver

The way you start the parallel version of FLUENT depends on whether you are using a dedicated parallel machine or a workstation cluster.

28.2.1 Starting the Parallel Solver on a UNIX System

You can run FLUENT on a UNIX dedicated parallel machine or a network of UNIX workstations. The procedures for starting these versions are described in this section.

Running on a Multiprocessor UNIX Machine

To run FLUENT on a dedicated parallel machine (i.e., a multiprocessor workstation or a massively parallel machine), type the usual startup command without a version (i.e., fluent), and then use the Select Solver panel (Figure 28.2.1) to specify the parallel architecture and version information.

File −→Run...

Figure 28.2.1: The Select Solver Panel

1. Under Versions, specify the 3D or 2D single- or double-precision version by turning the 3D and Double Precision options on or off, and turn on the Parallel option.

2. Under Options, select the message-passing library in the Communicator drop-down list. The Default library is recommended, because it selects the library that should provide the best overall parallel performance for your dedicated parallel machine.



If you prefer to select a specific library, you can choose either Vendor MPI or Shared Memory MPI (MPICH). Vendor MPI selects the message-passing library optimized by your hardware vendor. If the parallel toolkit supplied by your hardware vendor is installed on your machine, FLUENT will detect it automatically when the Default option is selected. Shared Memory MPI (MPICH) selects the MPICH message-passing library, a public-domain version of MPI.

3. Set the number of CPUs in the Processes field.

4. Click on the Run button to start the parallel version. No additional setup is required once the solver starts.

If you prefer to start the parallel version from the command line, you can type

fluent version -tn [-pcomm] [-loadhost] [-pathpath]

where version is 2d, 3d, 2ddp, or 3ddp, and n is replaced by the number of CPUs to be used. The remaining arguments are optional, as indicated by the square brackets around them. (If you enter one or more of these optional arguments, do not include the square brackets.) comm is replaced by the name of the parallel communication library, host is replaced by the hostname of the machine on which to launch the compute nodes (by default, it is set to the machine you’re on when entering this command line), and path is replaced by the root path to the Fluent.Inc installation directory.

In general, you will need to specify -pcomm only if you want to override the default communication library (which should provide the best overall parallel performance).

The available communicators for dedicated parallel machines and the associated communication libraries for them are listed below:

vmpi    vendor MPI
smpi    shared memory MPI (MPICH)
net     socket

See step 2, above, for a description of these libraries.
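The command-line form described above can be assembled programmatically, which helps avoid typing the square brackets by mistake. The following is a minimal sketch, not part of FLUENT: the function name build_fluent_command and its parameter names are hypothetical, and only the flags and communicator names listed in this section are used.

```python
VERSIONS = ("2d", "3d", "2ddp", "3ddp")
COMMUNICATORS = ("vmpi", "smpi", "net")  # names from the list above


def build_fluent_command(version, n, comm=None, host=None, path=None):
    """Assemble 'fluent version -tn [-pcomm] [-loadhost] [-pathpath]'
    as an argument list, omitting the square brackets."""
    if version not in VERSIONS:
        raise ValueError("version must be one of %s" % (VERSIONS,))
    if n < 1:
        # Assumption: a dedicated parallel machine uses at least one CPU.
        raise ValueError("n must be at least 1")
    argv = ["fluent", version, "-t%d" % n]
    if comm is not None:
        if comm not in COMMUNICATORS:
            raise ValueError("comm must be one of %s" % (COMMUNICATORS,))
        argv.append("-p" + comm)
    if host is not None:
        argv.append("-load" + host)
    if path is not None:
        argv.append("-path" + path)
    return argv


if __name__ == "__main__":
    # Default communicator (recommended): no -p argument is added.
    print(" ".join(build_fluent_command("3d", 4)))
    # Explicitly request the shared memory MPI library.
    print(" ".join(build_fluent_command("2ddp", 2, comm="smpi")))
```

Leaving comm unset corresponds to the recommended behavior of letting FLUENT choose the default communication library.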



Running on a UNIX Workstation Cluster

To run FLUENT on a network of UNIX workstations, type the usual startup command without a version (i.e., fluent), and then use the Select Solver panel (Figure 28.2.1) to specify the parallel architecture and version information.

File −→Run...

1. Under Versions, specify the 3D or 2D single- or double-precision version by turning the 3D and Double Precision options on or off, and turn on the Parallel option.

2. Under Options, select the Socket message-passing library in the Communicator drop-down list.

When you start the parallel network version, you must select Socket or Network MPI (MPICH) in the Communicator drop-down list, unless the vendor MPI library (described earlier in this section) supports clustering. If you keep the Default option, one of the MPI parallel versions will start instead, and you will be unable to spawn additional compute nodes.

3. Set the number of initial compute node processes to spawn on the host machine in the Processes field. You can start with 1 or 0 nodes and spawn the rest later on, as described in Section 28.3.1.

4. (optional) Specify the name of a file containing a list of machines, one per line, in the Hosts File field. If the number of Processes is set to 0, FLUENT will spawn a compute node on each machine listed in the file.

5. Click on the Run button to start the parallel network version.

If you prefer to start the parallel network version from the command line, you can type

fluent version -t1 -pnet

(to use the socket communicator) or



fluent version -t1 -pnmpi

(to use the network MPI communicator) to start the solver with 1 compute node on the host workstation. You can then spawn additional processes on remote workstations using the Network Configuration panel, as described in Section 28.3.1.

You can type

fluent version -t0 -pnet [-cnf=hostsfile]

(to use the socket communicator) or

fluent version -t0 -pnmpi [-cnf=hostsfile]

(to use the network MPI communicator) to start a host process that controls compute nodes situated on remote machines. If the optional -cnf=hostsfile is specified, a compute node will be spawned on each machine listed in the file hostsfile. (If you enter this optional argument, do not include the square brackets.) Otherwise, you can spawn the processes as described in Section 28.3.1.

28.2.2 Starting the Parallel Solver on a Windows System

You can run FLUENT on a Windows dedicated parallel machine or a network of Windows machines. The procedures for starting these versions are described in this section.

Running on a Multiprocessor Windows Machine

On a Windows system, you can start the dedicated parallel version of FLUENT from the MS-DOS Command Prompt window. To start the parallel version on x processors, type

fluent version -tx

at the prompt, replacing version with the solver version (2d, 3d, 2ddp, or 3ddp) and x with the number of processors (e.g., fluent 3d -t3 to run the 3D version on 3 processors). (See Section 1.5.3 for information about modifying your user environment if the fluent command is not recognized.)



Running on a Windows Cluster

There are two ways to run FLUENT in parallel on a network of Windows machines: using the RSHD communicator software that is included with the FLUENT distribution, or using a vendor-supplied message-passing interface (VMPI). See the installation instructions for Windows parallel for details about obtaining and installing one of these programs. The startup instructions below assume that you have properly set up the necessary software, based on the appropriate installation instructions.

Starting the RSHD-Based Parallel Version of FLUENT

If you are using the RSHD software for network communication, type the following in an MS-DOS Command Prompt window:

fluent version -pnet [-pathsharename] [-cnf=hostfile] -tnprocs

where

• version must be replaced by the version of FLUENT you want to run (2d, 3d, 2ddp, or 3ddp).

• -pathsharename specifies the shared network name for the Fluent.Inc directory in UNC form. This input is required only if you are not starting the job on the same computer on which FLUENT has been installed. (If you do include this input, do not include the square brackets; see the example below.) If you start the job on the same computer, you do not need to enter this information.

For example, if FLUENT has been installed on computer1, then you should replace sharename by the UNC name for the shared directory, \\computer1\fluent.inc.

• -cnf=hostfile (optional) specifies the hostfile, which contains a list of the computers on which you want to run the parallel job. If the hostfile is not located in the directory where you are typing the startup command, you will need to supply the full pathname to the file. (If you include the -cnf option, do not include the square brackets; see the example below.)



You can use a plain text editor like Notepad to create the hostfile. The only restriction on the filename is that there should be no spaces in it. For example, hosts.txt is an acceptable hostfile name, but my hosts.txt is not.

Your hostfile (e.g., hosts.txt) might contain the following entries:

computer1
computer2

The first computer in the list must be the name of the local computer you are working on.

If a computer in the network is a multiprocessor, you can list it more than once. For example, if computer1 has 2 CPUs, then, to take advantage of both CPUs, the hosts.txt file should list computer1 twice:

computer1
computer1
computer2

If you do not include the -cnf option, FLUENT will start nprocs (see below) processes on the computer where you type the startup command. You can then use the Network Configuration panel in FLUENT to interactively spawn additional nodes on the cluster. See Section 28.3 for details.

• -tnprocs specifies the number of processes to use. If the -cnf option is present, the hostfile argument is used to determine which computers to use for the parallel job. For example, if there are 10 computers listed in the hostfile and you want to run a job with 5 processes, set nprocs to 5 (i.e., -t5) and FLUENT will use the first 5 machines listed in the hostfile.

You can use the Network Configuration panel to kill processes or spawn additional processes after startup. See Section 28.3 for details.



As an example, the full command line to start a 3D RSHD-based parallel job on the first 3 computers listed in a hostfile called hosts.txt is as follows:

fluent 3d -pnet -cnf=hosts.txt -path\\computer1\fluent.inc -t3
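The hostfile rules described above (local computer first, one entry per CPU, no spaces in the filename, and the first nprocs entries being used) can be captured in a small helper. The following is a sketch under those stated rules only; the function names hostfile_lines, write_hostfile, and hosts_used are hypothetical and are not FLUENT utilities.

```python
def hostfile_lines(cpus_per_host, local_host):
    """Return hostfile entries, one per CPU, with the local computer first.

    cpus_per_host maps a computer name to the number of CPUs to use on it.
    """
    if local_host not in cpus_per_host:
        raise ValueError("the local computer must appear in the host list")
    ordered = [local_host] + [h for h in cpus_per_host if h != local_host]
    lines = []
    for host in ordered:
        # A multiprocessor computer is listed once per CPU.
        lines.extend([host] * cpus_per_host[host])
    return lines


def write_hostfile(filename, cpus_per_host, local_host):
    """Write the entries to filename, rejecting names that contain spaces."""
    if " " in filename:
        raise ValueError("the hostfile name must not contain spaces")
    with open(filename, "w") as f:
        f.write("\n".join(hostfile_lines(cpus_per_host, local_host)) + "\n")


def hosts_used(lines, nprocs):
    """Return the first nprocs entries, as described for -tnprocs.

    What happens when nprocs exceeds the number of entries is not
    specified in the text, so no check is made here.
    """
    return lines[:nprocs]


if __name__ == "__main__":
    entries = hostfile_lines({"computer1": 2, "computer2": 1}, "computer1")
    print(entries)
    print(hosts_used(entries, 2))
```

With computer1 given 2 CPUs, the generated entries reproduce the three-line hosts.txt example shown earlier.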

Starting the Vendor-MPI-Based Parallel Version of FLUENT

If you are using the vendor-supplied MPI software for network communication, type the following in an MS-DOS Command Prompt window:

fluent version -pvmpi [-pathsharename] -cnf=hostfile -tnprocs

where the options have the same meanings as for the RSHD-based startup described above, with the following differences:

• The hostfile specification is required. You cannot spawn nodes on the cluster using the Network Configuration panel when this MPI software is used. (Recall that, as for the RSHD-based version, the first computer listed in the hostfile must be the name of the local computer you are working on.)

• You cannot use the Network Configuration panel to kill processes or spawn additional processes after startup when this MPI software is used.

As an example, the full command line to start a 3D vendor-MPI-based parallel job on the first 3 computers listed in a hostfile called hosts.txt is as follows:

fluent 3d -pvmpi -cnf=hosts.txt -path\\computer1\fluent.inc -t3



28.3 Using a Parallel Network of Workstations

You can create a virtual parallel machine by spawning (and killing) compute node processes on workstations connected by a network. Multiple compute node processes are allowed to exist on the same workstation, even if the workstation contains only a single CPU.

28.3.1 Configuring the Network

If you want to spawn compute nodes on several different machines, or if you want to make any changes to the current network configuration (e.g., if you accidentally spawned too many compute nodes on the host machine when you started the solver), you can use the Network Configuration panel (Figure 28.3.1).

Parallel −→ Network −→Configure...

Figure 28.3.1: The Network Configuration Panel



Structure of the Network

Compute nodes are labeled sequentially starting at 0. In addition to the compute node processes, there is one host process. The host process is automatically started when FLUENT starts, and it is killed when FLUENT exits. It cannot be killed while running. Compute nodes, however, can be killed at any time, with the exception that compute node 0 can only be killed if it is the last remaining compute node process. The host process always spawns compute node 0. Compute node 0 spawns all other compute nodes.
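The kill rules above can be made concrete with a toy model. The sketch below is purely illustrative and does not interact with FLUENT; the class ParallelNetwork is hypothetical, and the way it assigns IDs after a node has been killed (one past the highest live ID) is an assumption, since the text states only that labels start at 0 and are sequential.

```python
class ParallelNetwork:
    """Toy model of one host process and its compute nodes."""

    def __init__(self):
        self.nodes = []  # IDs of live compute nodes

    def spawn(self):
        """Add a compute node and return its ID (0 for the first node)."""
        # Assumption: a new node gets one past the highest live ID.
        node_id = max(self.nodes) + 1 if self.nodes else 0
        self.nodes.append(node_id)
        return node_id

    def kill(self, node_id):
        """Remove a compute node, enforcing the rules stated in the text."""
        if node_id == "host":
            raise ValueError("the host process cannot be killed while running")
        if node_id not in self.nodes:
            raise KeyError(node_id)
        if node_id == 0 and len(self.nodes) > 1:
            raise ValueError(
                "compute node 0 can only be killed if it is the last "
                "remaining compute node process")
        self.nodes.remove(node_id)


if __name__ == "__main__":
    net = ParallelNetwork()
    net.spawn()
    net.spawn()
    net.kill(1)   # allowed at any time
    net.kill(0)   # allowed only now that node 0 is the last node
    print(net.nodes)
```

The model shows why compute node 0 must be removed last: in the described structure it is the node that spawns all other compute nodes.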

Steps for Spawning Compute Nodes

The basic steps for spawning compute nodes are as follows:

1. Choose the host machine(s) on which to spawn compute nodes in the Available Hosts list. If the desired machine is not listed, you can use the Host Entry fields to manually add a host (as described below), or you can copy the desired host from the host database (as described in Section 28.3.2).

2. Set the number of compute node processes to spawn on each selected host machine in the Spawn Count field.

3. Click on the Spawn button and the new node(s) will be spawned and added to the Spawned Compute Nodes list.

Additional functions related to network configuration are described below.

Adding Hosts Manually

To add a host to the Available Hosts list in the Network Configuration panel manually, you can enter the internet name of the remote machine in the Hostname field under Host Entry, enter your login name on that machine in the Username field (unless your accounts all have the same login name, in which case you need not specify a username), and then click on the Add button. The specified host will be added to the Available Hosts list.



Deleting Hosts

To delete a host from the Available Hosts list in the Network Configuration panel, select the host and click on the Delete button. The host name will be removed from the Available Hosts list (but the hosts database (see Section 28.3.2) will not be affected).

Killing Compute Nodes

If you spawn an undesired compute node, you can easily remove it by selecting it in the Spawned Compute Nodes list and clicking on the Kill button.

Remember that compute node 0 can only be killed if it is the last remaining compute node process.

Saving a Hosts File

If you have compiled a group of Available Hosts that you may want to use again in another session, you can save a hosts file containing all entries in the Available Hosts list. Click on the Save... button and, in the resulting Select File dialog box, enter the name of the file and save it. In a future session, you can load the contents of this file into the hosts database (see Section 28.3.2) and then copy the hosts over to the Network Configuration panel in order to reproduce the current Available Hosts list.

Common Problems Encountered During Node Spawning

The spawning process will try to establish a connection with a new compute node, but if after 50 seconds it receives no response from the new compute node, it will assume the spawn was unsuccessful. The spawn will be unsuccessful, for example, if the remote machine is unable to find the FLUENT executable. To manually test if the spawning machine can start a new compute node, you can type

rsh [-l username] hostname fluent -t0 -v

from a shell prompt on the spawning machine. hostname should be replaced with the internet name of the machine on which you want to spawn a compute node, and username should be replaced with your login name on the remote machine specified by hostname.

If all your accounts have the same login name, you do not need to specify a username. (The square brackets around -l username indicate that it is not always required; if you do enter a login name, do not include the square brackets.) Note that on some systems, the remote shell command is remsh instead of rsh.

The spawn test could fail for several reasons:

Login incorrect. The machine spawning a new compute node must be able to rsh to the machine where the new process will reside, or the spawn will fail. There are several ways to enable this capability. Consult your systems administrator for assistance.

fluent: Command not found. The rsh to the remote machine succeeded, but the path to the FLUENT shell script could not be found on that machine. If you are using csh, then the path to the FLUENT shell script should be added to the path variable in your .cshrc file. If that also fails, you can use the parallel/network/path text command to set the path to the Fluent.Inc installation directory directly before spawning the compute node.

parallel −→ network −→path
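The manual spawn test and the two failure messages described above can be scripted. The following is a minimal sketch, assuming a remote shell client (rsh or remsh) is installed on the spawning machine; the function names and the reuse of the 50-second limit as a timeout for the manual test are assumptions made for illustration.

```python
import subprocess


def spawn_test_command(hostname, username=None, remote_shell="rsh"):
    """Build 'rsh [-l username] hostname fluent -t0 -v' as an argument list."""
    argv = [remote_shell]
    if username:
        argv += ["-l", username]
    argv += [hostname, "fluent", "-t0", "-v"]
    return argv


def diagnose(output):
    """Map the two failure messages from the text to a short hint."""
    if "Login incorrect" in output:
        return "remote shell access to the target machine is not enabled"
    if "Command not found" in output:
        return "the FLUENT shell script is not on the remote path"
    return None


def run_spawn_test(hostname, username=None, remote_shell="rsh", timeout=50):
    """Run the test and return (output, hint); hint is None if no known
    failure message was seen. The timeout value is an assumption."""
    try:
        result = subprocess.run(
            spawn_test_command(hostname, username, remote_shell),
            capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "", "no response within %d seconds" % timeout
    except FileNotFoundError:
        return "", "%s is not installed on this machine" % remote_shell
    output = result.stdout + result.stderr
    return output, diagnose(output)
```

On systems where the remote shell command is remsh, pass remote_shell="remsh". The exit status is not inspected because the text describes the failures only in terms of the printed messages.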

28.3.2 The Hosts Database

When you are creating a parallel network of workstations, it is convenient to start with a list of machines that are part of your local network (a “hosts file”). You can load a file containing these names into the hosts database and then select the hosts that are available for creating a parallel configuration (or network) on a cluster of workstations using the Hosts Database panel (Figure 28.3.2).

Parallel −→ Network −→Database...

(You can also open this panel by clicking on the Database... button in the Network Configuration panel.)



Figure 28.3.2: The Hosts Database Panel



If the hosts file fluent.hosts or .fluent.hosts exists in your home directory, its contents are automatically added to the hosts database at startup. Otherwise, the hosts database will be empty until you read in a hosts file.
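To check in advance which file would be picked up at startup, the lookup can be imitated outside FLUENT. The sketch below is an illustration only: the function names are hypothetical, the order in which the two file names are tried is an assumption (the text does not say which takes precedence), and the one-name-per-line format is taken from the description of hosts files earlier in this chapter.

```python
from pathlib import Path


def find_default_hosts_file(home=None):
    """Return the path of fluent.hosts or .fluent.hosts in the home
    directory, or None if neither exists."""
    home = Path(home) if home is not None else Path.home()
    # Assumption: fluent.hosts is tried before .fluent.hosts.
    for name in ("fluent.hosts", ".fluent.hosts"):
        candidate = home / name
        if candidate.is_file():
            return candidate
    return None


def read_hosts(path):
    """Read machine names, one per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    found = find_default_hosts_file()
    print(read_hosts(found) if found else "hosts database would start empty")
```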

Reading Hosts Files

If you have a hosts file containing a list of machines on your local network, you can load this file into the Hosts Database panel by clicking on the Load... button and specifying the file name in the resulting Select File dialog box. Once the contents of the file have been read, the host names will appear in the Hosts list. (FLUENT will automatically add the IP (Internet Protocol) address for each recognized machine. If a machine is not currently on the local network, it will be labeled unknown.)
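The address labeling described above can be previewed before loading a hosts file. The following is a rough sketch that uses the standard name resolver as a stand-in; how FLUENT itself performs the lookup is not described here, and the function name label_hosts is hypothetical.

```python
import socket


def label_hosts(hostnames, resolve=socket.gethostbyname):
    """Map each host name to its IP address, or to 'unknown' when the
    name cannot be resolved."""
    table = {}
    for name in hostnames:
        try:
            table[name] = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            table[name] = "unknown"
    return table


if __name__ == "__main__":
    for host, address in label_hosts(["localhost"]).items():
        print(host, address)
```

The resolve parameter exists only so that the lookup can be replaced in tests or on machines without network access.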

Copying Hosts to the Network Configuration Panel

If you want to copy one or more of the Hosts in the Hosts Database panel to the Available Hosts list in the Network Configuration panel, select the desired name(s) in the Hosts list and click on the Copy button. The selected hosts will be added to the list of Available Hosts on which you can spawn nodes.

28.3.3 Checking Network Connectivity

For any compute node, you can print network connectivity information that includes the hostname, architecture, process ID, and ID of the selected compute node and all machines connected to it. The ID of the selected compute node is marked with an asterisk.

The ID for the FLUENT host process is always host. The compute nodes are numbered sequentially starting from node-0. All compute nodes are completely connected. In addition, compute node 0 is connected to the host process.

To obtain connectivity information for a compute node, you can use the Parallel Connectivity panel (Figure 28.3.3).

Parallel −→Show Connectivity...



Figure 28.3.3: The Parallel Connectivity Panel

Indicate the compute node ID for which connectivity information is desired in the Compute Node field, and then click on the Print button. Sample output for compute node 0 is shown below:

----------------------------------------------------------------
ID      Hostname  O.S.   PID    Mach ID  HW ID  Name
----------------------------------------------------------------
node-2  fili      irix   16729  2        11     Fluent Node
node-1  bofur     irix   16182  1        10     Fluent Node
host    balin     sunos  5845   0        7      Fluent Host
node-0* balin     sunos  5864   0        -1     Fluent Node

O.S. is the architecture, PID is the process ID number, Mach ID is the compute node ID, and HW ID is an identifier specific to the communicator used.
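If you capture this output from the console, it can be turned into records for further checking. The following is a minimal sketch that assumes the whitespace-separated column layout of the sample above; actual output may differ, and the function name parse_connectivity is hypothetical.

```python
SAMPLE = """\
----------------------------------------------------------------
ID      Hostname  O.S.   PID    Mach ID  HW ID  Name
----------------------------------------------------------------
node-2  fili      irix   16729  2        11     Fluent Node
node-1  bofur     irix   16182  1        10     Fluent Node
host    balin     sunos  5845   0        7      Fluent Host
node-0* balin     sunos  5864   0        -1     Fluent Node
"""


def parse_connectivity(text):
    """Parse connectivity output into a list of dictionaries."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 6)
        # Separator and header lines have no numeric PID column.
        if len(fields) != 7 or not fields[3].isdigit():
            continue
        rows.append({
            "id": fields[0].rstrip("*"),
            "selected": fields[0].endswith("*"),  # asterisk marks selection
            "hostname": fields[1],
            "os": fields[2],
            "pid": int(fields[3]),
            "mach_id": int(fields[4]),
            "hw_id": int(fields[5]),
            "name": fields[6].strip(),
        })
    return rows


if __name__ == "__main__":
    for row in parse_connectivity(SAMPLE):
        print(row)
```

In the sample, the only record flagged as selected is node-0, which corresponds to the asterisk in the ID column.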

You can also check connectivity of a compute node in the Network Configuration panel by selecting it in the Spawned Compute Nodes list and clicking on the Connectivity button. If you click on the Connectivity button without selecting any of the Spawned Compute Nodes, the Parallel Connectivity panel will open, and you can specify the node there, as described above. If you select more than one of the Spawned Compute Nodes, clicking on the Connectivity button will print connectivity information for each selected node.


28.4 Partitioning the Grid


Information about grid partitioning is provided in the following sections:

• Section 28.4.1: Overview of Grid Partitioning

• Section 28.4.2: Partitioning the Grid Automatically

• Section 28.4.3: Partitioning the Grid Manually

• Section 28.4.4: Grid Partitioning Methods

• Section 28.4.5: Checking the Partitions

• Section 28.4.6: Load Distribution

28.4.1 Overview of Grid Partitioning

When you use the parallel solver in FLUENT, you need to partition or subdivide the grid into groups of cells that can be solved on separate processors (see Figure 28.4.1). You can either use the automatic partitioning algorithms when reading an unpartitioned grid into the parallel solver (recommended approach, described in Section 28.4.2), or perform the partitioning yourself in the serial solver or after reading a mesh into the parallel solver (as described in Section 28.4.3). In either case, the available partitioning methods are those described in Section 28.4.4. You can partition the grid before or after you set up the problem (by defining models, boundary conditions, etc.), although it is better to partition after the setup, due to some model dependencies (e.g., adaption on non-conformal interfaces, sliding-mesh and shell-conduction encapsulation).

If your case file contains sliding meshes, or non-conformal interfaces on which you plan to perform adaption during the calculation, you will have to partition it in the serial solver. See Sections 28.4.2 and 28.4.3 for more information.

Note that the relative distribution of cells among compute nodes will be maintained during grid adaption, except if non-conformal interfaces are present, so repartitioning after adaption is not required. See Section 28.4.6 for more information.



If you use the serial solver to set up the problem before partitioning, the machine on which you perform this task must have enough memory to read in the grid. If your grid is too large to be read into the serial solver, you can read the unpartitioned grid directly into the parallel solver (using the memory available in all the defined hosts) and have it automatically partitioned. In this case you will set up the problem after an initial partition has been made. You will then be able to manually repartition the case if necessary. See Sections 28.4.2 and 28.4.3 for additional details and limitations, and Section 28.4.5 for details about checking the partitions.

Figure 28.4.1: Partitioning the Grid (figure labels: Domain, Before Partitioning; Partition 0, Partition 1, Interface Boundary, After Partitioning)



28.4.2 Partitioning the Grid Automatically

For automatic grid partitioning, you can select the bisection method and other options for creating the grid partitions before reading a case file into the parallel version of the solver. For some of the methods, you can perform pretesting to ensure that the best possible partition is performed. See Section 28.4.4 for information about the partitioning methods available in FLUENT.

Note that if your case file contains sliding meshes, or non-conformal interfaces on which you plan to perform adaption during the calculation, you will need to partition it in the serial solver, and then read it into the parallel solver, with the Case File option turned on in the Auto Partition Grid panel (the default setting).

The procedure for partitioning automatically in the parallel solver is as follows:

1. (optional) Set the partitioning parameters in the Auto Partition Grid panel (Figure 28.4.2).

Parallel −→Auto Partition...

Figure 28.4.2: The Auto Partition Grid Panel

If you are reading in a mesh file or a case file for which no partition information is available, and you keep the Case File option turned on, FLUENT will partition the grid using the method displayed in the Method drop-down list.



If you want to specify the partitioning method and associated options yourself, the procedure is as follows:

(a) Turn off the Case File option. The other options in the panel will become available.

(b) Select the bisection method in the Method drop-down list. The choices are the techniques described in Section 28.4.4.

(c) You can choose to independently apply partitioning to each cell zone, or you can allow partitions to cross zone boundaries using the Across Zones check button. It is recommended that you not partition cell zones independently (by turning off the Across Zones check button) unless cells in different zones will require significantly different amounts of computation during the solution phase (e.g., if the domain contains both solid and fluid zones).

(d) If you have chosen the Principal Axes or Cartesian Axes method, you can improve the partitioning by enabling the automatic testing of the different bisection directions before the actual partitioning occurs. To use pretesting, turn on the Pre-Test option. Pretesting is described in Section 28.4.4.

(e) Click on OK.

If you have a case file where you have already partitioned the grid, and the number of partitions is an integral multiple of the number of compute nodes, you can keep the default selection of Case File in the Auto Partition Grid panel. This instructs FLUENT to use the partitions in the case file.

2. Read the case file.

File −→ Read −→Case...

Reporting During Auto Partitioning

As the grid is automatically partitioned, some information about the partitioning process will be printed in the text (console) window. If you want additional information, you can print a report from the Partition Grid panel after the partitioning is completed.



Parallel −→Partition...

When you click on the Print Active Partitions or Print Stored Partitions button in the Partition Grid panel, FLUENT will print the partition ID, number of cells, faces, and interfaces, and the ratio of interfaces to faces for each active or stored partition in the console window. In addition, it will print the minimum and maximum cell, face, interface, and face-ratio variations. See Section 28.4.5 for details. You can examine the partitions graphically by following the directions in Section 28.4.5.
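To see how the per-partition quantities in such a report relate to one another, the sketch below computes the interface-to-face ratio for each partition and the minimum and maximum of each quantity. It is an illustration only: the input numbers are invented, the function name partition_report is hypothetical, and the exact definitions FLUENT uses for the reported variations are those given in Section 28.4.5, not the simple minimum and maximum shown here.

```python
def partition_report(partitions):
    """Compute per-partition interface ratios and min/max summaries.

    partitions is a list of dicts with integer 'cells', 'faces', and
    'interfaces' counts; the list index serves as the partition ID.
    """
    rows = []
    for pid, counts in enumerate(partitions):
        row = {"id": pid}
        row.update(counts)
        # Ratio of interfaces to faces for this partition.
        row["interface_ratio"] = counts["interfaces"] / counts["faces"]
        rows.append(row)
    summary = {}
    for key in ("cells", "faces", "interfaces", "interface_ratio"):
        values = [row[key] for row in rows]
        summary[key] = (min(values), max(values))
    return rows, summary


if __name__ == "__main__":
    # Invented counts for two partitions, for illustration only.
    example = [
        {"cells": 100, "faces": 400, "interfaces": 20},
        {"cells": 120, "faces": 500, "interfaces": 50},
    ]
    rows, summary = partition_report(example)
    for row in rows:
        print(row)
    print(summary)
```

A wide spread between the minimum and maximum cell counts suggests an unbalanced load, while large interface ratios suggest more communication between partitions.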

28.4.3 Partitioning the Grid Manually

Automatic partitioning in the parallel solver (described in Section 28.4.2) is the recommended approach to grid partitioning, but it is also possible to partition the grid manually in either the serial solver or the parallel solver. After automatic or manual partitioning, you will be able to inspect the partitions created (see Section 28.4.5) and optionally repartition the grid, if necessary. Again, you can do so within the serial or the parallel solver, using the Partition Grid panel. A partitioned grid may also be used in the serial solver without any loss in performance.

Guidelines for Partitioning the Grid

The following steps are recommended for partitioning a grid manually:

1. Partition the grid using the default bisection method (Principal Axes) and optimization (Smooth).

2. Examine the partition statistics, which are described in Section 28.4.5. Your aim is to achieve small values of Interface ratio variation and Global interface ratio while maintaining a balanced load (Cell variation). If the statistics are not acceptable, try one of the other bisection methods.

3. Once you determine the best bisection method for your problem, you can turn on Pre-Test (see Section 28.4.4) to improve it further, if desired.



4. You can also improve the partitioning using the Merge optimization, if desired.

Instructions for manual partitioning are provided below.

Using the Partition Grid Panel

For grid partitioning, you need to select the bisection method for creating the grid partitions, set the number of partitions, select the zones and/or registers, and choose the optimizations to be used. For some methods, you can also perform pretesting to ensure that the best possible bisection is performed. Once you have set all the parameters in the Partition Grid panel to your satisfaction, click on the Partition button to subdivide the grid into the selected number of partitions using the prescribed method and optimization(s). See above for recommended partitioning strategies.

You can set the relevant inputs in the Partition Grid panel (Figure 28.4.3 in the parallel solver, or Figure 28.4.4 in the serial solver) in the following manner:

Parallel −→Partition...

1. Select the bisection method in the Method drop-down list. The choices are the techniques described in Section 28.4.4.

2. Set the desired number of grid partitions in the Number integer number field. You can use the counter arrows to increase or decrease the value, instead of typing in the box. The number of grid partitions must be an integral multiple of the number of processors available for parallel computing.

3. You can choose to independently apply partitioning to each cell zone, or you can allow partitions to cross zone boundaries using the Across Zones check button. It is recommended that you not partition cell zones independently (by turning off the Across Zones check button) unless cells in different zones will require significantly different amounts of computation during the solution phase (e.g., if the domain contains both solid and fluid zones).



Figure 28.4.3: The Partition Grid Panel in the Parallel Solver

Figure 28.4.4: The Partition Grid Panel in the Serial Solver



4. You can select Encapsulate Grid Interfaces if you would like the cells surrounding all non-conformal grid interfaces in your mesh to reside in a single partition at all times during the calculation. Grid interfaces must be encapsulated when the grid slides or when the mesh is adapted, so, when sliding meshes are present, the Encapsulate Grid Interfaces option will always be on. If your case file contains non-conformal interfaces on which you plan to perform adaption during the calculation, you will have to partition it in the serial solver, with the Encapsulate Grid Interfaces and Encapsulate for Adaption options turned on.

5. If you have enabled the Encapsulate Grid Interfaces option in the serial solver, the Encapsulate for Adaption option will also be available. When you select this option, additional layers of cells are encapsulated such that transfer of cells will be unnecessary during parallel adaption.

6. You can activate and control the desired optimization methods (described in Section 28.4.4) using the items under Optimizations. You can activate the Merge and Smooth schemes by turning on the Do check button next to each one. For each scheme, you can also set the number of Iterations. Each optimization scheme will be applied until appropriate criteria are met, or the maximum number of iterations has been executed. If the Iterations counter is set to 0, the optimization scheme will be applied until completion, without limit on the maximum number of iterations.

7. If you have chosen the Principal Axes or Cartesian Axes method, you can improve the partitioning by enabling the automatic testing of the different bisection directions before the actual partitioning occurs. To use pretesting, turn on the Pre-Test option. Pretesting is described in Section 28.4.4.

8. In the Zones and/or Registers lists, select the zone(s) and/or register(s) that you want to partition. For most cases, you will select all Zones (the default) to partition the entire domain. See below for details.

9. Click on the Partition button to partition the grid.



10. If you decide that the new partitions are better than the previous ones (if the grid was already partitioned), click on the Use Stored Partitions button to make the newly stored cell partitions the active cell partitions. The active cell partition is used for the current calculation, while the stored cell partition (the last partition performed) is used when you save a case file.

Partitioning Within Zones or Registers

The ability to restrict partitioning to cell zones or registers gives you the flexibility to apply different partitioning strategies to subregions of a domain. For example, if your geometry consists of a cylindrical plenum connected to a rectangular duct, you may want to partition the plenum using the Cylindrical Axes method, and the duct using the Cartesian Axes method.

If the plenum and the duct are contained in two different cell zones, you can select one at a time and perform the desired partitioning, as described in Section 28.4.3. If they are not in two different cell zones, you can create a cell register (basically a list of cells) for each region using the functions that are used to mark cells for adaption. These functions allow you to mark cells based on physical location, cell volume, gradient or isovalue of a particular variable, and other parameters. See Chapter 23 for information about marking cells for adaption. Section 23.9 provides information about manipulating different registers to create new ones. Once you have created a register, you can partition within it as described above.

Note that partitioning within zones or registers is not available when the parallel version of FLUENT is used, or when Metis is selected as the partition Method.

Reporting During Partitioning

As the grid is partitioned, information about the partitioning process will be printed in the text (console) window. By default, the solver will print the number of partitions created, the number of bisections performed, the time required for the partitioning, and the minimum and maximum cell, face, interface, and face-ratio variations. (See Section 28.4.5 for details.) If you increase the Verbosity to 2 from the default value of 1, the partition method used, the partition ID, number of cells, faces, and interfaces, and the ratio of interfaces to faces for each partition will also be printed in the console window. If you decrease the Verbosity to 0, only the number of partitions created and the time required for the partitioning will be reported.

You can request a portion of this report to be printed again after the partitioning is completed. When you click on the Print Active Partitions or Print Stored Partitions button in the parallel solver, FLUENT will print the partition ID, number of cells, faces, and interfaces, and the ratio of interfaces to faces for each active or stored partition in the console window. In addition, it will print the minimum and maximum cell, face, interface, and face-ratio variations. In the serial solver, you will obtain the same information about the stored partition when you click on Print Partitions. See Section 28.4.5 for details.

Recall that to make the stored cell partitions the active cell partitions you must click on the Use Stored Partitions button. The active cell partition is used for the current calculation, while the stored cell partition (the last partition performed) is used when you save a case file.

Resetting the Partition Parameters

If you change your mind about your partition parameter settings, you can easily return to the default settings assigned by FLUENT by clicking on the Default button. When you click on the Default button, it will become the Reset button. The Reset button allows you to return to the most recently saved settings (i.e., the values that were set before you clicked on Default). After execution, the Reset button will become the Default button again.

28.4.4 Grid Partitioning Methods

Partitioning the grid for parallel processing has three major goals:

• Create partitions with equal numbers of cells.



• Minimize the number of partition interfaces, i.e., decrease partition boundary surface area.

• Minimize the number of partition neighbors.

Balancing the partitions (equalizing the number of cells) ensures that each processor has an equal load and that the partitions will be ready to communicate at about the same time. Since communication between partitions can be a relatively time-consuming process, minimizing the number of interfaces can reduce the time associated with this data interchange. Minimizing the number of partition neighbors reduces the chances for network and routing contentions. In addition, minimizing partition neighbors is important on machines where the cost of initiating message passing is expensive compared to the cost of sending longer messages. This is especially true for workstations connected in a network.

The partitioning schemes in FLUENT use bisection algorithms to create the partitions, but unlike other schemes which require the number of partitions to be a power of two, these schemes have no limitations on the number of partitions. For each available processor, you will create the same number of partitions (i.e., the total number of partitions will be an integral multiple of the number of processors).

Bisection Methods

The grid is partitioned using a bisection algorithm. The selected algorithm is applied to the parent domain, and then recursively applied to the child subdomains. For example, to divide the grid into four partitions, the solver will bisect the entire (parent) domain into two child domains, and then repeat the bisection for each of the child domains, yielding four partitions in total. To divide the grid into three partitions, the solver will “bisect” the parent domain to create two partitions (one approximately twice as large as the other) and then bisect the larger child domain again to create three partitions in total.
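The recursion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name bisect_recursive, the use of 2D cell centroids, and the rule for sizing the two children are hypothetical and are not taken from the FLUENT implementation. Splitting perpendicular to the direction of longest extent is borrowed from the description of the Cartesian Axes method.

```python
# Illustrative sketch only: recursive coordinate bisection on 2D cell centroids.
# The function name and the child-sizing rule are assumptions, not FLUENT internals.

def bisect_recursive(cells, n_parts):
    """Split a list of (x, y) cell centroids into n_parts groups."""
    if n_parts == 1:
        return [cells]
    # The first child will hold floor(n/2) of the final partitions, the second the rest.
    n_first = n_parts // 2
    n_second = n_parts - n_first
    # Bisect perpendicular to the coordinate direction of longest extent.
    extents = [max(c[a] for c in cells) - min(c[a] for c in cells) for a in (0, 1)]
    axis = 0 if extents[0] >= extents[1] else 1
    ordered = sorted(cells, key=lambda c: c[axis])
    # Size each child in proportion to the number of partitions it must hold,
    # so for three partitions one child is about twice as large as the other.
    split = len(ordered) * n_first // n_parts
    return (bisect_recursive(ordered[:split], n_first)
            + bisect_recursive(ordered[split:], n_second))


# A 30 x 10 block of unit cells, represented by their centroids.
cells = [(i + 0.5, j + 0.5) for i in range(30) for j in range(10)]
print([len(p) for p in bisect_recursive(cells, 3)])
print([len(p) for p in bisect_recursive(cells, 4)])
```

For three partitions, the first bisection produces children holding one third and two thirds of the cells, and only the larger child is bisected again, which mirrors the three-partition example in the text.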

The grid can be partitioned using one of the algorithms listed below. The most efficient choice is problem-dependent, so you can try different methods until you find the one that is best for your problem. See Section 28.4.3 for recommended partitioning strategies.

Cartesian Axes bisects the domain based on the Cartesian coordinates of the cells (see Figure 28.4.5). It bisects the parent domain and all subsequent child subdomains perpendicular to the coordinate direction with the longest extent of the active domain. It is often referred to as coordinate bisection.

Cartesian Strip uses coordinate bisection but restricts all bisections to the Cartesian direction of longest extent of the parent domain (see Figure 28.4.6). You can often minimize the number of partition neighbors using this approach.

Cartesian X-, Y-, Z-Coordinate bisects the domain based on the selected Cartesian coordinate. It bisects the parent domain and all subsequent child subdomains perpendicular to the specified coordinate direction. (See Figure 28.4.6.)

Cartesian R Axes bisects the domain based on the shortest radial distance from the cell centers to that Cartesian axis (x, y, or z) which produces the smallest interface size. This method is available only in 3D.

Cartesian RX-, RY-, RZ-Coordinate bisects the domain based on the shortest radial distance from the cell centers to the selected Cartesian axis (x, y, or z). These methods are available only in 3D.

Cylindrical Axes bisects the domain based on the cylindrical coordinates of the cells. This method is available only in 3D.

Cylindrical R-, Theta-, Z-Coordinate bisects the domain based on the selected cylindrical coordinate. These methods are available only in 3D.

Metis uses the METIS software package for partitioning irregular graphs, developed by Karypis and Kumar at the University of Minnesota and the Army HPC Research Center. It uses a multilevel approach in which the vertices and edges on the fine graph are coalesced to form a coarse graph. The coarse graph is partitioned, and then uncoarsened back to the original graph. During coarsening and uncoarsening, algorithms are applied to permit high-quality partitions. Detailed information about METIS can be found in its manual [110].

Note that when using the socket version (-pnet), the METIS partitioner is not available. In this case, METIS partitioning can be obtained using the partition filter, as described below.

Polar Axes bisects the domain based on the polar coordinates of the cells (see Figure 28.4.9). This method is available only in 2D.

Polar R-Coordinate, Polar Theta-Coordinate bisects the domain based on the selected polar coordinate (see Figure 28.4.9). These methods are available only in 2D.

Principal Axes bisects the domain based on a coordinate frame aligned with the principal axes of the domain (see Figure 28.4.7). This reduces to Cartesian bisection when the principal axes are aligned with the Cartesian axes. The algorithm is also referred to as moment, inertial, or moment-of-inertia partitioning.

This is the default bisection method in FLUENT.

Principal Strip uses moment bisection but restricts all bisections to the principal axis of longest extent of the parent domain (see Figure 28.4.8). You can often minimize the number of partition neighbors using this approach.

Principal X-, Y-, Z-Coordinate bisects the domain based on the selected principal coordinate (see Figure 28.4.8).

Spherical Axes bisects the domain based on the spherical coordinates of the cells. This method is available only in 3D.

Spherical Rho-, Theta-, Phi-Coordinate bisects the domain based on the selected spherical coordinate. These methods are available only in 3D.


[Figures 28.4.5 through 28.4.9: contour plots of Cell Partition, color scale from 0.00e+00 to 3.00e+00.]

Figure 28.4.5: Partitions Created with the Cartesian Axes Method

Figure 28.4.6: Partitions Created with the Cartesian Strip or Cartesian X-Coordinate Method

Figure 28.4.7: Partitions Created with the Principal Axes Method

Figure 28.4.8: Partitions Created with the Principal Strip or Principal X-Coordinate Method

Figure 28.4.9: Partitions Created with the Polar Axes or Polar Theta-Coordinate Method

Optimizations

Additional optimizations can be applied to improve the quality of the grid partitions. The heuristic of bisecting perpendicular to the direction of longest domain extent is not always the best choice for creating the smallest interface boundary. A “pre-testing” operation (see Section 28.4.4) can be applied to automatically choose the best direction before partitioning. In addition, the following iterative optimization schemes exist:

Smooth attempts to minimize the number of partition interfaces by swapping cells between partitions. The scheme traverses the partition boundary and gives cells to the neighboring partition if the interface boundary surface area is decreased. (See Figure 28.4.10.)

Merge attempts to eliminate orphan clusters from each partition. An orphan cluster is a group of cells with the common feature that each cell within the group has at least one face which coincides with an interface boundary. (See Figure 28.4.11.) Orphan clusters can degrade multigrid performance and lead to large communication costs.

Figure 28.4.10: The Smooth Optimization Scheme

Figure 28.4.11: The Merge Optimization Scheme

In general, the Smooth and Merge schemes are relatively inexpensive optimization tools.

Pretesting

If you choose the Principal Axes or Cartesian Axes method, you can improve the bisection by testing different directions before performing the actual bisection. If you choose not to use pretesting (the default), FLUENT will perform the bisection perpendicular to the direction of longest domain extent.

If pretesting is enabled, it will occur automatically when you click on the Partition button in the Partition Grid panel, or when you read in the grid if you are using automatic partitioning. The bisection algorithm will test all coordinate directions and choose the one which yields the fewest partition interfaces for the final bisection.

Note that using pretesting will increase the time required for partitioning. For 2D problems partitioning will take 3 times as long as without pretesting, and for 3D problems it will take 4 times as long.
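To illustrate why the longest-extent heuristic can miss the smallest interface, the following Python sketch compares the two choices on a uniform structured 2D grid. The grid, the domain dimensions, and the function names are hypothetical and serve only as a stand-in for a real mesh; they are not part of FLUENT.

```python
# Illustrative sketch only: a uniform structured 2D grid stands in for a real mesh,
# and all function names are hypothetical.

def interface_faces(n_cells, axis):
    """Faces cut when an nx-by-ny grid is bisected perpendicular to `axis`."""
    nx, ny = n_cells
    return ny if axis == 0 else nx


def longest_extent_axis(lengths):
    """Default heuristic: bisect perpendicular to the longest domain extent."""
    return 0 if lengths[0] >= lengths[1] else 1


def pretest_axis(n_cells):
    """Pretesting: try every coordinate direction and keep the smallest interface."""
    return min((0, 1), key=lambda axis: interface_faces(n_cells, axis))


lengths = (10.0, 4.0)  # the domain is longest in x
n_cells = (5, 40)      # but the cells are much finer in y

default_axis = longest_extent_axis(lengths)
tested_axis = pretest_axis(n_cells)
print("default:", default_axis, interface_faces(n_cells, default_axis))
print("pretest:", tested_axis, interface_faces(n_cells, tested_axis))
```

In this made-up case the default choice cuts 40 faces, while testing both directions finds a cut of only 5 faces. The price is that every candidate direction has to be evaluated before the final bisection is performed.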

Using the Partition Filter

As noted above, you can use the METIS partitioning method through a filter in addition to within the Auto Partition Grid and Partition Grid panels. To perform METIS partitioning on an unpartitioned grid, use the File/Import/Partition/Metis... menu item.

File −→ Import −→ Partition −→ Metis...

FLUENT will use the METIS partitioner to partition the grid, and then read the partitioned grid into the solver. The number of partitions will be equal to the number of processes. You can then proceed with the model definition and solution.

Direct import to the parallel solver through the partition filter requires that the host machine has enough memory to run the filter for the specified grid. If not, you will need to run the filter on a machine that does have enough memory. You can either start the parallel solver on the machine with enough memory and repeat the process described above, or run the filter manually on the new machine and then read the partitioned grid into the parallel solver on the host machine.

To manually partition a grid using the partition filter, enter the following command:

utility partition input-filename partition-count output-filename

where input-filename is the filename for the grid to be partitioned, partition-count is the number of partitions desired, and output-filename is the filename for the partitioned grid. You can then read the partitioned grid into the solver (using the standard File/Read/Case... menu item) and proceed with the model definition and solution.

When the File/Import/Partition/Metis... menu item is used to import an unpartitioned grid into the parallel solver, the METIS partitioner partitions the entire grid. You may also partition each cell zone individually, using the File/Import/Partition/Metis Zone... menu item.

File −→ Import −→ Partition −→ Metis Zone...

This method can be useful for balancing the work load. For example, if a case has a fluid zone and a solid zone, the computation in the fluid zone is more expensive than in the solid zone, so partitioning each zone individually will result in a more balanced work load.

28.4.5 Checking the Partitions

After partitioning a grid, you should check the partition information and examine the partitions graphically.

Interpreting Partition Statistics

You can request a report to be printed after partitioning (either automatic or manual) is completed. In the parallel solver, click on the Print Active Partitions or Print Stored Partitions button in the Partition Grid panel. In the serial solver, click on the Print Partitions button.

FLUENT distinguishes between two cell partition schemes within a parallel problem: the active cell partition and the stored cell partition. Initially, both are set to the cell partition that was established upon reading the case file. If you repartition the grid using the Partition Grid panel, the new partition will be referred to as the stored cell partition. To make it the active cell partition, you need to click on the Use Stored Partitions button in the Partition Grid panel. The active cell partition is used for the current calculation, while the stored cell partition (the last partition performed) is used when you save a case file. This distinction is made mainly to allow you to partition a case on one machine or network of machines and solve it on a different one. Thanks to the two separate partitioning schemes, you could use the parallel solver with a certain number of compute nodes to subdivide a grid into an arbitrary different number of partitions, suitable for a different parallel machine, save the case file, and then load it into the designated machine.

When you click Print Partitions in the serial solver, you will obtain information about the stored partition.

The output generated by the partitioning process includes information about the recursive subdivision and iterative optimization processes. This is followed by information about the final partitioned grid, including the partition ID, number of cells, number of faces, number of interface faces, ratio of interface faces to faces for each partition, number of neighboring partitions, and cell, face, interface, neighbor, mean cell, face ratio, and global face ratio variations. Each variation gives the minimum and maximum values of the respective quantity in the present partitions. For example, in the sample output below, partitions 0 and 3 have the minimum number of interface faces (10), and partitions 1 and 2 have the maximum number of interface faces (19); hence the variation is 10–19.

Your aim is to achieve small values of Interface ratio variation and Global interface ratio while maintaining a balanced load (Cell variation).



>> Partitions:
P   Cells  I-Cells  Cell Ratio  Faces  I-Faces  Face Ratio  Neighbors
0     134       10       0.075    217       10       0.046          1
1     137       19       0.139    222       19       0.086          2
2     134       19       0.142    218       19       0.087          2
3     137       10       0.073    223       10       0.045          1
------
Partition count           = 4
Cell variation            = (134 - 137)
Mean cell variation       = ( -1.1% - 1.1%)
Intercell variation       = (10 - 19)
Intercell ratio variation = ( 7.3% - 14.2%)
Global intercell ratio    = 10.7%
Face variation            = (217 - 223)
Interface variation       = (10 - 19)
Interface ratio variation = ( 4.5% - 8.7%)
Global interface ratio    = 3.4%
Neighbor variation        = (1 - 2)

Computing connected regions; type ^C to interrupt.
Connected region count = 4
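As a cross-check on how to read these numbers, several of the summary lines in the sample output can be recomputed from the per-partition counts with a short Python sketch. The formulas are inferred from the printed values rather than taken from FLUENT. In particular, the global interface ratio below assumes that each interface face is counted once in each of the two partitions that share it.

```python
# Per-partition counts copied from the sample output.
cells = [134, 137, 134, 137]
faces = [217, 222, 218, 223]
interface_faces = [10, 19, 19, 10]

# Cell variation: minimum and maximum cell count over all partitions.
cell_variation = (min(cells), max(cells))

# Mean cell variation: deviation of those extremes from the mean, in percent.
mean_cells = sum(cells) / len(cells)
mean_cell_variation = tuple(100.0 * (c - mean_cells) / mean_cells
                            for c in cell_variation)

# Interface ratio variation: extremes of interface faces / faces, in percent.
ratios = [100.0 * i / f for i, f in zip(interface_faces, faces)]
interface_ratio_variation = (min(ratios), max(ratios))

# Assumption: every interface face appears in the counts of both partitions that
# share it, so the number of unique interface faces is half the summed total.
unique_interfaces = sum(interface_faces) // 2
unique_faces = sum(faces) - unique_interfaces
global_interface_ratio = 100.0 * unique_interfaces / unique_faces

print("Cell variation = (%d - %d)" % cell_variation)
print("Mean cell variation = (%.1f%% - %.1f%%)" % mean_cell_variation)
print("Interface ratio variation = (%.1f%% - %.1f%%)" % interface_ratio_variation)
print("Global interface ratio = %.1f%%" % global_interface_ratio)
```

Under these assumptions the recomputed values agree with the report to the printed precision.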

Note that partition IDs correspond directly to compute node IDs when a case file is read into the parallel solver. When the number of partitions in a case file is larger than the number of compute nodes, but is evenly divisible by the number of compute nodes, then the distribution is such that partitions with IDs 0 to (M − 1) are mapped onto compute node 0, partitions with IDs M to (2M − 1) onto compute node 1, etc., where M is equal to the ratio of the number of partitions to the number of compute nodes.
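The mapping rule above amounts to an integer division. A minimal Python sketch follows; the function name compute_node_for_partition is hypothetical, and the check on divisibility reflects the "evenly divisible" condition stated in the text.

```python
def compute_node_for_partition(partition_id, n_partitions, n_nodes):
    """Return the compute node that a partition is mapped onto.

    Partitions 0..M-1 go to node 0, M..2M-1 to node 1, and so on,
    where M = n_partitions / n_nodes.
    """
    if n_partitions % n_nodes != 0:
        raise ValueError("number of partitions must be a multiple of "
                         "the number of compute nodes")
    m = n_partitions // n_nodes
    return partition_id // m


# Eight partitions read onto four compute nodes (M = 2).
mapping = [compute_node_for_partition(p, 8, 4) for p in range(8)]
print(mapping)
```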

Examining Partitions Graphically

To further aid interpretation of the partition information, you can draw contours of the grid partitions, as illustrated in Figures 28.4.5–28.4.9.

Display −→Contours...

To display the active cell partition or the stored cell partition (which are described above), select Active Cell Partition or Stored Cell Partition in the Cell Info... category of the Contours Of drop-down list, and turn off the display of Node Values. (See Section 25.1.2 for information about displaying contours.)

If you have not already done so in the setup of your problem, you will need to perform a solution initialization in order to use the Contours panel.

28.4.6 Load Distribution

If the speeds of the processors that will be used for a parallel calculation differ significantly, you can specify a load distribution for partitioning, using the load-distribution text command.

parallel −→ partition −→ set −→load-distribution

For example, if you will be solving on three compute nodes, and one machine is twice as fast as the other two, then you may want to assign twice as many cells to the first machine as to the others (i.e., a load vector of (2 1 1)). During subsequent grid partitioning, partition 0 will end up with twice as many cells as partitions 1 and 2.

Note that for this example, you would then need to start up FLUENT such that compute node 0 is the fast machine, since partition 0, with twice as many cells as the others, will be mapped onto compute node 0. Alternatively, in this situation, you could enable the load balancing feature (described in Section 28.5.2) to have FLUENT automatically attempt to discern any difference in load among the compute nodes.
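The effect of a load vector on the target partition sizes can be worked out with a few lines of Python. This is only an arithmetic sketch: the function name target_cell_counts and the way leftover cells are handed out are assumptions, and the actual sizes produced by the partitioner will only approximate these targets.

```python
def target_cell_counts(total_cells, load_vector):
    """Distribute total_cells among partitions in proportion to load_vector."""
    total_load = sum(load_vector)
    counts = [total_cells * w // total_load for w in load_vector]
    # Hand out any cells lost to integer division, starting with partition 0.
    for k in range(total_cells - sum(counts)):
        counts[k % len(counts)] += 1
    return counts


# Load vector (2 1 1): partition 0 is meant for the machine that is twice as fast.
print(target_cell_counts(400, [2, 1, 1]))
print(target_cell_counts(542, [2, 1, 1]))
```

With 400 cells the targets are 200, 100, and 100 cells, so partition 0 carries twice the load of partitions 1 and 2.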

If you adapt a grid that contains non-conformal interfaces, and you want to rebalance the load on the compute nodes, you will have to save your case and data files after adaption, read the case and data files into the serial solver, repartition using the Encapsulate Grid Interfaces and Encapsulate for Adaption options in the Partition Grid panel, and save case and data files again. You will then be able to read the manually repartitioned case and data files into the parallel solver, and continue the solution from where you left it.


28.5 Checking and Improving Parallel Performance


To determine how well the parallel solver is working, you can measure computation and communication times, and the overall parallel efficiency, using the performance meter. You can also control the amount of communication between compute nodes in order to optimize the parallel solver, and take advantage of the automatic load balancing feature of FLUENT.

28.5.1 Checking Parallel Performance

The performance meter allows you to report the wall clock time elapsed during a computation, as well as message-passing statistics. Since the performance meter is always activated, you can access the statistics by printing them after the computation is completed. To view the current statistics, use the Parallel/Timer/Usage menu item.

Parallel −→ Timer −→ Usage

Performance statistics will be printed in the text window (console).

To clear the performance meter so that you can eliminate past statistics from the future report, use the Parallel/Timer/Reset menu item.

Parallel −→ Timer −→Reset

28.5.2 Optimizing the Parallel Solver

Increasing the Report Interval

In FLUENT, you can reduce communication and improve parallel performance by increasing the report interval for residual printing/plotting or other solution monitoring reports. You can modify the value for Reporting Interval in the Iterate panel.

Solve −→ Iterate...

Note that you will be unable to interrupt iterations until the end of each report interval.



Load Balancing

A dynamic load balancing capability is available in FLUENT. The principal reason for using parallel processing is to reduce the turnaround time of your simulation, ideally by a factor proportional to the collective speed of the computing resources used. If, for example, you were using four CPUs to solve your problem, then you would expect to reduce the turnaround time by a factor of four. This is of course the ideal situation, and assumes that there is very little communication needed among the CPUs, that the CPUs are all of equal speed, and that the CPUs are dedicated to your job. In practice, this is often not the case. For example, CPU speeds can vary if you are solving in parallel on a heterogeneous collection of workstations, other jobs may be competing for use of one or more of the CPUs, and network traffic either from within the parallel solver or generated from external sources may delay some of the necessary communication among the CPUs.
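The gap between the ideal and the observed reduction in turnaround time can be quantified with a short Python sketch. The wall clock times below are made-up values, and the definition of efficiency as speedup divided by the number of CPUs is the usual convention rather than a formula quoted from this manual.

```python
def speedup(t_serial, t_parallel):
    """Factor by which the turnaround time was reduced."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_cpus):
    """Fraction of the ideal speedup (equal to n_cpus) that was achieved."""
    return speedup(t_serial, t_parallel) / n_cpus


# Hypothetical wall clock times in seconds for the same case.
t_one_cpu = 100.0
t_four_cpus = 32.0

print(speedup(t_one_cpu, t_four_cpus))
print(efficiency(t_one_cpu, t_four_cpus, 4))
```

In the ideal situation described above, four CPUs would give a speedup of 4 and an efficiency of 1. The hypothetical timings here fall short of that because of the communication and load effects listed in the text.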

If you enable dynamic load balancing in FLUENT, the load across the computational and networking resources will be monitored periodically. If the load balancer determines that performance can be improved by redistributing the cells among the compute nodes, it will automatically do so. There is a time penalty associated with load balancing itself, and so it is disabled by default. If you will be using a dedicated homogeneous resource, or if you are using a heterogeneous resource but have accounted for differences in CPU speeds during partitioning by specifying a load distribution (see Section 28.4.6), then you may not need to use load balancing.

Note that when non-conformal interfaces are present, or the shell conduction model is used, you will not be able to turn on load balancing.

To enable and control FLUENT’s automatic load balancing feature, use the Load Balance panel (Figure 28.5.1). Load balancing will automatically detect and analyze parallel performance, and redistribute cells between the existing compute nodes to optimize it.

Parallel −→Load Balance...

The procedure for using load balancing is as follows:



Figure 28.5.1: The Load Balance Panel

1. Turn on the Load Balancing option.

2. Select the bisection method to create new grid partitions in the Partition Method drop-down list. The choices are the techniques described in Section 28.4.4. As part of the automatic load balancing procedure, the grid will be repartitioned into several small partitions using the specified method. The resulting partitions will then be distributed among the compute nodes to achieve a more balanced load.

3. Specify the desired Balance Interval. When a value of 0 is specified, FLUENT will internally determine the best value to use, initially using an interval of 25 iterations. You can override this behavior by specifying a non-zero value. FLUENT will then attempt to perform load balancing after every N iterations, where N is the specified Balance Interval. You should be careful to select an interval that is large enough to outweigh the cost of performing the load balancing operations.

Note that you can interrupt the calculation at any time, turn the load balancing feature off (or on), and then continue the calculation.

