Some Experiences on Parallel Finite Element Computations Using IBM/SP2
Yuan-Sen Yang and Shang-Hsien Hsieh
National Taiwan University
Taipei, Taiwan, R.O.C.
Contents
• Parallel Substructure Method
• Three Issues:
– Mesh Partitioning
– Nodal Renumbering within Substructures
– Solution of Interface DOFs
• Conclusions
Parallel Substructure Method
• Partition a structure into several substructures.
• Assign each substructure to a processor.
• Matrix assembly & static condensation within each substructure.
[Figure: Condensation. Labels: Interior Nodes, Interface Nodes]
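The condensation step can be written out in the usual block notation. This is a sketch under assumed notation: the subscripts i (interior) and b (interface) and the symbols K, u, f are introduced here and do not appear on the slides.

```latex
\begin{bmatrix} K_{ii} & K_{ib} \\ K_{bi} & K_{bb} \end{bmatrix}
\begin{bmatrix} u_i \\ u_b \end{bmatrix}
=
\begin{bmatrix} f_i \\ f_b \end{bmatrix}
\quad\Longrightarrow\quad
\bar{K}_{bb} = K_{bb} - K_{bi} K_{ii}^{-1} K_{ib},
\qquad
\bar{f}_b = f_b - K_{bi} K_{ii}^{-1} f_i
```

Each processor forms its own condensed pair for its substructure, so only interface quantities need to be exchanged afterwards.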
Parallel Substructure Method (cont.)
• Solve the displacements of interface DOFs.
• Solve the displacements of internal DOFs in each substructure.
• Perform force recovery in each substructure.
[Figure: Recovering. Labels: Interior Nodes, Interface Nodes]
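The whole sequence (condense, solve the interface, recover the internal displacements) can be traced on a toy problem. The following is a minimal sketch, not the authors' implementation: the model (a chain of four unit springs, fixed at one end, split into two substructures that share one interface node) and all function names are hypothetical, and the "processors" are simulated by a loop.

```python
# Dense helpers on plain lists; vectors are stored as n-by-1 column matrices.
def solve(A, B):
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n, m = len(A), len(B[0])
    M = [list(A[r]) + list(B[r]) for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            t = M[r][c] / M[c][c]
            for k in range(c, n + m):
                M[r][k] -= t * M[c][k]
    X = [[0.0] * m for _ in range(n)]
    for r in reversed(range(n)):
        for j in range(m):
            s = M[r][n + j] - sum(M[r][k] * X[k][j] for k in range(r + 1, n))
            X[r][j] = s / M[r][r]
    return X

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def condense(Kii, Kib, Kbb, fi, fb):
    """Static condensation: eliminate interior DOFs, return condensed (Kbb, fb)."""
    Kbi = [list(col) for col in zip(*Kib)]  # symmetric stiffness: Kbi = Kib^T
    Kbb_c = sub(Kbb, matmul(Kbi, solve(Kii, Kib)))
    fb_c = sub(fb, matmul(Kbi, solve(Kii, fi)))
    return Kbb_c, fb_c

def recover(Kii, Kib, fi, ub):
    """Back-substitute the interface displacements to get the interior ones."""
    return solve(Kii, sub(fi, matmul(Kib, ub)))

# Hypothetical model: nodes 0-1-2-3-4 joined by unit springs, node 0 fixed,
# unit load at node 4. Node 2 is the only interface node.
sub_A = dict(Kii=[[2.0]], Kib=[[-1.0]], Kbb=[[1.0]],
             fi=[[0.0]], fb=[[0.0]])                      # interior: node 1
sub_B = dict(Kii=[[2.0, -1.0], [-1.0, 1.0]], Kib=[[-1.0], [0.0]], Kbb=[[1.0]],
             fi=[[0.0], [1.0]], fb=[[0.0]])               # interior: nodes 3, 4
substructures = [sub_A, sub_B]

# Step 1: condensation, independent per substructure (one per processor).
condensed = [condense(**s) for s in substructures]

# Step 2: assemble and solve the interface equations. Both substructures
# share the same single interface DOF here, so assembly is a plain sum.
K_I, f_I = condensed[0]
for Kc, fc in condensed[1:]:
    K_I, f_I = add(K_I, Kc), add(f_I, fc)
u_b = solve(K_I, f_I)

# Step 3: recover interior displacements, again independent per substructure.
u_int = [recover(s["Kii"], s["Kib"], s["fi"], u_b) for s in substructures]
print("interface:", u_b, "interior:", u_int)
```

Steps 1 and 3 involve no communication between substructures; step 2 is the only coupled stage, which is why the interface solution gets its own discussion later in the talk.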
Mesh Partitioning
• Requirements
– Automatic Partitioning
– Handling regular & irregular meshes.
– Balanced distribution of number of elements.
– Minimization of number of interface nodes.
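The last two requirements can be measured directly for any given partition. The sketch below is illustrative only: the function name, the tiny four-element mesh and both partitions are hypothetical, and it evaluates a partition rather than producing one.

```python
from collections import Counter, defaultdict

def partition_quality(elements, part_of):
    """elements: list of node-id tuples; part_of: substructure id per element.
    Returns (elements per substructure, sorted interface nodes, imbalance)."""
    elems_per_part = Counter(part_of)
    parts_touching = defaultdict(set)
    for nodes, p in zip(elements, part_of):
        for n in nodes:
            parts_touching[n].add(p)
    # A node is an interface node if elements of more than one substructure use it.
    interface = sorted(n for n, ps in parts_touching.items() if len(ps) > 1)
    # Imbalance: largest substructure relative to the average size (1.0 is ideal).
    average = len(elements) / len(elems_per_part)
    imbalance = max(elems_per_part.values()) / average
    return elems_per_part, interface, imbalance

# Hypothetical strip of four quadrilaterals: bottom nodes 0..4, top nodes 5..9.
elements = [(0, 1, 6, 5), (1, 2, 7, 6), (2, 3, 8, 7), (3, 4, 9, 8)]

counts_a, interface_a, imbalance_a = partition_quality(elements, [0, 0, 1, 1])
counts_b, interface_b, imbalance_b = partition_quality(elements, [0, 1, 0, 1])

print("contiguous :", dict(counts_a), interface_a, imbalance_a)
print("interleaved:", dict(counts_b), interface_b, imbalance_b)
```

Both partitions are perfectly balanced in element count, yet the interleaved one has three times as many interface nodes, which is why the two requirements are listed separately.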
Experiences (Mesh Partitioning)
• GR, RST, METIS are used in this work.
• Balanced distribution of number of elements is achieved.
• Condensational loads are unbalanced.
[Figure: per-processor chart for P00 to P03, RST partitioning; axis range 0 to 30]
Substructural Nodal Renumbering
• Purpose:
– To reduce the skyline of the substructure matrix.
• Constraint:
– Interface nodes must be numbered after internal nodes.
• Reverse Cuthill-McKee (RCM, Liu & Sherman 1975) is modified and used.
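One simple way to honor the constraint is to run reverse Cuthill-McKee on the internal nodes only and then append the interface nodes. The sketch below assumes exactly that; the slides do not say how RCM was modified, so this should not be read as the authors' algorithm. The function names and the small path graph are hypothetical.

```python
from collections import deque

def constrained_rcm(adj, interface):
    """adj: dict node -> set of neighbours; interface: set of interface nodes.
    Returns internal nodes in reverse Cuthill-McKee order, then interface nodes."""
    internal = [n for n in adj if n not in interface]
    # Degrees are counted within the internal subgraph only.
    deg = {n: sum(1 for m in adj[n] if m not in interface) for n in internal}
    visited, order = set(), []
    # Start each breadth-first sweep from an unvisited node of lowest degree.
    for start in sorted(internal, key=lambda n: (deg[n], n)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            n = queue.popleft()
            order.append(n)
            nbrs = [m for m in adj[n] if m not in interface and m not in visited]
            for m in sorted(nbrs, key=lambda m: (deg[m], m)):
                visited.add(m)
                queue.append(m)
    order.reverse()                      # the "reverse" in reverse Cuthill-McKee
    return order + sorted(interface)     # constraint: interface nodes come last

def profile(adj, order):
    """Skyline profile: for each row, distance from the diagonal to the first nonzero."""
    pos = {n: i for i, n in enumerate(order)}
    return sum(pos[n] - min([pos[n]] + [pos[m] for m in adj[n]]) for n in order)

# Hypothetical substructure: a path 3-0-4-1-5-2 with scrambled labels; node 3 is
# the interface node.
edges = [(3, 0), (0, 4), (4, 1), (1, 5), (5, 2)]
adj = {n: set() for n in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

natural = sorted(adj)
renumbered = constrained_rcm(adj, interface={3})
print("natural   :", natural, "profile", profile(adj, natural))
print("renumbered:", renumbered, "profile", profile(adj, renumbered))
```

The profile is a rough proxy for the work of a skyline factorization, so a smaller value suggests a lighter condensation step for that substructure. It says nothing about whether the loads of different substructures end up equal.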
Experiences (Substructure Nodal Renumbering)
• Helps to reduce the condensational loads.
• Rarely balances the condensational loads among processors.
[Figure: per-processor charts (P00 to P03) for the 30STORY model, RST partitioning, 4 processors. Panels: "Without Substructure Nodal Renumbering" and "With modified RCM Substructure Nodal Renumbering"]
Solution of Interface DOFs
• Achieving high parallel efficiency for a linear equation solver is not an easy task.
• When NP increases:
– NI increases
– Parallel efficiency decreases
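For reference, parallel efficiency is commonly defined as below. The symbols T_1 (run time on one processor) and T_NP (run time on NP processors) are assumed here and do not appear on the slides.

```latex
E = \frac{T_1}{NP \cdot T_{NP}}
```

Any part of the computation that stays sequential and grows with NI adds to T_NP without being divided among processors, which lowers E.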
Experiences (Solution of Interface DOFs)
• In this work, a sequential direct method (Cholesky decomposition) is used.
• NI is affected by both NP and the performance of the partitioning algorithm.
Partitioning Algorithm    NP    NI     TI
RST                       4     48     2.2 s
RST                       8     112    7.0 s
GR                        8     127    16.5 s
NP: Number of processors.
NI: Number of interface nodes.
TI: Time for solving interface DOFs.
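A sequential Cholesky solve of the kind named above can be sketched as follows. This is a dense textbook version written for illustration, not the solver used in this work (which is not shown in the slides); the function names and the 2-by-2 system are hypothetical.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def cholesky_solve(A, b):
    """Solve A x = b by factorization, forward substitution and back substitution."""
    L = cholesky(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):                       # forward: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # backward: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# Hypothetical interface system with two unknowns.
K_interface = [[4.0, 2.0], [2.0, 3.0]]
f_interface = [10.0, 8.0]
u_interface = cholesky_solve(K_interface, f_interface)
print(u_interface)
```

For a dense matrix the factorization cost grows roughly with the cube of the number of unknowns, so a modest rise in NI can raise TI noticeably.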
Conclusions
• Mesh partitioning
– The computational load of each processor is not necessarily proportional to its number of elements.
– Minimization of interface nodes reduces the interface equations and usually improves the parallel efficiency.
• Substructural nodal renumbering
– Substructural nodal renumbering always reduces the condensational loads.
– But it rarely balances the condensational loads among processors.
• Parallel solution of interface DOFs
– High-efficiency parallel solvers for the interface equations are needed to improve the efficiency of the parallel substructure method.
Acknowledgement
• This research is supported by the National Science Council of R.O.C., under the project Nos. NSC 86-2211-E-002-029 and NSC 87-2211-E-002-034.
• The parallel computations were performed on the IBM/SP2 computers of the National Center for High-performance Computing, Hsin-Chu, Taiwan, R.O.C.
IBM/SP2 in NCHC
• Model
– IBM POWER2 SuperChip (P2SC)
• Peak Floating-Point Performance
– 480 MFLOPS
• Memory
– 128 Mbytes per node