
    An Efficient Parallel Solver using Matrix Inversion Method for Linear and Non-linear Finite Element Problems

    P K Gupta, Associate Member
    R N Khapre, Non-member

    This paper presents a modified parallel matrix inversion algorithm to solve a set of linear equations. It also discusses its suitability for finite element analysis. The algorithm is implemented on the supercomputer PARAM 10000. Computational time results were obtained by solving a problem of analysis of the anchorage zone in a prestressed post-tensioned concrete beam. These results are compared with the existing results of a parallel matrix inversion algorithm available in the literature. The algorithm is then implemented in computer codes for linear and non-linear finite element analysis. One typical problem from each category is solved, and the computational time variation results are obtained and discussed.

    Keywords: Parallel solver; Finite element method; Matrix inversion; PARAM 10000

    P K Gupta and R N Khapre are with Civil Engineering Group, Birla Institute of Technology and Science, Pilani, Rajasthan, 333 031.

    This paper was received on September 1, 2004. Written discussion on the paper will be entertained till August 31, 2005.

    NOTATION

    BW : band width

    e : eccentricity

    [I] : identity matrix

    [K] : global stiffness matrix

    Pk : prestressing force

    INTRODUCTION

    Structural analysis using the finite element method is one of the areas in which a large amount of data has to be handled during computation. A conventional computer takes significant computational time to complete such an analysis, so parallel computing can be a better option. In finite element analysis, the major portion of the computational time is spent in solving the linear equations. Therefore, parallel solvers can be employed to reduce the computational time. Different mathematical methods are available for solving a set of linear equations, and these are already in use in the development of parallel solvers [1-7].

    Shah and Kant [1] used a parallel Cholesky solver to solve the set of linear equations obtained during the finite element analysis of fibre-reinforced polymer shells. Khan and Topping [2] presented a modified parallel Jacobi-conditioned conjugate gradient method. They discussed and implemented element-by-element and diagonally conditioned approaches on distributed memory MIMD architectures. Thiagarajan and Aravamuthan [3] presented a preconditioned conjugate gradient finite element solver on 32-node Pentium II 350 MHz clusters. They also presented and discussed the components of computational time. Ramesh and Shah [4] made a similar attempt at implementing the parallel preconditioned conjugate gradient method. They used a Master-Slave approach on the transputer-based machine PARAM with 256 nodes having 4 MB of RAM, and employed a ring topology for the sub-workers (Slaves).

    Gupta and Khapre [5,6] developed three parallel solvers, using the matrix inversion method, the Gauss-Seidel method and the Gauss elimination method, to solve a set of linear equations on the supercomputer PARAM 10000. These solvers were used to solve the set of linear equations generated during the analysis of a linear elastic structural problem by the finite element method. After comparing the computational time variation obtained with the different solvers, it was found that the solver based on the matrix inversion method is suitable for supercomputers, while the Gauss elimination solver is suitable for conventional computers. They also compared blocking and non-blocking communication mechanisms and found that both are equally effective when incorporated in parallel solvers [6]. They further studied the effect of user activities on computational time and found that the Real time increases as the user activities increase [6]. They introduced a term, User time, that remains unaffected by user activities and recommended that it be used to evaluate the performance of parallel programs [6]. A study was also carried out by Gupta, et al [7] to explore the suitability of the C and FORTRAN 77 languages on supercomputers. A parallel solver using the matrix inversion method was developed on the PARAM 10000 machine in both C and FORTRAN 77. After comparing the computational time results, it was found that the program written in C takes less time than the program written in FORTRAN 77. It was also found that the percentage difference in User time between the two programs was nearly 25% for every number of processors.

    44 IE (I) Journal-CV

    This paper attempts to improve on the previous work [5,6]. The parallel solver using the matrix inversion method, implemented on PARAM 10000 in the C language and presented by Gupta and Khapre [5,6], is improved. The suitability of the matrix inversion method is also discussed specifically for finite element analysis. A set of linear equations taken from the literature [5] was solved using the developed solver. The computational time results obtained with the developed solver are discussed and compared with those of the previous literature [5,6]. The solver is then implemented in two finite element codes to solve the resulting sets of linear equations. A typical problem in each category is solved and the computational time variation is obtained. The speed-up achieved by both finite element codes is calculated and discussed.

    PARAM 10000 ARCHITECTURE OVERVIEW

    PARAM 10000 has a MIMD distributed memory architecture and was developed by the Centre for Development of Advanced Computing (C-DAC). The machine has four nodes, each having two UltraSPARC-II 64-bit RISC CPUs running at 400 MHz, with 2 MB external cache. Each processor has 512 MB main memory, extendable to 2 GB. PARAM 10000 has two interconnection networks, namely, PARAMNet and Fast Ethernet [8].

    MATRIX INVERSION METHOD

    The matrix inversion method is one of the basic methods of solving a system of linear equations $[A][x] = [B]$. In this method, the inverse of matrix $[A]$ is computed and then multiplied with the vector $[B]$ to get the unknown vector $[x]$. The mathematical relation $[A][A]^{-1} = [I]$ is used to generate the matrix $[A]^{-1}$. In the process of matrix inversion, an identity matrix $[I]$ is generated and row-wise operations are carried out on matrix $[A]$ and matrix $[I]$ such that matrix $[A]$ takes the form of matrix $[I]$ and matrix $[I]$ gets converted to matrix $[A]^{-1}$.
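The row operations described above can be sketched as a serial Gauss-Jordan routine. This is an illustrative sketch only, not the authors' C implementation: the function names `invert_gauss_jordan` and `solve` are hypothetical, and the code assumes that no row interchange (pivoting) is needed.

```python
def invert_gauss_jordan(a):
    """Return the inverse of the square matrix `a` (a list of lists).

    Row operations are applied to [A] and [I] together until [A] becomes
    the identity; [I] then holds the inverse. Assumption: no pivoting,
    so every diagonal entry met during elimination must be non-zero.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy of [A]
    inv = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for i in range(n):
        pivot = a[i][i]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; a row interchange is needed")
        # Scale row i so that its diagonal element becomes 1.
        a[i] = [v / pivot for v in a[i]]
        inv[i] = [v / pivot for v in inv[i]]
        # Make every other entry in column i zero.
        for j in range(n):
            factor = a[j][i]
            if j != i and factor != 0.0:
                a[j] = [vj - factor * vi for vj, vi in zip(a[j], a[i])]
                inv[j] = [vj - factor * vi for vj, vi in zip(inv[j], inv[i])]
    return inv


def solve(a, b):
    """Solve [A][x] = [B] by multiplying the inverse of [A] with [B]."""
    inv = invert_gauss_jordan(a)
    return [sum(r * v for r, v in zip(row, b)) for row in inv]


x = solve([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])  # approximately [0.1, 0.6]
```

For this 2 × 2 system the inverse is (1/10)·[[3, -1], [-2, 4]], so the solution is x = [0.1, 0.6] up to rounding.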

    In the finite element method, matrix $[A]$ represents the global stiffness matrix, and vectors $[B]$ and $[x]$ represent the global force vector and the global displacement vector, respectively. The global stiffness matrix is a banded matrix: its non-zero elements lie inside the bandwidth (BW) about the diagonal, and the remaining elements in the upper and lower triangles are zero. The appearance of the global stiffness matrix and the identity matrix is therefore quite similar, since both contain zero elements in their upper and lower triangles (Figure 1). Hence, if this method is used, computational effort can be saved. Further, zero elements also exist inside the bandwidth of the stiffness matrix; therefore, fewer computations are required to invert the global stiffness matrix.
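The saving from zero elements can be illustrated by counting how many row updates a Gauss-Jordan sweep skips on a small banded matrix. This is a sketch under stated assumptions: `count_row_updates` and `tridiagonal` are hypothetical helper names, the matrix is a toy tridiagonal one rather than a real stiffness matrix, and no pivoting is performed.

```python
def count_row_updates(a):
    """Run Gauss-Jordan elimination on a copy of `a` and return
    (done, skipped): row updates performed, and row updates skipped
    because the entry to be eliminated was already zero.
    Assumption: all pivots are non-zero (no row interchange)."""
    n = len(a)
    a = [row[:] for row in a]
    done = skipped = 0
    for i in range(n):
        pivot = a[i][i]
        a[i] = [v / pivot for v in a[i]]
        for j in range(n):
            if j == i:
                continue
            factor = a[j][i]
            if factor == 0.0:
                skipped += 1  # nothing to eliminate in this row
                continue
            a[j] = [vj - factor * vi for vj, vi in zip(a[j], a[i])]
            done += 1
    return done, skipped


def tridiagonal(n):
    """Toy banded matrix: 2 on the diagonal, -1 on the two neighbours."""
    return [[2.0 if r == c else -1.0 if abs(r - c) == 1 else 0.0
             for c in range(n)] for r in range(n)]


done, skipped = count_row_updates(tridiagonal(4))  # (9, 3)
```

Of the 12 possible row updates for a 4 × 4 matrix, 3 are skipped here. In this toy case all skipped updates lie below the band, because entries above the diagonal fill in as the elimination proceeds.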

    Figure 1 Stiffness matrix and identity matrix

    Global  P        {Number of processors}
            n        {Number of equations}
            MyRank   {Rank of this processor}
            Rank     {Rank of the processor holding the current row}
            start    {Starting row number for each processor}
            end      {Ending row number for each processor}
            i        {Current row}
            [I]      {Matrix that becomes the inverse of matrix [A]}

    for all P_i where 0 ≤ i < P do
        Set start
        Set end
        for i = 0 to n - 1 step 1
            if diagonal of [A]_i = 1.0
                continue
            else
                Set diagonal element of [A]_i = 1.0
                Change elements of matrix [I]_i
            endif
            for all P_i where 0 ≤ i < P do
                Find the Rank of current row
                if MyRank = Rank
                    Broadcast current row
                endif
            endfor
            for j = start to end step 1
                if [A]_ij ≠ 0.0
                    Change non-diagonal element of [A]_ij = 0.0
                    Change elements of matrix [I]_ij
                endif
            endfor
        endfor
        for i = start to end step 1
            Compute [x]_i
        endfor
        for all P_i where 0 ≤ i < P do
            Broadcast [x]_i to all processors
        endfor
    endfor

    Figure 2 Parallel algorithm for matrix inversion method


    Vol 86, May 2005 45


    ALGORITHM

    Initially, the range of data to be handled by each processor was decided. If the data could not be distributed evenly, the remaining rows were assigned to the processors with lower ranks. After the data was distributed among the processors, an identity matrix $[I]$ of the same size as $[A]$ was created by every processor. In the process of matrix inversion, row-wise operations were carried out. Every non-diagonal element of matrix $[A]$ was converted to zero and every diagonal element of matrix $[A]$ was made unity. While doing this, the operations were skipped at locations where a non-diagonal element already had a zero value or a diagonal element already had unit value. This helped in reducing the number of computations.

    Whatever operations were carried out on matrix $[A]$, the same operations were also carried out on matrix $[I]$ simultaneously. To reduce the computational time, each processor operated only on the rows designated to it. After finding the inverse of matrix $[A]$, the unknown vector $[x]$ was calculated by multiplying $[A]^{-1}$ with $[B]$. At this stage, each processor held only the elements of vector $[x]$ that belonged to its share. Each processor then broadcast these elements to all other processors so that every processor had the complete vector $[x]$. Figure 2 shows the algorithm of matrix inversion on parallel computers.
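The data distribution and the row-wise division of work can be sketched as follows. This is a serial simulation written for illustration, not the authors' message-passing code: `row_range` and `parallel_solve_simulated` are hypothetical names, the broadcast is replaced by a shared copy of the current row, and no pivoting is performed.

```python
def row_range(n, p, rank):
    """Rows [start, end) owned by `rank` when n rows are split over p
    processors; leftover rows go to the processors with lower ranks."""
    base, extra = divmod(n, p)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def parallel_solve_simulated(a, b, p):
    """Serially simulate p processors, each updating only its own rows
    of [A] and [I] and then computing its own share of [x]."""
    n = len(a)
    a = [row[:] for row in a]
    inv = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    ranges = [row_range(n, p, rank) for rank in range(p)]
    for i in range(n):
        pivot = a[i][i]
        if pivot != 1.0:  # skip the scaling when the diagonal is already 1
            a[i] = [v / pivot for v in a[i]]
            inv[i] = [v / pivot for v in inv[i]]
        row_a, row_i = a[i][:], inv[i][:]  # stands in for the broadcast
        for start, end in ranges:  # one pass per simulated processor
            for j in range(start, end):
                factor = a[j][i]
                if j != i and factor != 0.0:  # skip entries already zero
                    a[j] = [vj - factor * vi for vj, vi in zip(a[j], row_a)]
                    inv[j] = [vj - factor * vi for vj, vi in zip(inv[j], row_i)]
    x = [0.0] * n
    for start, end in ranges:  # each processor computes its share of [x]
        for j in range(start, end):
            x[j] = sum(r * v for r, v in zip(inv[j], b))
    return x  # in a real run, the shares are exchanged by a final broadcast


parts = [row_range(10, 3, rank) for rank in range(3)]  # [(0, 4), (4, 7), (7, 10)]
x_par = parallel_solve_simulated([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0], 2)
```

With 10 rows on 3 processors, the one leftover row goes to rank 0, which therefore owns 4 rows while ranks 1 and 2 own 3 each.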

    COMPUTATIONAL TIME RESULTS

    Based on the algorithm discussed, a parallel solver was developed. Data of size 1226 × 1226 generated from finite element analysis [6] was solved using this solver. Computational time results were generated and are compared in Table 1 with the results of the original solver developed by Gupta and Khapre [6]. All the components of computational time, namely, Real, User and Communication [9], reduce considerably when the present solver is used. For a single processor, nearly 67% of the User time as well as the Real time is saved by using the presented solver. As the number of processors increases, the percentage saving in both time components reduces. The percentage saving in Real time and User time falls to about 29% and 43%, respectively, when eight processors are employed. A sharp reduction in Real time can be observed from one processor to four processors; beyond four processors, the reduction in Real time is gradual and small. It can also be observed that the percentage saving in User time falls continuously as the number of processors increases, whereas in the case of Real time the variation is abrupt (a sudden fall of percentage saving at four processors). The user activities are mainly responsible for such variations.

    Table 1 Comparison of computational time components

    Processors   Developed solver                Original solver [6]             Saving, %
                 Real (s)  User (s)  Comm (s)    Real (s)  User (s)  Comm (s)    Real    User
    1            189.86    188.51    0.00        578.16    575.42    0.00        67.16   67.23
    2            173.76    109.30    1.59        299.72    294.14    1.50        42.02   62.84
    3            136.06     90.94    3.57        208.52    200.37    3.61        34.75   54.61
    4            128.05     76.19    4.01        163.36    153.00    4.18        21.61   50.20
    5            110.73     65.25    4.97        155.09    127.17    12.96       28.60   48.69
    6            105.03     56.98    5.34        141.84    108.41    15.61       25.95   47.44
    7             92.35     50.65    6.61        125.84     92.24    14.62       26.61   45.08
    8             94.88     47.11    3.01        134.43     82.95    16.15       29.42   43.20

    Note : Comm (s): Communication time (s)
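The percentage figures in Table 1 are consistent with a saving computed relative to the original solver. The sketch below reproduces three of the Real time entries; the function name `percent_saving` is hypothetical, and the formula is inferred from the tabulated values rather than stated in the paper.

```python
def percent_saving(original, developed):
    """Time saved by the developed solver, as a percentage of the
    original solver's time (formula inferred from Table 1)."""
    return (original - developed) / original * 100.0


# (processors, developed Real time in s, original Real time in s), from Table 1
real_times = [(1, 189.86, 578.16), (4, 128.05, 163.36), (8, 94.88, 134.43)]
savings = {p: round(percent_saving(orig, dev), 2) for p, dev, orig in real_times}
# savings is approximately {1: 67.16, 4: 21.61, 8: 29.42}
```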

    Figure 3 (a) Eccentrically loaded prestressed concrete beam; and (b) Discretised beam for present study. Labels in (a): centre of anchorage plate, centre of end block, eccentricity e, prestressing force Pk.


    CASE STUDY

    The solver developed above was implemented in two different finite element codes, one linear and one non-linear. One typical problem from each category was solved using these codes, and the different components of computational time were obtained and discussed.

    Case I: Linear Finite Element Analysis

    A problem of the anchorage zone in a prestressed post-tensioned concrete beam was analyzed [5]. The problem was treated as a two-dimensional plane stress problem (Figure 3(a)), and the beam was discretized using 4800 three-noded triangular elements with 2501 nodes (Figure 3(b)), resulting in a global stiffness matrix of size 5002 × 5002. The problem was analyzed by increasing the number of processors from one to five. Each processor required 480 MB of memory for every execution.
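The matrix size and a large part of the memory figure can be checked with a rough estimate. This is a back-of-envelope sketch under assumptions that the paper does not state: values are stored as 8-byte double-precision numbers, and each processor holds the full dense matrices [A] and [I]. The function name `dense_matrices_megabytes` is hypothetical.

```python
def dense_matrices_megabytes(n_equations, n_matrices=2, bytes_per_value=8):
    """Memory, in MB (10**6 bytes), for `n_matrices` dense square
    matrices of order `n_equations` (assumed 8-byte values)."""
    return n_matrices * n_equations ** 2 * bytes_per_value / 1e6


dofs = 2501 * 2  # two displacement unknowns per node in plane stress
mem_mb = dense_matrices_megabytes(dofs)  # about 400.3 MB for [A] and [I]
```

Under these assumptions the two matrices account for roughly 400 MB of the reported 480 MB; the paper does not break down the remainder.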

    Figure 4(a) shows the variation in the different components of computational time with increasing number of processors. All components of computational time reduce considerably as the number of processors increases. Figure 4(b) shows the variation in speed-up achieved by the FEM code. The speed-up varies almost linearly for Real time as well as for User time. The maximum speed-up achieved was 2.8 with five processors. The Real time speed-up curve lies just below the User time speed-up curve.
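The paper does not state how speed-up is defined. Assuming the usual definition, the ratio of the single-processor time $T_1$ to the time $T_p$ on $p$ processors, the reported maximum corresponds to the following parallel efficiency $E_p$ (a symbol introduced here for illustration):

```latex
S_p = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p}, \qquad
E_5 = \frac{2.8}{5} = 0.56
```

Under this assumption, five processors are used at about 56% of the ideal linear speed-up.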

    Figure 4 Variation in (a) computational time; and (b) speed-up with number of processors (axes: time in s and speed-up against number of processors; curves: Real, User and Communication in (a); Real, User and Ideal in (b))

    Case II: Non-linear Finite Element Analysis

    A problem of simple compression of a solid cylinder [10] of 25 mm radius and 25 mm height was analyzed. The cylinder was compressed with a velocity of 25 mm/s until a 30% reduction in height was achieved. The reduction was applied in 15 steps. The bottom surface was considered frictionless, and a friction factor of 0.5 was used for the top surface. The error norm was taken as 0.001, and a limiting strain rate of 0.01 was used to define the rigid portion of the cylinder. The material behavior was expressed by the equation $\sigma = k\,\dot{\varepsilon}^{m}$, where the values of $k$ and $m$ were taken as 10 and 0.1, respectively. The cylinder was discretized using 400 four-noded rectangular elements with 441 nodes (Figure 5), resulting in a global stiffness matrix of size 882 × 882. The problem was analyzed by increasing the number of processors from one to eight. Each processor required 14 MB of memory for every execution. The solution procedure was iterative, and 84 iterations were carried out to analyze the problem in 15 steps. Figure 5 also shows the deformed and undeformed shapes of the cylinder.

    Figure 5 Discretized cylinder and deformed-undeformed shape of solid cylinder (labels: axis of rotation, undeformed mesh, undeformed profile, deformed profile)

    Figure 6 Variation in (a) computational time; and (b) speed-up with number of processors (axes: time in s and speed-up against number of processors; curves: Real, User and Communication in (a); Real, User and Ideal in (b))

    Figure 6(a) shows the computational time variation with increasing number of processors. Both Real time and User time reduce as the number of processors increases, whereas Communication time increases. A rapid reduction in Real time and User time can be observed from one processor to four processors, after which the reduction is insignificant. Figure 6(b) shows the variation in speed-up with increasing number of processors. The maximum speed-up in Real time is three with eight processors, while the maximum speed-up in User time is 8.6 with eight processors.

    CONCLUSION

    The paper shows the implementation of the matrix inversion method in the development of a parallel solver for the finite element method. It also discusses why this solver is suitable for finite element analysis. An efficient parallel solver is presented to reduce the computational time involved in finite element analysis. When the computational time results were compared with the available literature [5], it was found that the developed solver is more efficient than the solver developed earlier [5]. The paper also shows the implementation of the parallel solver in linear and non-linear finite element codes. It presents two different problems and shows how the computational time reduces when the parallel solver is adopted. According to the literature [6], to analyze data of size 1226 × 1226 using the Gauss elimination method, a single processor of the PARAM 10000 machine takes 173.46 s of Real time and 170.36 s of User time. When the same data was analyzed on a single processor with the present solver, the Real time and User time were 189.86 s and 188.51 s, respectively (Table 1). The present solver for the matrix inversion method thus takes only slightly more time than the Gauss elimination method on a single processor, and hence it can be concluded that the developed solver can be used effectively on single and multiple processor machines.

    ACKNOWLEDGEMENT

    The authors would like to thank C-DAC, Pune, for the support given to this research work through the research project 'Computer Simulation of Large Deformations Process'. The authors also acknowledge the support of the Image and Parallel Processing Laboratory, BITS, Pilani, India, for providing parallel computing facilities for this work.

    REFERENCES

    1. M S Shah and T Kant. Finite Element Analysis of Fibre-reinforced Polymer Shells using Higher Order Shear Deformation Theories on Parallel Distributed Memory Machines. International Journal of Computer Applications in Technology, vol 31, 1998, p 1.

    2. A I Khan and B H V Topping. Parallel Finite Element Analysis using Jacobi-conditioned Conjugate Gradient Algorithm. Advances in Engineering Software, vol 25, 1996, p 309.

    3. G Thiagarajan and V Aravamuthan. Parallelization Strategies for Element-by-element Preconditioned Conjugate Gradient Solver using High-performance FORTRAN for Unstructured Finite-element Applications on Linux Clusters. ASCE Journal of Computing in Civil Engineering, vol 16, no 1, January 2002, p 1.

    4. K S Ramesh and M Shah. Implementation of Parallel Preconditioned Conjugate Gradient Solver for FEA on PARAM. Proceedings of the International Symposium on Scientific Computing and Mathematical Modelling, Bangalore, December 1992, p 49.

    5. P K Gupta and R N Khapre. Finite Element Analysis of Anchorage Zone using Supercomputer PARAM 10000. Proceedings of the International Conference Structural Engineering Convention, an International Meet, IIT, Kharagpur, December 2003, p 465.

    6. P K Gupta and R N Khapre. Comparative Study of Solution Methods of System of Linear Equations on Supercomputers. Proceedings of the International Conference Structural Engineering Convention, an International Meet, IIT, Kharagpur, December 2003, p 522.

    7. P K Gupta, J P Mishra, R N Khapre and P K Jain. Comparison of C and FORTRAN 77 Languages based on their Performance on PARAM 10000. Proceedings of the National Conference on Distributed Computing, NITTE, Karkala, March 2004, p 33.

    8. http://param.bits-pilani.ac.in/

    9. S Das. UNIX : Concepts and Applications. Tata McGraw-Hill, New Delhi, 1999, p 48.

    10. S Kobayashi, Soo-Ik Oh and T Altan. Metal Forming and the Finite-element Method. Oxford University Press, New York, 1989, p 364.
