REPORT DOCUMENTATION PAGE (Form Approved, OMB No. 0704-0188)
1a. REPORT SECURITY CLASSIFICATION: Unclassified
3. DISTRIBUTION/AVAILABILITY OF REPORT: Unclassified/Unlimited
4. PERFORMING ORGANIZATION REPORT NUMBER(S): CU-CSSC-91-17
6a. NAME OF PERFORMING ORGANIZATION: Center for Space Structures & Controls (Univ. of Colorado)
6c. ADDRESS (City, State, and ZIP Code): Campus Box 429, University of Colorado, Boulder, CO 80309-0429
7a. NAME OF MONITORING ORGANIZATION: Naval Research Laboratory
8a. NAME OF FUNDING/SPONSORING ORGANIZATION: Naval Research Laboratory
8c. ADDRESS (City, State, and ZIP Code): Code 5130S, Washington, D.C. 20375
11. TITLE (Include Security Classification): Development of Improved Modeling and Analysis Techniques for Dynamics of Shell Structures, Final Report
12. PERSONAL AUTHOR(S): K. C. Park and C. Farhat
13a. TYPE OF REPORT: Final Report
13b. TIME COVERED: From 6/9/87 to 6/8/90
14. DATE OF REPORT (Year, Month, Day): July 24, 1991
15. PAGE COUNT: 118
18. SUBJECT TERMS: modeling, shell structures, parallel computations, finite elements
19. ABSTRACT
This report contains two related research thrusts: improved shell structural modeling for underwater acoustics, and massively parallel computations of shell dynamics response. Improvements on the existing ANS shell elements have been made and implemented on a testbed software; the module was then delivered to NRL for their applications. In addition, the so-called frequency-window tailoring of finite element models has been developed so that very high-frequency components can be accurately modeled with relatively coarse finite element grids, thus resulting in about a factor of five to ten times larger element size than has been possible in conventional finite element modeling.
(Continued on other side)
20. DISTRIBUTION/AVAILABILITY OF ABSTRACT: Unclassified/Unlimited
21. ABSTRACT SECURITY CLASSIFICATION: Unclassified/Unlimited
22a. NAME OF RESPONSIBLE INDIVIDUAL: K. C. Park
22b. TELEPHONE (Include Area Code): (303) 492-6330
DD Form 1473, JUN 86
This document has been approved for public release and sale; its distribution is unlimited.
BLOCK 19 (CONT'D)
Second, several parallel modules using the C* language have been developed
to run large-scale shell dynamics problems on the Connection Machines.
These include: decomposer which takes as input an arbitrary mesh description,
and produces a set of finite element data structures that can be loaded within
one generic CM2 chip; mapper that assigns each of the data structures produced
by the decomposer to a well defined chip; residual evaluator that controls the
direct calculation of element residuals; element library that includes various
finite elements; and visualization kernel that greatly facilitates the
understanding of the computed results.
CU-CSSC-91-17
CENTER FOR SPACE STRUCTURES AND CONTROLS
DEVELOPMENT OF IMPROVED MODELING AND ANALYSIS TECHNIQUES FOR DYNAMICS OF SHELL STRUCTURES
Final Report
by
K. C. Park and Charbel Farhat
July, 1991
COLLEGE OF ENGINEERING
UNIVERSITY OF COLORADO
CAMPUS BOX 429
BOULDER, COLORADO 80309
Final Report
on
Development of Improved Modeling and Analysis Techniques
for
Dynamics of Shell Structures
(Contract No. N00014-87-K-2018)
K. C. Park and Charbel Farhat
Department of Aerospace Engineering Sciences and
Center for Space Structures and Controls
University of Colorado, Campus Box 429
Boulder, Colorado 80309
July 1991
Submitted to:
Naval Research Laboratory
Code 5130S
Washington, D.C. 20375
Technical Monitor: L. Schuetz-Couchman
TABLE OF CONTENTS
SUMMARY
ENCLOSED PAPERS AND REPORTS
Alvin, K. F. and Park, K. C.
Frequency-Window Tailoring of Finite Element Models for Vibration and Acoustics Analysis,
Center for Space Structures and Controls, Report No. CU-CSSC-91-16, July 1991, University of Colorado, Boulder, CO.
Also to appear in Computational Methods for Acoustics Analysis, ASME 1991 Winter Annual Meeting, Atlanta, GA.
Park, K. C. and Jensen, D. D.
A Systematic Determination of Lumped and Improved Consistent Mass Matrices for Vibration Analysis,
Proc. the 30th Structures, Dynamics and Materials Conference, AIAA Paper No. 89-1335, April 3-5, 1989, pp. 1532-1540.
Farhat, C., Sobh, N. and Park, K. C.
Transient Finite Element Computations on 65,536 Processors: The Connection Machine,
International Journal for Numerical Methods in Engineering, 30(1), 27-55 (1990).
Farhat, C., Sobh, N. and Park, K. C.
Dynamic Finite Element Simulations on the Connection Machine,
International Journal of High Speed Computing, Vol. 1, No. 2, pp. 289-302 (1989).
Park, K. C., Pramono, E., Stanley, G. M. and Cabiness, H. A.
The ANS Shell Elements: Earlier Developments and Recent Improvements,
Analytical and Computational Models of Shells, Noor, A. K. et al (eds.), CED-Vol. 3, 1989, ASME, New York, 217-240.
Stanley, G. M., Cabiness, H. and Park, K. C.
Revised ANS Shell Elements: Implementation and Numerical Evaluations,
Computational Mechanics '88, S. N. Atluri and G. Yagawa (editors), Vol. 1, Springer-Verlag, 1988, pp. 26.v.1-4.
SUMMARY
This is a final report on the research project supported by the Naval Research Laboratory under Grant N00014-87-K-2018, entitled Development of Improved Modeling and Analysis Techniques for Dynamics of Shell Structures, which covered the period of 09 June 1987 to 08 June 1990. The objectives of the research have been: 1) to develop modeling and computational techniques suitable for the dynamic analysis of naval shell structures, and 2) to investigate, implement and evaluate tools for concurrent processing of very large structural engineering problems on the Connection Machine.
I. Research Accomplishments
Task 1: Shell Structural Modeling Techniques
This task consists of two related efforts: 1) improvement of the ANS shell elements (References 1 through 4) to better capture the coupling effects of membrane-bending, membrane-transverse shear, and bending-transverse shear phenomena; 2) modeling techniques for improving the modes and mode shapes of 'dry' shell structures, particularly for the intermediate frequency ranges.
As a result of the first effort, a shell element software module was implemented and delivered to NRL, and its theoretical aspects were documented in References 5 and 6. Specifically, the new version of the ANS shell elements passes the patch test and is considerably streamlined, resulting in substantial computational efficiency.
Regarding the modeling of shell structures by finite elements for accurate intermediate-frequency computations, our initial effort began with tailoring of the mass matrices as documented in Reference 7. Even though such mass-matrix tailoring gave rise to a significant improvement of low-frequency computations, it fell short of yielding any appreciable improvement on intermediate to high-frequency computations. This has led us to tailor not only the mass matrices but also, component by component, the stiffness matrices. For example, the tailored stiffness matrix consists of the tailored membrane, tailored bending and tailored transverse shear stiffness matrices. The synthesis to realize such a tailoring was facilitated by the use of the symbolic analysis technique developed previously in References 8 through 11.
The so-called frequency-window tailoring of finite element models, as applied to bars and beams (Reference 12), demonstrates that it can accurately obtain very high-frequency components with relatively coarse finite element grids, with about a factor of five to ten times larger element size than has been possible in conventional finite element modeling. This improvement, if proved to be the case for general shells, can have a significant impact on the finite element modeling capability of structural acoustics problems in the future.
Task 2: Parallel Computations on the Connection Machine
This task has focused on the use of the Connection Machine as applied to the explicit transient analysis of 'dry' shell structures. Our experience has been documented in References 13 and 14. In addition, a framebuffer-generated visualization of the transient analysis of a generic submarine structure was produced as a video tape and delivered to the NRL technical monitor. Specifically, our effort concentrated on the development of several modules using the C* language provided by the Thinking Machines Corporation. The modules developed so far include: decomposer which takes as input an arbitrary mesh description, and produces a set of finite element data structures that can be loaded within one generic CM2 chip; mapper that assigns each of the data structures produced by the decomposer to a well defined chip; residual evaluator that controls the direct calculation of element residuals, where "direct" means that no element stiffness matrices are evaluated; and element library that includes a 3D 2-node truss, a 3D 2-node Bernoulli beam, a 3D 2-node Timoshenko beam, a 3D 8-node brick, a 2D 4-node quadrilateral and a 4-node ANS shell element. These modules, when interfaced with the visualization kernel that was developed under AFOSR and NSF grants, greatly facilitate the understanding of the computed results.
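The idea of a "direct" residual evaluation inside an explicit transient analysis can be sketched in a few lines. The following is a minimal illustration only, assuming a one-dimensional mesh of two-node bar elements, a lumped (diagonal) mass and a central-difference update; the function names `internal_force` and `explicit_step` are hypothetical and this is not the C* code described above.

```python
def internal_force(u, n_elem, EA, l):
    """Accumulate nodal internal forces element by element.

    No element stiffness matrix is formed: each element contributes
    its axial force N = EA * strain directly to its two nodes.
    """
    f = [0.0] * (n_elem + 1)
    for e in range(n_elem):
        strain = (u[e + 1] - u[e]) / l
        axial = EA * strain
        f[e] -= axial        # internal force at the left node
        f[e + 1] += axial    # internal force at the right node
    return f


def explicit_step(u, v_half, f_ext, mass, n_elem, EA, l, dt):
    """One central-difference step with a diagonal mass (assumed)."""
    f_int = internal_force(u, n_elem, EA, l)
    # residual = external minus internal force, divided by the nodal mass
    accel = [(fe - fi) / m for fe, fi, m in zip(f_ext, f_int, mass)]
    v_new = [v + dt * a for v, a in zip(v_half, accel)]
    u_new = [ui + dt * vi for ui, vi in zip(u, v_new)]
    return u_new, v_new


if __name__ == "__main__":
    n_elem, EA, l, dt = 4, 1.0, 1.0, 0.1
    mass = [0.5, 1.0, 1.0, 1.0, 0.5]      # lumped mass, rho*A*l = 1
    u = [0.0, 0.0, 0.1, 0.0, 0.0]         # initial displacement pulse
    v = [0.0] * 5
    for _ in range(10):
        u, v = explicit_step(u, v, [0.0] * 5, mass, n_elem, EA, l, dt)
    print(u)
```

Because each element only reads its own nodal values and adds to its own nodes, the loop over elements is the part that maps naturally onto one processor per element.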
Our experience so far indicates that this highly parallel processor can outperform vector supercomputers such as the CRAY family on explicit computations but not on implicit ones. Based on the observations obtained during the present study, the following is a summary of the key conclusions:
(1) The current CM2 processor memory size of 64 Kbits penalizes high order elements in the sense that only small VP (virtual processor) ratios can be achieved. Thus the current configuration favors simpler elements. (This restriction should disappear in future CM2 models which will have 1 Mbit of memory per processor.)
(2) Mesh irregularities slow down the computation speed in various ways.
(3) The Data Vault is very effective at reducing I/O time.
(4) The Virtual Processor concept outperforms substructuring.
REFERENCES
1. Park, K. C. and Stanley, G. M., "A Curved C0 Shell Element Based on Assumed Natural-Coordinate Strains," To appear in Journal of Applied Mechanics, 1986.
2. Stanley, G. M., Continuum-based shell elements, Ph.D. Thesis, Stanford University, 1985.
3. Stanley, G. M., Park, K. C. and Hughes, T. J. R., "Continuum-based Resultant Shell Elements," Finite Element Method for Plate and Shell Structures, Volume 1: Element Technology, ed. by Hughes, T. J. R. and Hinton, E., Pineridge Press International, Swansea, U. K., 1986, pp. 1-45.
4. Park, K. C., Stanley, G. M. and Cabiness, H., "A Family of C0 Shell Elements Based on Generalized Hrennikoff's Method and Assumed Natural-coordinate Strains," Finite Element Methods for Nonlinear Problems, P. G. Bergan (editor), Springer-Verlag, 1986, 265-282.
5. Park, K. C., Pramono, E., Stanley, G. M. and Cabiness, H. A., "The ANS Shell Elements: Earlier Developments and Recent Improvements," in Analytical and Computational Models of Shells, Noor, A. K. et al (eds.), CED-Vol. 3, 1989, ASME, New York, 217-240.
6. Stanley, G. M., Cabiness, H. and Park, K. C., "Revised ANS Shell Elements: Implementation and Numerical Evaluations," Computational Mechanics '88, S. N. Atluri and G. Yagawa (editors), Vol. 1, Springer-Verlag, 1988, pp. 26.v.1-4.
7. Park, K. C. and Jensen, D. D., "A Systematic Determination of Lumped and Improved Consistent Mass Matrices for Vibration Analysis," Proc. the 30th Structures, Dynamics and Materials Conference, AIAA Paper No. 89-1335, April 3-5, 1989.
8. Park, K. C. and Flaggs, D. L., "An Operational Procedure for the Symbolic Analysis of the Finite Element Method," Comp. Meth. Appl. Mech. Engr., 42, (1984) 37-46.
9. Park, K. C. and Flaggs, D. L., "A Fourier Analysis of Spurious Mechanisms and Locking in the Finite Element Method," Comp. Meth. Appl. Mech. Engr., 46, (1984) 65-81.
10. Park, K. C., "Symbolic Fourier Analysis Procedures for C0 Finite Elements," in: Innovative Methods for Nonlinear Analysis, W. K. Liu, T. Belytschko and K. C. Park (editors), Pineridge Press, Swansea, (1984), 269-293.
11. Park, K. C. and Flaggs, D. L., "A Symbolic Fourier Synthesis of a One-Point Integrated Quadrilateral Plate Element," Comp. Meth. Appl. Mech. Engr., 48, (1985) 203-236.
12. Alvin, K. F. and Park, K. C., "Frequency-Window Tailoring of Finite Element Models for Vibration and Acoustics Analysis," to appear in Computational Methods for Acoustics Analysis, ASME Symposia Series, 1991, Winter Annual Meeting, 1991.
13. Farhat, C., Sobh, N. and Park, K. C., "Transient Finite Element Computations on 65,536 Processors: The Connection Machine," International Journal for Numerical Methods in Engineering, 30(1), 27-55 (1990).
14. Farhat, C., Sobh, N. and Park, K. C., "Dynamic Finite Element Simulations on the Connection Machine," International Journal of High Speed Computing, Vol. 1, No. 2, pp. 289-302 (1989).
CU-CSSC-91-16
CENTER FOR SPACE STRUCTURES AND CONTROLS
FREQUENCY-WINDOW TAILORING OF FINITE ELEMENT MODELS FOR VIBRATION AND ACOUSTICS ANALYSIS
by
K. F. Alvin and K. C. Park
July, 1991
COLLEGE OF ENGINEERING
UNIVERSITY OF COLORADO
CAMPUS BOX 429
BOULDER, COLORADO 80309
Frequency-Window Tailoring of Finite Element Models
for
Vibration and Acoustics Analysis
K. F. Alvin and K. C. Park
Department of Aerospace Engineering Sciences and
Center for Space Structures and Controls
University of Colorado
Campus Box 429
Boulder, Colorado 80309
January 1991
ABSTRACT
A frequency-window tailoring technique is proposed for improved finite element modeling of structures for frequencies and their mode shapes in the acoustic range. The technique is based on the tailoring of three element attributes: a frequency-tailored mass matrix, enhancement of the stiffness matrix by a weighted spectral decomposition of membrane, bending and transverse shear energy for a desired frequency range (window), and a discrete Fourier synthesis of the resulting elemental eigenproblem models. The proposed technique has been applied to the vibration problems of bars and beams, which illustrate the effectiveness of the technique over conventional finite element modeling techniques.
1.0 Introduction
The response accuracy of finite element methods applied to linear structural dynamics problems is a function of both the finite element spatial discretization and the time domain integration technique applied to the coupled ordinary differential equations of the discrete model. Traditional techniques for improving the spatial discretization obtained via the finite element method are dominated by the so-called h-refinements and p-refinements.
In the first approach, the element mathematical formulation is held fixed while the number of elements (and number of global variables) is increased to obtain the desired spatial accuracy. The p-refinement, in contrast, holds the number of elements constant while increasing the order of the displacement field interpolations within the element. This also leads to an increase in the number of variables, but alters the fundamental element behavior. For example, one might refine a simple one-element truss model (i.e. spring element) by introducing a mid-point node. With an h-refinement, we would change the model from one linear-displacement element to two linear elements which share the mid-point node. A p-refinement, on the other hand, would exploit the additional node to replace the linear element with a single three-node bar element employing quadratic interpolations of the internal displacement field.
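The truss example can be made concrete with a small sketch. It assumes the standard textbook stiffness matrices for the two-node linear bar and the three-node quadratic bar (not quoted in the text above), written in units of EA/l, where l is the node spacing and nodes are ordered end, mid, end. The helper names are hypothetical.

```python
# h-refinement: two linear elements sharing the mid-point node (units of EA/l)
K_H = [[1.0, -1.0, 0.0],
       [-1.0, 2.0, -1.0],
       [0.0, -1.0, 1.0]]

# p-refinement: one quadratic element of length 2l (units of EA/l)
K_P = [[7.0 / 6.0, -8.0 / 6.0, 1.0 / 6.0],
       [-8.0 / 6.0, 16.0 / 6.0, -8.0 / 6.0],
       [1.0 / 6.0, -8.0 / 6.0, 7.0 / 6.0]]


def solve_2x2(a, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x0, x1]


def static_response(K, tip_load):
    """Fix node 0, apply an end load at node 2, return [u_mid, u_tip]."""
    reduced = [[K[1][1], K[1][2]], [K[2][1], K[2][2]]]
    return solve_2x2(reduced, [0.0, tip_load])


if __name__ == "__main__":
    print("h-refined:", static_response(K_H, 1.0))
    print("p-refined:", static_response(K_P, 1.0))
```

Both models share the same three nodes and give the same static response to an end load; the difference between them shows up in the dynamic (dispersion) behavior examined later in the paper.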
A common limiting factor for both of these refinements is that, once the grid sizes and approximate interpolation functions are decided upon, the accuracy that can accrue from the resulting discrete model is fixed. Specifically, while the convergence of the low frequencies and their mode shapes is in general assured as the grids and/or interpolation order are increased, there has been a lack of a systematic convergence measurement for frequencies and their mode shapes ranging from intermediate to acoustic components. Consequently, this lack of high-frequency convergence assessment has led to the belief that it is hopeless to capture with high accuracy an acoustic range of frequencies and their mode shapes by the finite element approach within feasible computational means.
The development of consistent mass discretization [1], however, has motivated a number of investigators to study the wave dispersion characteristics of various mass modeling procedures for finite element analysis [2-11]. These efforts have included assessments of mass lumping for both constant-strain and higher-order elements, and point out clearly how mass modeling, independent of mesh size and displacement interpolation, can significantly affect model accuracy. Park and Jensen [12] use wave dispersion analysis to provide a systematic relationship between lumped and consistent mass discretizations, and show how averaging or tailoring the mass lumping can significantly improve response accuracy at specific higher frequencies. This approach is not adequate, however, for obtaining highly accurate acoustic frequency ranges, and furthermore leads to non-diagonal mass matrices, which are undesirable for simulation via explicit time integration methods on massively-parallel computers. Thus, there remains a need for finite element approximations which accurately capture acoustic components while ideally maintaining a diagonal mass coefficient for computational considerations.
The present paper can be viewed as an initial attempt to fill this void, working toward a method that can eventually lead to adequate finite element modeling of the acoustic range frequencies and their mode shapes. To this end, we retain the two conventional model improvement techniques, viz., the h and p-refinements, but not to their extreme. We introduce a third component, which first breaks down the elemental attributes, parameterizes those decomposed attributes, and then recombines them based on a discrete Fourier synthesis so that the discrete characteristic dispersion curves match, for a specified range of frequencies, as closely as possible those of the continuum case; hence the name frequency-window tailoring technique. The elemental attributes usually consist of mass matrices of linear and quadratic interpolations, and stiffness matrices of membrane, bending and transverse shear components of constant and linear strain interpolations. It will be subsequently shown that the present frequency-window tailoring technique yields dramatically improved vibration analysis performance beyond either of the traditional methods for the same mesh size, thus providing adequate accuracy with an affordable mesh refinement. The rest of the paper is organized as follows.
Section 2 briefly reviews the discrete Fourier analysis technique [13,14] as applied to two-noded and three-noded bar elements. The discrete dispersion curves are then compared with that of the continuum case. The establishment of this comparison forms the basis for the present element synthesis or frequency-window tailoring technique. In addition, a similar discrete Fourier analysis of a Timoshenko beam and its correlation with the continuum case is presented. Thus, discretization accuracy of not only the membrane but also the bending and transverse shear phenomena can be assessed in a quantitative manner. Of particular interest from these analyses is the appearance of a pronounced jump in the frequency for the case of the quadratic bar element, and also for the case of the quadratic Timoshenko beam, although not as pronounced. These jumps occur at the mode shape that corresponds to $kl = \pi/2$, where $k$ is the wave number and $l$ is the elemental length, which is clearly within the admissible wave number range.
A parameterized tailoring of the bar element discretization is described in Section 3. For the case of the bar, a diagonal mass is first constructed as a linear combination of the linear and quadratic elements. A parameterized stiffness matrix is then constructed in a similar manner. These two parameters are then determined by requiring that, for a specified range of frequencies, the discrete frequencies be as close as possible to those of the continuum case. Encouraged by the success of the bar synthesis, a similar synthesis technique is applied to beam elements. This is carried out by introducing a three-parameter optimization process, viz., the mass parameter, the bending parameter and the transverse shear parameter. These results are reported in Section 4. Finally, discussions and some concluding remarks are offered in Section 5.
2.0 Discrete Fourier Analysis in Vibration of Finite Elements
The basic analysis tool that we employ throughout the paper is the discrete Fourier method [12-14]. There are two important properties of the discrete Fourier method that are attractive for the present purposes: symbolic representation of the element attributes and the direct comparison of the discrete dispersion curves with the corresponding continuum ones. We now illustrate the method, for the sake of clarity and simplicity, by way of one-dimensional bar problems.
2.1 Fourier Analysis of Bar Elements
Consider the governing partial differential equation for a uniform elastic bar given by

$$\rho \frac{\partial^2 u}{\partial t^2} = E \frac{\partial^2 u}{\partial x^2} \tag{1}$$

where $\rho$ is the mass density, $E$ is Young's modulus, $u$ is the axial displacement variable, and $t$ and $x$ are time and the axial position along the bar, respectively. The Fourier transformation of (1) can be performed by introducing the following form of $u$:

$$u = \hat{u}\, e^{i(\omega t - kx)} \tag{2}$$

which, when substituted into (1), yields its characteristic equation as

$$\left(\frac{\omega l}{c}\right)^2 = (kl)^2 \tag{3}$$

where $l$ is a characteristic length, $k$ is the wave number, $c$ is the continuum wave speed equal to $\sqrt{E/\rho}$, and $(\omega l/c)$ and $(kl)$ are the normalized (nondimensional) frequency and wave number, respectively. The characteristic equation (3) implies that, for the continuum case, the normalized frequency $\omega l/c$ is linearly proportional to the nondimensional wave number $(kl)$.
Let us now consider a two-noded linear bar element. Assembling a uniform mesh of two bar elements of length $l$ (see Figure 1), we obtain the discrete equation at the interior node $n$ as

$$\rho A l\, \ddot{u}_n = \frac{EA}{l}\left(u_{n-1} - 2u_n + u_{n+1}\right) \tag{4}$$

The discrete solution analogous to (2) is of the form

$$u_j(t) = \hat{u}_n\, e^{i(\omega t - k(x_j - x_n))} \tag{5}$$

which, when applied to (4), yields the following discrete characteristic equation:

$$\left(\frac{\omega l}{c}\right)^2 = (\bar{k} l)^2 \tag{6}$$

where the discrete wave number $\bar{k}$ is defined by

$$\bar{k} l = 2 \sin\frac{kl}{2}, \qquad 0 \le kl \le \pi \tag{7}$$

Hence, comparing (6) with (3), it is observed that the effect of discretization is embodied in the approximation of the continuum wave number $k$ by the discrete wave number $\bar{k}$.
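A short numerical comparison makes the effect of this approximation visible. The sketch below assumes the lumped-mass linear bar relation $\omega l/c = 2\sin(kl/2)$, which follows from substituting a harmonic solution into the three-point difference equation, against the continuum relation $\omega l/c = kl$. The function names are illustrative only.

```python
import math


def continuum_frequency(kl):
    """Normalized continuum frequency: omega*l/c = k*l."""
    return kl


def linear_bar_frequency(kl):
    """Normalized frequency of the lumped-mass two-node bar mesh."""
    return 2.0 * math.sin(0.5 * kl)


if __name__ == "__main__":
    # Tabulate the dispersion error across the admissible range 0 < kl <= pi
    for i in range(1, 9):
        kl = i * math.pi / 8.0
        exact = continuum_frequency(kl)
        discrete = linear_bar_frequency(kl)
        error = (discrete - exact) / exact
        print(f"kl = {kl:5.3f}  continuum = {exact:5.3f}  "
              f"linear = {discrete:5.3f}  rel. error = {error:+.3f}")
```

The discrete frequency always falls below the continuum one, and the shortfall grows toward the mesh cutoff $kl = \pi$, where the discrete value is 2 rather than $\pi$.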
In order to extend the above analysis to higher-order element interpolations, we generalize the discrete Fourier analysis as follows. Instead of representing the displacements of adjacent nodes by (5), the Fourier expansion is assumed to hold only for alternate nodes (see Figure 1), and so we consider the two coupled difference equations at nodes $m-1$ and $m$:

$$\rho A l\, \ddot{u}_{m-1} = \frac{EA}{l}\left(u_{m-2} - 2u_{m-1} + u_m\right), \qquad \rho A l\, \ddot{u}_m = \frac{EA}{l}\left(u_{m-1} - 2u_m + u_{m+1}\right) \tag{8}$$

Substituting (5) into (8) together with the expressions

$$\ddot{u}_{m-1} = -\omega^2 u_{m-1}, \quad \ddot{u}_m = -\omega^2 u_m, \quad u_{m-2} = e^{2ikl} u_m, \quad u_{m+1} = e^{-2ikl} u_{m-1} \tag{9}$$

we obtain the following Fourier-transformed equation:

$$L(k,\omega)\, \hat{u} = 0 \tag{10}$$

where

$$L(k,\omega) = \begin{bmatrix} 2 - \left(\frac{\omega l}{c}\right)^2 & -\left(1 + e^{2ikl}\right) \\ -\left(1 + e^{-2ikl}\right) & 2 - \left(\frac{\omega l}{c}\right)^2 \end{bmatrix} \tag{11}$$

$$\hat{u} = \lfloor u_{m-1} \;\; u_m \rfloor^T \tag{12}$$

The characteristic equation is found by requiring a non-trivial solution to (10), viz., $\det L = 0$, from which two characteristic roots are found as

$$\left(\frac{\omega l}{c}\right)_1^2 = 2(1 - \cos kl), \qquad \left(\frac{\omega l}{c}\right)_2^2 = 2(1 + \cos kl) \tag{13}$$

The first root clearly agrees with that of the case of the single interior-node equation (6), whereas the second does not appear to be consistent with the physics of the problem. On closer examination, however, it can be shown that the second root corresponds to nothing but the $\pi$-phase-shifted case of (6). This can be explained as follows.
Note that, in terms of their eigenvectors, the first root is associated with $u_{m-1} = e^{ikl} u_m$, while the second is associated with $u_{m-1} = e^{i(\pi - kl)} u_m$. Therefore, the proper wave number for the second root is

$$(kl)_2 = \pi - kl, \qquad 0 \le kl \le \frac{\pi}{2} \tag{14}$$

so that

$$\left(\frac{\omega l}{c}\right)_2^2 = 2(1 + \cos kl) = 2\left(1 - \cos (kl)_2\right) = 4 \sin^2\frac{(kl)_2}{2} \tag{15}$$

The preceding analysis in terms of two coupled difference equations enables us to properly interpret the multiple characteristic roots associated with high-order elements. We are now in a position to take on the discrete Fourier analysis of quadratic bar elements.
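The two-root structure can be checked numerically. The sketch below assumes the 2x2 operator obtained from the lumped-mass linear bar equations at two adjacent nodes, with diagonal entries $2 - (\omega l/c)^2$ and off-diagonal entries $-(1 + e^{\pm 2ikl})$, and roots $2(1 \mp \cos kl)$; this is a reconstruction from the difference equations, and the function names are hypothetical.

```python
import cmath
import math


def two_node_operator(kl, omega2):
    """Fourier-transformed operator for the two coupled nodal equations.

    omega2 is the squared normalized frequency (omega*l/c)**2.
    """
    diag = 2.0 - omega2
    upper = -(1.0 + cmath.exp(2j * kl))
    lower = -(1.0 + cmath.exp(-2j * kl))
    return [[diag, upper], [lower, diag]]


def determinant(op):
    return op[0][0] * op[1][1] - op[0][1] * op[1][0]


def characteristic_roots(kl):
    """Squared normalized frequencies that make the determinant vanish."""
    return 2.0 * (1.0 - math.cos(kl)), 2.0 * (1.0 + math.cos(kl))


if __name__ == "__main__":
    kl = 0.7
    first, second = characteristic_roots(kl)
    print(abs(determinant(two_node_operator(kl, first))))
    print(abs(determinant(two_node_operator(kl, second))))
    # The second root is the single-node relation evaluated at pi - kl
    shifted = 4.0 * math.sin(0.5 * (math.pi - kl)) ** 2
    print(second, shifted)
```

Plotting the second root against the shifted wave number $\pi - kl$ therefore continues the single-node curve from $kl = \pi/2$ to $kl = \pi$ rather than introducing a new branch.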
The discretization of the bar equation (1) by the quadratic elements (see Figure 2) yields the following two coupled difference equations:

$$\rho A l\, \ddot{u}_{m-1} = \frac{EA}{l}\left(u_{m-2} - 2u_{m-1} + u_m\right), \qquad \frac{\rho A l}{2}\, \ddot{u}_m = \frac{EA}{8l}\left(-u_{m-2} + 8u_{m-1} - 14u_m + 8u_{m+1} - u_{m+2}\right) \tag{16}$$

where the first is for the mid-node and the second for the end node, and a diagonal mass matrix is used. The discrete Fourier-transformed operator $L$ for the above 3-noded bar equations and the resulting characteristic equation can be derived as:

$$L(k,\omega) = \begin{bmatrix} 2 - \left(\frac{\omega l}{c}\right)^2 & -\left(1 + e^{2ikl}\right) \\ -\left(1 + e^{-2ikl}\right) & \frac{1}{4}(7 + \cos 2kl) - \frac{1}{2}\left(\frac{\omega l}{c}\right)^2 \end{bmatrix} \tag{17}$$

$$\left(\frac{\omega l}{c}\right)^4 - \frac{1}{2}(11 + \cos 2kl)\left(\frac{\omega l}{c}\right)^2 + 3(1 - \cos 2kl) = 0 \tag{18}$$

Figure 3 compares the dispersion curves obtained for the linear and quadratic elements with that of the corresponding continuum equation (1). By invoking the same interpretation discussed in conjunction with the double roots given by (13) for the linear element, the upper root of the quadratic element is plotted versus the redefined wave number $\pi - kl$. It is clear from the results that the quadratic element performs better than its linear counterpart for an equivalent mesh size.
It is also noted, however, that the quadratic element gives rise to a discontinuity in the wave number/frequency relation at $kl = \pi/2$. That is, there is a range of frequencies the discrete model will simply "skip" over. As this discrete "forbidden" zone occurs in the middle of the spectrum of discrete behavior for the element mesh, it can be of particular concern and importance to acoustics and wave propagation analyses. Hughes [15], in studying mass matrix formulations and their effect on low frequency convergence for coarse meshes, produced numerical results consistent with the discrete Fourier analysis for quadratic bar and beam elements.
Figure 4 illustrates how dramatic this behavior can become for the solution of transient dynamics problems using quadratic bar elements. It shows the power spectral density of a nodal displacement response for a random initial displacement input, using a uniform mesh of 25 quadratic (3-node) bar elements and an implicit mid-point time integration algorithm. The "forbidden" discrete frequency zone in the range of frequencies between 1.45 and 1.75 verifies numerically the Fourier analysis results for the quadratic bar element shown in Figure 3. Such forbidden discrete frequency zones, if they persist for quadratic elements such as 6 and 9-node shell elements, 10-node tetrahedral solids, and so forth, will have a profound effect on their ability to capture accurately high-frequency responses.
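The location of the forbidden zone can be estimated directly from the quadratic-bar characteristic equation. The sketch below assumes that equation has the form $\Omega^4 - \tfrac{1}{2}(11 + \cos 2kl)\,\Omega^2 + 3(1 - \cos 2kl) = 0$ with $\Omega = \omega l/c$, a reconstruction that corresponds to a diagonal mass with weights 1/6, 2/3, 1/6 over the element; the function name is illustrative.

```python
import math


def quadratic_bar_roots(kl):
    """Return (lower, upper) normalized frequencies for wave number kl.

    Solves the quadratic in Omega**2 that is assumed in the lead-in.
    """
    c2 = math.cos(2.0 * kl)
    b = 0.5 * (11.0 + c2)
    c = 3.0 * (1.0 - c2)
    disc = math.sqrt(b * b - 4.0 * c)
    lower = math.sqrt(max(0.0, 0.5 * (b - disc)))
    upper = math.sqrt(0.5 * (b + disc))
    return lower, upper


if __name__ == "__main__":
    lower, upper = quadratic_bar_roots(0.5 * math.pi)
    print("gap at kl = pi/2:", lower, "to", upper)
    # Lower branch versus kl, upper branch versus the shifted value pi - kl
    for i in range(0, 5):
        kl = i * math.pi / 8.0
        lo, up = quadratic_bar_roots(kl)
        print(f"kl = {kl:5.3f}: {lo:5.3f}   pi - kl = {math.pi - kl:5.3f}: {up:5.3f}")
```

Under these assumptions the two branches meet the line $kl = \pi/2$ at $\sqrt{2}$ and $\sqrt{3}$, close to the band between 1.45 and 1.75 quoted for the numerical experiment of Figure 4.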
2.2 Discrete Fourier Analysis of Timoshenko Beam Elements
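The strong form for the transverse beam vibration problem is the following set of coupled differential equations

$$\rho A \frac{\partial^2 w}{\partial t^2} = GA\left(\frac{\partial^2 w}{\partial x^2} - \frac{\partial \theta}{\partial x}\right), \qquad \rho I \frac{\partial^2 \theta}{\partial t^2} = EI \frac{\partial^2 \theta}{\partial x^2} + GA\left(\frac{\partial w}{\partial x} - \theta\right) \tag{19}$$

where $A$ and $I$ are the cross-sectional area and moment of inertia, $G$ is the shear modulus, and $w$ and $\theta$ are the transverse displacement and generalized rotation, respectively. The Fourier solution of (19) is assumed to be of the form

$$\hat{u} = \hat{u}_0\, e^{i(\omega t - kx)} \tag{20}$$

where

$$\hat{u} = \lfloor w \;\; \theta \rfloor^T, \quad \hat{u}_0 = \lfloor w_0 \;\; \theta_0 \rfloor^T, \quad w_0 = w(x = 0, t = 0), \quad \theta_0 = \theta(x = 0, t = 0) \tag{21}$$

leads to the transformed system

$$L(\omega,k)\, \hat{u}_0 = 0 \tag{22}$$

with the continuum Fourier matrix operator, $L(\omega,k)$, given by

$$L(\omega,k) = \begin{bmatrix} -\bar{\omega}^2 + \lambda (kl)^2 & -i \lambda l (kl) \\ i \frac{\lambda}{l} (kl) & -\gamma \bar{\omega}^2 + \gamma (kl)^2 + \lambda \end{bmatrix} \tag{23}$$

where

$$\bar{\omega} = \frac{\omega l}{c}, \qquad \lambda = \frac{G}{E}, \qquad \gamma = \frac{I}{A l^2} \tag{24}$$

and the continuum frequency equation derived from (23) is

$$\bar{\omega}^4 - \left[(\lambda + 1)(kl)^2 + \frac{\lambda}{\gamma}\right] \bar{\omega}^2 + \lambda (kl)^4 = 0 \tag{25}$$

By inspection, there are two roots of (25) for each wave number value, and thus two unique vibration modes, the bending and the shear wave modes, which are plotted in Figure 5. The shear wave is associated with extremely high frequencies and is not of primary concern to the dynamicist; hence we will focus on element tailoring only with respect to the bending mode.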
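The two continuum branches can be evaluated with a few lines of code. The sketch below assumes the frequency equation $\bar\omega^4 - [(\lambda + 1)(kl)^2 + \lambda/\gamma]\,\bar\omega^2 + \lambda (kl)^4 = 0$ with $\lambda = G/E$ and $\gamma = I/(Al^2)$; those parameter definitions are a reading of a partly illegible passage, and the numerical values used below are purely illustrative.

```python
import math


def timoshenko_continuum_roots(kl, lam, gamma):
    """Return (bending, shear) normalized frequencies omega*l/c.

    lam and gamma are the assumed ratios G/E and I/(A*l**2).
    """
    b = (lam + 1.0) * kl ** 2 + lam / gamma
    c = lam * kl ** 4
    disc = math.sqrt(b * b - 4.0 * c)
    # The small root is computed as 2c/(b + disc) to avoid cancellation
    bending_sq = 2.0 * c / (b + disc)
    shear_sq = 0.5 * (b + disc)
    return math.sqrt(bending_sq), math.sqrt(shear_sq)


if __name__ == "__main__":
    lam, gamma = 0.4, 0.01   # illustrative values only
    for i in range(1, 6):
        kl = i * math.pi / 5.0
        bending, shear = timoshenko_continuum_roots(kl, lam, gamma)
        print(f"kl = {kl:5.3f}  bending = {bending:7.4f}  shear = {shear:7.4f}")
```

For small wave numbers the bending root approaches the Euler-Bernoulli value $\bar\omega^2 = \gamma (kl)^4$, while the shear branch starts at the cutoff $\bar\omega^2 = \lambda/\gamma$, which is why it sits far above the bending branch for slender beams.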
For the case of the two-noded linear beam approximation, a similar procedure as employed for a linear bar element leads to the following discrete Fourier matrix operator, $\bar{L}(\omega,k)$, given by

$$\bar{L}(\omega,k) = \begin{bmatrix} -\bar{\omega}^2 + \lambda (\bar{k} l)^2 & -i \lambda l (\bar{k} l) \sqrt{\alpha_r} \\ i \frac{\lambda}{l} (\bar{k} l) \sqrt{\alpha_r} & -\gamma \bar{\omega}^2 + \gamma (\bar{k} l)^2 + \lambda \alpha_r \end{bmatrix} \tag{26}$$

where $\bar{k}$ is defined by (7) and $\alpha_r$ is given by

$$\alpha_r = 1 - \frac{1}{4} (\bar{k} l)^2 \tag{27}$$

Therefore, the discrete frequency relationship for the linear Timoshenko beam element is given by

$$\bar{\omega}^4 - \left[(\lambda + 1)(\bar{k} l)^2 + \frac{\lambda \alpha_r}{\gamma}\right] \bar{\omega}^2 + \lambda (\bar{k} l)^4 = 0 \tag{28}$$

Figure 5 shows the resultant linear element frequency/wave number relationship obtained from (28), along with the corresponding continuum results from (25). Just as with the bar element, the linear beam element converges quite well at low frequencies, then gradually diverges from the continuum curve.
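The comparison in Figure 5 can be reproduced approximately with the sketch below. It assumes the discrete relation has the same form as the continuum one, with $kl$ replaced by $\bar{k}l = 2\sin(kl/2)$ and the shear term $\lambda/\gamma$ scaled by $\alpha_r = 1 - (\bar{k}l)^2/4$; the parameters $\lambda = G/E$ and $\gamma = I/(Al^2)$ and their values are assumptions made for illustration.

```python
import math


def bending_root(K, lam, gamma, alpha=1.0):
    """Lower (bending) root of w^4 - [(lam+1)K^2 + lam*alpha/gamma] w^2 + lam K^4 = 0."""
    b = (lam + 1.0) * K ** 2 + lam * alpha / gamma
    c = lam * K ** 4
    disc = math.sqrt(b * b - 4.0 * c)
    return math.sqrt(2.0 * c / (b + disc))


def continuum_bending(kl, lam, gamma):
    return bending_root(kl, lam, gamma)


def linear_element_bending(kl, lam, gamma):
    k_bar_l = 2.0 * math.sin(0.5 * kl)       # discrete wave number times l
    alpha_r = 1.0 - 0.25 * k_bar_l ** 2      # shear reduction factor
    return bending_root(k_bar_l, lam, gamma, alpha_r)


if __name__ == "__main__":
    lam, gamma = 0.4, 0.01   # illustrative values only
    for i in range(1, 9):
        kl = i * math.pi / 8.0
        exact = continuum_bending(kl, lam, gamma)
        discrete = linear_element_bending(kl, lam, gamma)
        print(f"kl = {kl:5.3f}  continuum = {exact:7.4f}  linear = {discrete:7.4f}")
```

The printed table shows how closely the two curves agree at small $kl$ and how the gap develops toward the mesh cutoff for the chosen parameter values.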
Similarly, for the case of a three-noded quadratic beam element, utilizing the transverse shear and bending stiffness and the lumped mass matrices, we obtain the discrete Fourier-transformed matrix operator $L_Q$ as follows

$$L_Q = K_{shear} + K_{bending} - \left(\frac{\omega l}{c}\right)^2 M_{lumped} \tag{29}$$

where

[Equations (30)-(32): the 4x4 Fourier-transformed matrices $K_{shear}$, $K_{bending}$ and $M_{lumped}$; their entries are not recoverable from the scan.]

from which the following characteristic equation results:

$$\left(\frac{\omega l}{c}\right)^8 + c_1 \left(\frac{\omega l}{c}\right)^6 + c_2 \left(\frac{\omega l}{c}\right)^4 + c_3 \left(\frac{\omega l}{c}\right)^2 + c_4 = 0 \tag{33}$$

where

[Equation (34): the coefficients $c_1$ through $c_4$, functions of $\lambda$, $\gamma$ and $\cos 2kl$; not recoverable from the scan.]

Figure 6 illustrates the characteristic frequency vs. wave number relation represented by (33), along with the corresponding continuum curves from (25). As with the quadratic bar vibration, the quadratic beam displays the frequency jumping phenomenon in both its bending and shear wave curves, although it is not as pronounced as in the case of membrane waves approximated by standard quadratic bar elements. At this point the analyst might conclude that, as the quadratic beam elements do not exhibit as pronounced forbidden discrete frequency zones as that found in the bar element, it may not be of concern for waves other than membrane cases. In the next section, however, we show that, while tailoring the element through parameterization can lead to overall improvements in performance, the analyst must be careful to avoid "over-optimizing" the higher frequency ranges as it may tend to accentuate the frequency-jumping phenomenon inherent in higher-order elements.
3.0 Frequency-Window Tailoring of Bar Elements
We have shown in the preceding section that the discrete Fourier analysis can, for a uniform mesh size, give an analytic/symbolic characterization of the finite element discretizations. For vibration analysis purposes, these characterizations can then be used to assess quantitatively the discretization accuracy, as they can be directly compared with the corresponding continuum characteristics, viz., wave dispersion characteristics. Thus, the discrete Fourier analysis technique can be applied not only for the prediction of the resulting discretization for vibration analysis, but more importantly for the tailoring of element attributes in order to improve the finite element model accuracy. For example, it was shown in [12] that, for a given element stiffness matrix, a tailored mass matrix formed as a linear combination of the lumped and consistent mass matrices can significantly improve the accuracy of the low frequencies and their mode shapes. However, little improvement can be made for high-frequency components by mass-matrix tailoring alone.
In order to improve the accuracy of the finite element models for the high-frequency components, we are motivated to tailor the element stiffness matrices for a given nodal pattern in addition to mass tailoring. A simple case to test such a stiffness tailoring concept is to employ a three-noded discrete nodal pattern for a bar so that one can work with two discrete element stiffness matrices: a 3 x 3 quadratic element stiffness matrix and the assembled stiffness matrix of two linear elements. Hence, we obtain the following three-noded tailored element mass and stiffness matrices:
\[
\begin{aligned}
M &= \alpha_m M_{quadratic} + (1 - \alpha_m) M_{linear} \\
K &= \alpha_k K_{quadratic} + (1 - \alpha_k) K_{linear}
\end{aligned}
\tag{35}
\]
where M_linear and K_linear are the assembled matrices of two linear elements, and α_m and α_k are coefficients to be determined.
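The combination above can be sketched numerically. The following is a minimal illustration, not the report's own code: the function name is hypothetical, the nodes are assumed to be ordered end, mid, end, and the lumped quadratic masses are assumed to follow Simpson weights (1/6, 4/6, 1/6), chosen because they reproduce the (3 + α_m) and (3 - α_m) factors that appear in (36).

```python
def tailored_bar_matrices(alpha_m, alpha_k):
    """Return (M, K) for a three-noded tailored bar, nodes ordered end, mid, end.

    K is in units of EA/(6 l) and the lumped (diagonal) M in units of
    rho*A*l/6, where l is the node spacing (element length 2 l).
    """
    # Stiffness of one 3-node quadratic element and of two assembled linear elements.
    k_quad = [[7, -8, 1], [-8, 16, -8], [1, -8, 7]]
    k_lin = [[6, -6, 0], [-6, 12, -6], [0, -6, 6]]
    # Lumped masses (assumption): Simpson weights vs. two lumped linear elements.
    m_quad = [2, 8, 2]
    m_lin = [3, 6, 3]
    K = [[alpha_k * q + (1 - alpha_k) * s for q, s in zip(row_q, row_s)]
         for row_q, row_s in zip(k_quad, k_lin)]
    M = [alpha_m * q + (1 - alpha_m) * s for q, s in zip(m_quad, m_lin)]
    return M, K
```

Because the two weights sum to one, every row of K still sums to zero (the rigid-body mode is kept) and the total mass is unchanged for any value of the parameters.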
3.1 Discrete Fourier Analysis of Tailored Three-Noded Bar Element
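The governing nodal difference equations at the mid-span node m-1 and the adjacent end node m are given by (see Figure 2)
\[
\begin{aligned}
\frac{l^2}{c^2}\,(3 + \alpha_m)\,\ddot{u}_{m-1} &= (3 + \alpha_k)\,(u_{m-2} - 2u_{m-1} + u_m) \\
\frac{l^2}{c^2}\,(3 - \alpha_m)\,\ddot{u}_{m} &= (3 + \alpha_k)\,(u_{m-1} - 2u_m + u_{m+1}) - \frac{\alpha_k}{2}\,(u_{m-2} - 2u_m + u_{m+2})
\end{aligned}
\tag{36}
\]
where l is the node spacing and c = (E/ρ)^(1/2) is the bar wave speed.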
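The Fourier-transformed discrete operator L for the tailored bar with the embedded performance parameters α_m and α_k can be derived as
\[
L(k, \omega) =
\begin{bmatrix}
6 + 2\alpha_k - (3 + \alpha_m)\left(\dfrac{\omega l}{c}\right)^2 & -(3 + \alpha_k)\,(1 + e^{2ikl}) \\[2mm]
-(3 + \alpha_k)\,(1 + e^{-2ikl}) & 6 + \alpha_k (1 + \cos 2kl) - (3 - \alpha_m)\left(\dfrac{\omega l}{c}\right)^2
\end{bmatrix}
\tag{37}
\]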
The characteristic equation and its roots are then found to be
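\[ c_1 \left(\frac{\omega l}{c}\right)^4 - c_2 \left(\frac{\omega l}{c}\right)^2 + c_3 = 0 \tag{38} \]
where
\[
\begin{aligned}
c_1 &= 9 - \alpha_m^2 \\
c_2 &= (3 - \alpha_m)(6 + 2\alpha_k) + (3 + \alpha_m)\,\bigl(6 + \alpha_k (1 + \cos 2kl)\bigr) \\
c_3 &= (6 + 2\alpha_k)\,\bigl(6 + \alpha_k (1 + \cos 2kl)\bigr) - 2(3 + \alpha_k)^2 (1 + \cos 2kl)
\end{aligned}
\tag{39}
\]
\[
\left(\frac{\omega_1 l}{c}\right)^2 = \frac{c_2 - \sqrt{c_2^2 - 4 c_1 c_3}}{2 c_1}, \qquad
\left(\frac{\omega_2 l}{c}\right)^2 = \frac{c_2 + \sqrt{c_2^2 - 4 c_1 c_3}}{2 c_1}
\tag{40}
\]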
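As an illustration, the two roots of the characteristic equation can be evaluated directly from the coefficients in (39). This is a sketch under the assumption that frequencies are normalized as ωl/c; the function name is hypothetical. A useful sanity check is that setting both parameters to zero recovers the lumped-mass linear bar, whose dispersion relation is ωl/c = 2 sin(kl/2): the lower root covers kl < π/2 and the upper root covers kl > π/2.

```python
import math

def tailored_bar_frequencies(kl, alpha_m, alpha_k):
    """Return the lower and upper normalized frequencies (omega*l/c) at wave number kl."""
    cos2 = math.cos(2.0 * kl)
    c1 = 9.0 - alpha_m ** 2
    c2 = ((3.0 - alpha_m) * (6.0 + 2.0 * alpha_k)
          + (3.0 + alpha_m) * (6.0 + alpha_k * (1.0 + cos2)))
    c3 = ((6.0 + 2.0 * alpha_k) * (6.0 + alpha_k * (1.0 + cos2))
          - 2.0 * (3.0 + alpha_k) ** 2 * (1.0 + cos2))
    # Guard against tiny negative values caused by round-off.
    root = math.sqrt(max(c2 * c2 - 4.0 * c1 * c3, 0.0))
    w2_low = (c2 - root) / (2.0 * c1)
    w2_high = (c2 + root) / (2.0 * c1)
    return math.sqrt(max(w2_low, 0.0)), math.sqrt(w2_high)
```

Calling the function with both parameters equal to one gives the branches of the quadratic element with the lumped masses assumed in this sketch.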
3.2 Frequency-Window Tailoring of the Bar Element
Our objective in tailoring the bar element is to minimize the error between the exact and discrete eigenvalues in an average sense over a finite range of the frequency spectrum. To this end, for each frequency-window range a < kl < b, we perform the following minimization:
\[ \min_{\alpha_m,\,\alpha_k} \int_a^b \left[ 1 - \frac{\omega_{discrete}}{\omega_{exact}} \right]^2 d(kl) \tag{41} \]
subject to (3), (39), and (40).
The optimization problem posed by (41) is not easily solvable in closed form, however. Since this element presents the simplest form of Fourier equations that are of practical interest, we wish to consider a numerical complement to (41) which can be addressed by a standard quadratic optimization technique. Therefore, the frequency-window tailoring method is recast as
\[ \min_{\alpha_m,\,\alpha_k} J \tag{42} \]
where
\[ J = \sum_{i=0}^{n} \left[ 1 - \frac{\omega_{discrete}}{\omega_{exact}} \right]^2_{\,k_i l \,=\, a + \frac{i}{n}(b - a)} \tag{43} \]
subject to (3), (39), and (40).
Numerical optimization of (42) was accomplished using quasi-Newton methods with central difference derivative approximations [16]. Both DFP and BFGS formulae were utilized in approximating the inverse Hessian matrix; their performance was generally comparable.
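A sketch of how the sampled performance index and its central-difference gradient might be evaluated for the bar is given below. Several choices here are assumptions rather than details stated in the text: the number of samples n = 50, the use of the lower root of (40) for kl up to π/2 and the upper root beyond it, the normalization ωl/c (so that the exact bar frequency is simply kl), and all function names. The quasi-Newton update itself (DFP or BFGS) is not reproduced.

```python
import math

def discrete_frequency(kl, alpha_m, alpha_k):
    """Normalized frequency (omega*l/c) of the tailored bar at wave number kl."""
    cos2 = math.cos(2.0 * kl)
    c1 = 9.0 - alpha_m ** 2
    c2 = ((3.0 - alpha_m) * (6.0 + 2.0 * alpha_k)
          + (3.0 + alpha_m) * (6.0 + alpha_k * (1.0 + cos2)))
    c3 = ((6.0 + 2.0 * alpha_k) * (6.0 + alpha_k * (1.0 + cos2))
          - 2.0 * (3.0 + alpha_k) ** 2 * (1.0 + cos2))
    root = math.sqrt(max(c2 * c2 - 4.0 * c1 * c3, 0.0))
    # Lower branch up to kl = pi/2, upper branch beyond (assumed unfolding).
    sign = -1.0 if kl <= math.pi / 2.0 else 1.0
    return math.sqrt(max((c2 + sign * root) / (2.0 * c1), 0.0))

def performance_index(alpha_m, alpha_k, a, b, n=50):
    """Sampled index J: sum of squared relative frequency errors over [a, b]."""
    total = 0.0
    for i in range(n + 1):
        kl = a + i * (b - a) / n
        if kl == 0.0:
            continue  # the frequency ratio is undefined at kl = 0
        total += (1.0 - discrete_frequency(kl, alpha_m, alpha_k) / kl) ** 2
    return total

def central_gradient(alpha_m, alpha_k, a, b, h=1e-5):
    """Central-difference gradient of J with respect to (alpha_m, alpha_k)."""
    d_m = (performance_index(alpha_m + h, alpha_k, a, b)
           - performance_index(alpha_m - h, alpha_k, a, b)) / (2.0 * h)
    d_k = (performance_index(alpha_m, alpha_k + h, a, b)
           - performance_index(alpha_m, alpha_k - h, a, b)) / (2.0 * h)
    return d_m, d_k
```

Evaluating `performance_index` at the quadratic starting point (both parameters equal to one) and at a tailored parameter pair for the same window shows the reduction in J that the optimizer is seeking.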
3.3 Tailoring Results for the Bar Element
Twelve frequency windows were considered for tailoring of the bar element. The first six cover the wave number range of 0 < kl < π in even intervals; the results can be found in Table 1. The last six frequency windows cover progressively wider ranges of kl, culminating in the final result which tailors α_m and α_k for the entire element spectral range; these results are shown in Table 2. In all cases, the numerical optimization used the quadratic element parameters as a starting point, and the initial and final values of J are noted. Figures 7 through 10 illustrate a selection of these results, along with the corresponding curves for the continuum equation and the discrete linear and quadratic elements. In all cases, the tailoring method produces marked improvements for each specified window, most dramatically for high-frequency and/or high-wave number ranges.
4.0 Frequency-Window Tailoring of Beam Elements
Encouraged by the improved accuracy of the tailored three-noded bar elements for window-by-window frequency ranges, this section extends the basic tailoring procedure to the Timoshenko beam element. In doing so, we adopt the same three-noded nodal pattern as in the case of the tailored bar element, with one important additional feature. The tailored element stiffness matrix consists of the tailored bending and the tailored transverse shear contributions, each of which in turn consists of a linear combination of a quadratic and a corresponding assembled two-element linear component, viz.,
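\[
\begin{aligned}
K &= K_{shear} + K_{bending} \\
K_{shear} &= \alpha_s K^{quadratic}_{shear} + (1 - \alpha_s) K^{linear}_{shear} \\
K_{bending} &= \alpha_b K^{quadratic}_{bending} + (1 - \alpha_b) K^{linear}_{bending} \\
M_{lumped} &= \alpha_m M^{quadratic}_{lumped} + (1 - \alpha_m) M^{linear}_{lumped}
\end{aligned}
\tag{44}
\]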
4.1 Tailoring Results for Beam Elements
The tailoring method proceeds as with the bar element: a discrete Fourier synthesis of the elemental eigenproblem, incorporation of the resulting equations into the performance index (43), and a numerical optimization of the performance index to obtain the tailoring parameters α_s, α_b, and α_m. As noted in Section 2.2, the frequency-window tailoring has focused only on the bending wave curves. If necessary, a shear wave tailoring can be performed as well.
Results of several frequency-window tailoring cases can be found in Table 3 and Figures 11 through 14. Case I summarizes the tailored mass and stiffness parameters for the range of … < kl < …. As can be observed in Fig. 12, the accuracy improvement by the tailored element is rather dramatic. Errors of up to 6% in predicted frequencies from the nominal linear and quadratic elements are reduced to less than 0.5% over this entire low frequency window. In addition, Figure 11 illustrates how the tailored element also exhibits a marked improvement in frequency prediction over the full discrete frequency range, far beyond the designated "design" window.
Case II lists the tailoring of mass and stiffness matrices performed in the high-wave number range of … < kl < …. As Figure 13 illustrates, however, convergence within the desired spectrum does not imply superior overall behavior (the limits of the tailored spectrum are shown as vertical lines in the figure). Here, the optimized parameters tend to accentuate the forbidden discrete frequency zone, and allow significant errors in the low to middle spectrum that was optimized in Case I. These errors are as much as 60% with respect to the exact continuum solution in this area of the spectrum. Clearly, the influence of this "forbidden zone" is significant enough to warrant careful attention to the selection of frequency optimization windows and performance indices.
In addition to the aforementioned problems in low frequency ranges, the unconstrained optimization of Case II resulted in parameters which lead to a negative-definite element stiffness matrix. Thus, the performance exhibited by the dispersion curves in Figure 13 is not practically realizable. Case IIa used the unconstrained optimization results as a starting point, then relaxed the stiffness tailoring parameters to regain the correct positive definite character of the element stiffness while retaining much of the improved element behavior in the desired frequency window. The issues of element feasibility and tailoring parameter constraints are covered in more detail in Section 4.2.
The final case studied, Case III, is an attempt to optimize the overall element behavior, producing reasonably small errors in the middle to high spectrum while maintaining accuracy in the low spectrum. The result (see Figure 14) shows that the magnitude of the forbidden zone has been reduced and the accuracy improved to within ±10% for the range of 0 < kl < …, while the errors of both the standard linear and quadratic elements are rapidly increasing as kl increases.
4.2 Tailoring Parameter Constraints
So far, we have used unconstrained optimization techniques to determine appropriate element tailoring parameters, ignoring constraints that must be imposed on these variables to ensure element feasibility. Two important considerations are the preservation of element rigid-body modes and total mass. Because the tailored matrices are combinations of the linear and quadratic element formulations whose weights sum to one, it is easy to show that these are maintained for arbitrary values of the element performance parameters. In other words, these properties are preserved automatically. However, we must also preserve the positive definiteness of the mass matrix and the positive semi-definiteness of the stiffness matrix, and ensure the optimization does not introduce spurious element mechanisms.
For the bar element, these constraints result in the following parameter restrictions:
\[ |\alpha_m| < 3, \qquad \alpha_k > -3 \tag{45} \]
Practically speaking, the restriction on the stiffness tailoring parameter is superfluous if the element is optimized over a reasonably broad range of its spectrum, and the Fourier analysis can be used a posteriori to examine the tailored element behavior, as it is a robust method for identifying deficiencies in the element formulation [17]. In other words, if we enforce the tailored mass matrix to be positive definite through an inequality constraint on α_m, and the Fourier analysis identifies the vibration behavior of the element as acceptable over its full spectral range, there is no need to constrain the stiffness tailoring parameters.
It should be noted, however, that while the Fourier analysis will identify problematic behavior in the element formulation, the evidence can be subtle and easily missed when examining the dispersion curve at a finite set of discrete wave number values. For example, in Section 4.1 it was noted that the optimization performed in Case II led to a negative-definite element stiffness matrix. However, the negative eigenvalues found by examining the Fourier dispersion curve for the tailored element were in the range of 0 < kl < π/10000, in other words in the lowest 0.01% of the element spectrum. A more reliable method (given the difficulty in deriving a general parameter constraint for elements more complex than bars) would be to check the positive semi-definiteness of the element stiffness matrix at each step in the optimization process, and then limit the step size of the iteration if the semi-definiteness constraint is violated.
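The step-limiting idea can be sketched for the bar element, whose tailored stiffness is simple enough to write down. This is an illustration only: the helper names are hypothetical, the semi-definiteness test uses principal minors rather than an eigenvalue solver (a symmetric matrix is positive semi-definite exactly when all of its principal minors are non-negative), and step halving is one assumed way of limiting the step.

```python
from itertools import combinations

def det(a):
    """Determinant by Laplace expansion (adequate for small element matrices)."""
    if len(a) == 1:
        return a[0][0]
    total = 0.0
    for j, pivot in enumerate(a[0]):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * pivot * det(minor)
    return total

def is_positive_semidefinite(a, tol=1e-9):
    """Check that every principal minor of the symmetric matrix a is >= -tol."""
    n = len(a)
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            sub = [[a[i][j] for j in idx] for i in idx]
            if det(sub) < -tol:
                return False
    return True

def tailored_bar_stiffness(alpha_k):
    """Three-noded tailored bar stiffness in units of EA/(6 l)."""
    a = alpha_k
    return [[6 + a, -6 - 2 * a, a],
            [-6 - 2 * a, 12 + 4 * a, -6 - 2 * a],
            [a, -6 - 2 * a, 6 + a]]

def limit_step(alpha_k, delta, max_halvings=30):
    """Halve the proposed step delta until the updated stiffness is PSD."""
    step = 1.0
    for _ in range(max_halvings):
        if is_positive_semidefinite(tailored_bar_stiffness(alpha_k + step * delta)):
            return step
        step *= 0.5
    return 0.0  # no feasible fraction of the step was found
```

For the bar this check reproduces the restriction on α_k in (45); for more complex elements the same test could be applied to whatever element stiffness matrix the optimizer proposes.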
With this in mind, the beam element parameter constraint is given by
\[ |\alpha_m| < 3 \tag{46} \]
By using a lumped mass formulation, it should be simple to determine the mass parameter constraint even in the case of complex two- and three-dimensional elements. The computational overhead associated with checking the element stiffness matrix for negative eigenvalues is very small compared to the basic optimization, and certainly much easier than deriving the dispersion curve at a sufficient number of values to ensure element feasibility.
4.3 Numerical Results for Tailored Finite Element Models
A final issue which must be addressed is the resultant accuracy of assembled finite element models using the tailoring methods described herein, both in terms of frequencies and mode shapes. The primary utility of the Fourier analysis within the context of the frequency-window tailoring technique is to closely correlate the temporal/spatial frequency characteristics of the parameterized element to the corresponding partial differential equations of motion. However, if the eigenvectors of an assembled model do not correlate well to the associated continuum wave shapes over the discrete spectrum, the model cannot be assumed to accurately capture the continuum wave dynamics. Fortunately, through numerical tests, it can be verified that the tailored elements maintain the mode shape characteristics of the linear and quadratic elements on which they are based.
For the bar vibration problem, with constrained ends, the wave shapes follow the Fourier spatial expansion used in the tailoring method. That is,
\[ \phi_n(x) = \sin\frac{n \pi x}{L}, \qquad n = 1, 2, \ldots \tag{47} \]
Figure 15 demonstrates how the Fourier dispersion curve accurately predicts the resultant element's numerical behavior. The discrete points shown are the calculated frequencies of a 10-element mesh of tailored bars, with the given parameters, plotted against the corresponding wave number as determined by best-fitting the eigenvector of the mode to the Fourier spatial expansion. Not only is the applicability of the Fourier analysis verified, but the results also show how the discrete model eigenmodes uniformly cover the discrete spectrum, as is the case with linear bar elements. Figures 16 and 17 show that, for both moderate and high frequency modes, the eigenvectors of the tailored model retain the same basic character as their linear element counterparts, which themselves exactly match the continuum wave shapes at the node point locations. Figures 18 through 20 show similar results for a high-frequency-window tailoring case.
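The agreement between the Fourier prediction and an assembled mesh can be checked without an eigenvalue solver, by substituting the predicted mode into the assembled equations and measuring the residual. The sketch below assumes a 10-element, fixed-fixed mesh of tailored bars (19 free degrees of freedom), lumped masses with Simpson weights for the quadratic part, and frequencies normalized as ωl/c; the function name is hypothetical. Mode n is taken to have wave number kl = nπ/20, with sine shapes at the end nodes and a scaled sine at the mid nodes. Mode 10 (kl = π/2) is excluded because the amplitude ratio is indeterminate there.

```python
import math

def mode_residual(n, alpha_m, alpha_k, n_elem=10):
    """Largest interior-node residual of K u = w2 M u for Fourier mode n.

    K is in units of EA/(3 l), M in units of rho*A*l/3, and w2 = (omega*l/c)^2.
    """
    n_nodes = 2 * n_elem + 1
    ke = [[3 + alpha_k / 2, -(3 + alpha_k), alpha_k / 2],
          [-(3 + alpha_k), 6 + 2 * alpha_k, -(3 + alpha_k)],
          [alpha_k / 2, -(3 + alpha_k), 3 + alpha_k / 2]]
    me = [(3 - alpha_m) / 2, 3 + alpha_m, (3 - alpha_m) / 2]
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    M = [0.0] * n_nodes
    for e in range(n_elem):              # assemble element by element
        for a in range(3):
            M[2 * e + a] += me[a]
            for b in range(3):
                K[2 * e + a][2 * e + b] += ke[a][b]

    kl = n * math.pi / (2 * n_elem)      # wave number of the n-th sine mode
    cos2 = math.cos(2 * kl)
    c1 = 9 - alpha_m ** 2
    c2 = ((3 - alpha_m) * (6 + 2 * alpha_k)
          + (3 + alpha_m) * (6 + alpha_k * (1 + cos2)))
    c3 = ((6 + 2 * alpha_k) * (6 + alpha_k * (1 + cos2))
          - 2 * (3 + alpha_k) ** 2 * (1 + cos2))
    root = math.sqrt(max(c2 * c2 - 4 * c1 * c3, 0.0))
    w2 = (c2 - root) / (2 * c1) if kl < math.pi / 2 else (c2 + root) / (2 * c1)

    # Mid-node to end-node amplitude ratio from the mid-node difference equation.
    ratio = 2 * (3 + alpha_k) * math.cos(kl) / (6 + 2 * alpha_k - (3 + alpha_m) * w2)
    u = [math.sin(j * kl) * (ratio if j % 2 else 1.0) for j in range(n_nodes)]

    worst = 0.0
    for j in range(1, n_nodes - 1):      # end nodes are constrained, so skip them
        r = sum(K[j][i] * u[i] for i in range(n_nodes)) - w2 * M[j] * u[j]
        worst = max(worst, abs(r))
    return worst
```

A residual at round-off level for modes 9 and 17 indicates that the assembled frequencies lie on the Fourier dispersion curve and that the mode shapes are sines sampled at the nodes, up to a constant scaling of the mid-node amplitudes.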
5.0 Discussions
A frequency-window tailoring technique is presented for improving finite element models that can be used to capture accurately frequencies and their mode shapes up to acoustic ranges. The tailoring of the element mass and stiffness matrices is achieved by a combination of linear and quadratic elements for both bar and beam elements. The tailoring parameters are optimized for each frequency window by minimizing the errors between the continuum and discrete characteristic dispersion curves. For the case of beam elements, the stiffness tailoring is further partitioned into a component-by-component contribution of the transverse shear and the bending stiffness matrices. Such a component-by-component tailoring has proved to be a key feature of the present tailoring technique.
The accuracy improvements realized by the present tailoring technique were first predicted by the discrete Fourier analysis. The results demonstrate that the tailored elements dramatically improve the accuracy of both the frequencies and their mode shapes far beyond that of the linear or quadratic elements. This is also corroborated via numerical eigenvalue analyses. Although the results contained herein are primarily the analytical and numerical eigenvalues of the various element formulations, it should be remembered that the fundamental product of the tailoring methods is the set of stiffness and mass interpolation parameters themselves, which are a function of the frequency window chosen by the analyst to optimize.
There are two overhead aspects associated with the present tailoring technique. The first is to employ a quadratic nodal topology for constructing the tailored element matrices. The second is to carry out several frequency window-by-window eigenvalue analyses in order to cover the entire range of frequencies of interest. These must be more than made up by a substantially reduced number of nodal degrees of freedom with the present tailored elements. For example, for the wave-number range of … < kl < π, the tailored bar and beam elements yield a frequency accuracy within a few percent for kl = 2.5 from Fig. 13, whereas the conventional linear and quadratic beam elements must reduce their element sizes by a factor of about ten and five, respectively, if the same accuracy is to be maintained by these elements.
A straightforward extrapolation of the preceding accuracy comparison to two and three dimensional problems would result in 100 and 25 times fewer nodal unknowns when the present technique is employed. As the extension of the present tailoring technique to plate, shell and solid elements appears to be straightforward, a bigger pay-off of the present technique may accrue as applied to two and three dimensional elements. For example, for shell elements, the tailoring of element stiffness can be achieved by synthesizing the membrane, the transverse shear, and the bending components. This is being carried out at present and we plan to report the results on a future occasion.
REFERENCES
1. Archer, J. S., "Consistent Mass Matrix for Distributed Mass Systems," J. of the Structural Division, Vol. 4, pp. 161-178 (1963).
2. Bazant, Z. P., "Spurious Reflection of Elastic Waves in Non-Uniform Finite Element Grids," Computer Methods in Applied Mechanics and Engineering, Vol. 16, pp. 91-100 (1978).
3. Bazant, Z. P., "Spurious Reflection of Elastic Waves in Non-Uniform Meshes of Constant and Linear Strain Elements," Computers and Structures, Vol. 15, No. 4, pp. 451-459 (1982).
4. Belytschko, T. and Mullen, R., "On Dispersive Properties of Finite Element Solutions," Modern Problems in Wave Propagation, Edited by J. Miklowitz and J. D. Achenbach, John Wiley and Sons, New York, pp. 67-82 (1978).
5. Celep, Z. and Turhan, D., "Transient Wave Propagation in Constant and Linear Strain Elements," J. Sound and Vibration, Vol. 116, No. 1, pp. 15-23 (1987).
6. Fried, I. and Malkus, D. S., "Finite Element Mass Matrix Lumping by Numerical Integration with No Convergence Rate Loss," Int. J. Solids Structures, Vol. 11, pp. 461-466 (1975).
7. Hinton, E., Rock, T. and Zienkiewicz, O. C., "A Note on Mass Lumping and Related Processes in the Finite Element Method," Earthquake Eng. and Structural Dynam., Vol. 4, pp. 245-249 (1976).
8. Krieg, R. D. and Key, S. W., "Transient Shell Response by Numerical Time Integration," Int. J. Numerical Methods in Eng., Vol. 7, pp. 273-286 (1973).
9. Leckie, F. A. and Lindberg, G. M., "The Effect of Lumped Parameters on Beam Frequencies," Aeronaut. Quarterly, Vol. 14, pp. 224-240 (1963).
10. Tong, P., Pian, T. H. H., and Buciarelli, L. L., "Mode Shapes and Frequencies by the Finite Element Method Using Consistent and Lumped Mass," J. Comp. Struct., Vol. 1, pp. 623-638 (1971).
11. Jiang, L. and Rogers, R. J., "Effects of Spatial Discretization on Dispersion and Spurious Oscillations in Elastic Wave Propagation," Int. J. Numerical Methods in Eng., Vol. 29, pp. 1205-1218 (1990).
12. Park, K. C. and Jensen, D. J., "A Systematic Determination of Lumped and Improved Consistent Mass Matrices for Vibration Analysis," AIAA Paper No. 89-1335, April, 1989.
13. Park, K. C. and Flaggs, D. L., 1984, "An Operational Procedure for the Symbolic Analysis of the Finite Element Method," Comp. Meth. Appl. Mech. Engr., 42, pp. 37-46.
14. Flaggs, D. L., Symbolic Analysis of the Finite Element Method in Structural Mechanics, Ph.D. Thesis, Stanford University, 1988.
15. Hughes, T. J. R., The Finite Element Method: Linear Static and Dynamic Finite Element Analysis, Prentice-Hall, 1987.
16. Fletcher, R., Practical Methods of Optimization, 2nd Ed., Wiley-Interscience, 1987.
17. Park, K. C. and Flaggs, D. L., 1984, "A Fourier Analysis for Spurious Mechanisms and Locking in the Finite Element Method," Comp. Meth. Appl. Mech. Engr., 46, pp. 68-81.
Case   Window               α_m      α_k      J_Q            J_HP
I      0 < kl < π/6         0.9990   1.0263   1.23 x 10^-6   0.10 x 10^-6
II     π/6 < kl < π/3       0.4948   1.1840   8.44 x 10^-4   0.19 x 10^-4
III    π/3 < kl < π/2       0.5409   1.3188   0.1018         4.60 x 10^-4
IV     π/2 < kl < 2π/3      0.6200   1.5631   0.1289         0.0041
V      2π/3 < kl < 5π/6     0.7439   1.9880   0.2729         0.0216
VI     5π/6 < kl < π        0.9435   2.7517   1.4866         0.0851
Table 1: Tailoring Results for Bar Element (Narrow Frequency Windows)
Case   Window               α_m      α_k      J_Q            J_HP
VII    0 < kl < …           0.4948   1.1837   9.32 x 10^-4   0.25 x 10^-4
VIII   π/3 < kl < 2π/3      0.6539   1.4629   0.2136         0.0124
IX     … < kl < π           0.8505   2.3723   1.7977         0.3571
X      0 < kl < …           0.5577   1.2887   0.1385         9.85 x 10^-4
XI     … < kl < π           0.8005   2.1897   1.9782         0.6210
XII    0 < kl < π           0.7804   2.1101   2.0701         0.7615
Table 2: Tailoring Results for Bar Element (Broad Frequency Windows)
Case   Window        α_m      α_s      α_b       J_Q       J_HP
I      … < kl < …    0.6692   1.0018   0.6244    0.0048    5.1801 x 10^-5
II     … < kl < …    1.5321   1.0854   -0.2374   32.7325   1.1998 x 10^-4
IIa    … < kl < …    1.5321   1.0800   0.5000    32.7325   0.1745
III    0 < kl < …    0.8508   1.0331   0.6685    0.7713    0.0653
Table 3: Tailoring Results for Beam Element
[Diagram: nodes m-1, m, m+1]
Figure 1: Nodal Geometry for Linear 2-node Line Elements
[Diagram: nodes m-2, m-1, m, m+1, m+2]
Figure 2: Nodal Geometry for Quadratic 3-node Line Elements
[Plot: x-axis Normalized Wave Number, kl; curves: Exact Continuum, Linear Bar Element, Quadratic Bar Element; annotated region: Forbidden Discrete Frequency Zone]
Figure 3: Fourier Analysis Results for Bar Elements
[Plot: x-axis Normalized Frequency, (ωl/c); logarithmic y-axis]
Figure 4: Effect of Quadratic Element "Forbidden Zone" in Simulation of Bar Dynamics
[Plot: x-axis Normalized Wave Number, kl; logarithmic y-axis; curves: Exact Continuum, Linear Beam Element]
Figure 5: Fourier Analysis Results for Linear Beam Element
[Plot: x-axis Normalized Wave Number, kl; logarithmic y-axis; curves: Exact Continuum, Quadratic Beam Element]
Figure 6: Fourier Analysis Results for Quadratic Beam Element
[Plot: x-axis Normalized Wave Number, kl; curves: Linear, Quadratic, Tailored]
Figure 7: Fourier Analysis, Tailored Bar Element, Case III
Tailored Wave Range: π/3 < kl < π/2
Parameters: α_m = 0.5409, α_k = 1.3188
[Plot: x-axis Normalized Wave Number, kl (1.1 to 1.5); curves: Exact, Linear, Quadratic, Tailored]
Figure 8: Frequency Error, Tailored Bar Element, Case III
Tailored Wave Range: π/3 < kl < π/2
Parameters: α_m = 0.5409, α_k = 1.3188
[Plot: x-axis Normalized Wave Number, kl (2.7 to 3.1); curves: Exact, Linear, Quadratic, Tailored]
Figure 9: Frequency Error, Tailored Bar Element, Case VI
Tailored Wave Range: 5π/6 < kl < π
Parameters: α_m = 0.9435, α_k = 2.7517
[Plot: x-axis Normalized Wave Number, kl (1.2 to 2); curves: Exact, Linear, Quadratic, Tailored]
Figure 10: Frequency Error, Tailored Bar Element, Case VIII
Tailored Wave Range: π/3 < kl < 2π/3
Parameters: α_m = 0.6539, α_k = 1.4629
[Plot: x-axis Normalized Wave Number, kl; logarithmic y-axis; curves: Exact, Linear, Quadratic, Tailored]
Figure 11: Fourier Analysis, Tailored Beam Element, Case I
Tailored Wave Range: … < kl < …
Parameters: α_m = 0.6692, α_s = 1.0018, α_b = 0.6244
[Plot: x-axis Normalized Wave Number, kl (0.6 to 0.9); curves: Exact, Linear, Quadratic, Tailored]
Figure 12: Frequency Error, Tailored Beam Element, Case I
Tailored Wave Range: … < kl < …
Parameters: α_m = 0.6692, α_s = 1.0018, α_b = 0.6244
[Plot: x-axis Normalized Wave Number, kl; curves: Exact, Linear, Quadratic, Tailored; tailored window marked by vertical lines]
Figure 13: Frequency Error, Tailored Beam Element, Case II
Tailored Wave Range: … < kl < …
Parameters: α_m = 1.5321, α_s = 1.0854, α_b = -0.2374
[Plot: x-axis Normalized Wave Number, kl (0 to 2); curves: Exact, Linear, Quadratic, Tailored]
Figure 14: Frequency Error, Tailored Beam Element, Case III
Tailored Wave Range: 0 < kl < …
Parameters: α_m = 0.8508, α_s = 1.0331, α_b = 0.6685
[Plot: x-axis Normalized Wave Number, kl; curves: Tailored (discrete eigenmodes), Tailored (Fourier result), Linear, Exact]
Figure 15: Comparison of Fourier Analysis to Discrete Bar Model, Case III
Tailored Wave Range: π/3 < kl < π/2
Parameters: α_m = 0.5409, α_k = 1.3188
[Plot: Discrete Mode #9 for 19-dof Bar, wave number kl = 9π/20; x-axis Position along bar length; curves: Exact, Linear element, Tailored element]
Figure 16: Low Frequency Mode Shape for Discrete Bar Models, Case III
Tailored Wave Range: π/3 < kl < π/2
Parameters: α_m = 0.5409, α_k = 1.3188
[Plot: Discrete Mode #17 for 19-dof Bar, wave number kl = 17π/20; x-axis Position along bar length; curves: Exact, Linear element, Tailored element]
Figure 17: High Frequency Mode Shape for Discrete Bar Models, Case III
Tailored Wave Range: π/3 < kl < π/2
Parameters: α_m = 0.5409, α_k = 1.3188
[Plot: x-axis Normalized Wave Number, kl; curves: Tailored (discrete eigenmodes), Tailored (Fourier result), Linear, Exact]
Figure 18: Comparison of Fourier Analysis to Discrete Bar Model, Case VI
Tailored Wave Range: 5π/6 < kl < π
Parameters: α_m = 0.9435, α_k = 2.7517
[Plot: Discrete Mode #9 for 19-dof Bar, wave number kl = 9π/20; x-axis Position along bar length; curves: Exact, Linear element, Tailored element]
Figure 19: Low Frequency Mode Shape for Discrete Bar Models, Case VI
Tailored Wave Range: 5π/6 < kl < π
Parameters: α_m = 0.9435, α_k = 2.7517
[Plot: Discrete Mode #17 for 19-dof Bar, wave number kl = 17π/20; curves: Exact, Linear element, Tailored element]
Figure 20: High Frequency Mode Shape for Discrete Bar Models, Case VI
Tailored Wave Range: 5π/6 < kl < π
Parameters: α_m = 0.9435, α_k = 2.7517
[Plot: x-axis Normalized Wave Number, kl; curves: Exact, Linear, Quadratic, Tailored, Tailored (discrete eigenmodes); annotated region: Tailored Frequency Window]
Figure 21: Comparison of Fourier Analysis to Discrete Beam Model, Case IIa
Tailored Wave Range: … < kl < …
Parameters: α_m = 1.5321, α_s = 1.0800, α_b = 0.5000
A Systematic Determination of Lumped and Improved Consistent Mass Matrices for Vibration Analysis
89-1335-CP
K. C. Park (1) and Daniel D. Jensen (2)
Center for Space Structures and Controls
University of Colorado, Campus Box 429
Boulder, CO 80309
Phone: 303-492-6330
(1) Professor of Aerospace Engineering, University of Colorado. Member AIAA.
(2) Graduate Research Assistant, Department of Aerospace Engineering Science.
Copyright American Institute of Aeronautics and Astronautics, Inc., 1989. All rights reserved.

Abstract

A systematic procedure for determining the lumped mass matrix and improved consistent mass matrices has been proposed for vibration analyses by the finite element method. The procedure is based on the discrete Fourier analysis which enables one to compare the numerical approximations with the corresponding continuum characteristics. The procedure is applied to vibrations of bar, Euler-Bernoulli beam and plate bending elements. The results obtained by the present procedure clearly indicate that a judicious use of the improved mass matrices offered in the paper can lead to a significant accuracy improvement for intermediate frequencies that can play important roles in modeling of control-structure interaction systems, dynamic localizations and acoustic responses for space structures and underwater vehicles.

1. Introduction

The question of mass lumping, or rather the systematic construction of the mass matrix for the vibration and transient analysis of structures by the finite element method, remains to date an unresolved issue. Apparently, it was Archer (1963) who first introduced a procedure for generating mass matrices based on the same displacement shape functions that are used in the construction of element stiffness matrices. The mass matrices generated according to Archer's procedure have become known as "consistent" mass matrices. In contrast to the consistent mass matrix, a diagonal or diagonalized mass matrix is referred to as a lumped mass matrix.

Even though the use of consistent mass matrices yields for most applications better accuracy in the frequency analysis, the lumped mass matrix continues to be preferred by practicing engineers due to its computational simplicity and a data storage saving in the computer. Such attractive features of the lumped mass matrix motivated several investigators in the past to propose various mass lumping procedures such as a row sum of the consistent mass matrix (Leckie and Lindberg, 1963; Tong, Pian and Buciarelli, 1971; Krieg and Key, 1973), the use of scaled diagonal entries from the stiffness matrix (Hinton et al, 1976), a selective sum of a low order-based consistent mass matrix (Fried and Malkus, 1975), and combinations of these.

Although existing mass lumping procedures are intuitively appealing for low-order elements such as constant strain bar and beam bending elements, such intuitive (or ad hoc) procedures become quickly ambiguous for high-order elements. As a result, at present no agreed-upon lumped mass matrices exist for cubic Euler-Bernoulli beams and for eight-noded serendipity plate/shell elements. Hence, there exists a lack of a systematic procedure for mass lumping.

The objective of the present paper is first to develop a systematic lumping procedure based on the discrete Fourier analysis of the finite element method (Park and Flaggs, 1984; Flaggs, 1988) and second to symbolically synthesize a series of more accurate mass matrices for vibration analysis when intermediate frequencies become important. To this end, the paper is organized as follows.

Section 2 revisits a discrete Fourier analysis of a bar modeled by the linear displacement approximation. A definition of the lumped mass matrix is proposed by comparing the characteristic equation for the continuum bar with that for the constant-strain bar. A simple synthesis of an improved mass matrix for the bar is proposed and its accuracy is assessed in terms of its wave dispersion curve. An example vibration problem with simply-fixed bar ends is analyzed, which demonstrates the systematic nature of the proposed definition of a lumped mass matrix and the general nature of discrete-Fourier synthesized mass matrices.

Section 3 applies the present definition of lumped mass matrices to a cubic Euler-Bernoulli beam element. A principal theory from the analysis of the cubic Euler-Bernoulli beam confirms the numerically well-known result that the lumping of the translational degree of freedom (w) and the neglect of the rotational freedom (θ) yields a most accurate frequency prediction. The present theory succinctly illustrates that such a mass matrix is the only theoretically consistent approximation. A problem analyzed by Archer
is revisited in order to assess our improved mass matrix. It is shown that the proposed synthesized mass matrix considerably improves the third and fourth frequencies for a two-element beam, thus establishing the soundness of the proposed synthesized mass matrix.

The present improved mass modeling for plate vibrations is presented in Section 4. To this end, a Fourier analysis of the frequency vs. wave number characteristics is carried out for an infinite plate of both the continuum and finite element approximation by a four-noded element. Such a Fourier analysis is believed to provide insight into the accuracy of vibration analysis for an infinite plate. In order to utilize the mass matrix modeling based on the discrete Fourier analysis, finite element plate vibrations with free edges have been performed with increasing meshes. The results obtained from the discrete Fourier analysis and numerical tests for free-edge plate vibrations indicate that a best accuracy for an infinite plate, when analyzed by a four-noded element, is achieved by an average of the lumped and the consistent matrices. For a plate with finite dimensions, a best accuracy is achievable with a quarter of the lumped mass and three quarters of the consistent mass matrix.

2. A Proposition for Lumped Mass Matrix

A mass-matrix lumping procedure that we are about to propose is based on the discrete Fourier analysis. Since an easy example of Fourier analysis that one can perform is the continuum equation for a bar and its discrete counterpart, we will first introduce their Fourier analyses. We will then identify the Fourier operators for mass matrices. In the simplest terms, our proposed lumped mass matrix is defined as follows:

Let k be the wave number and M_c the Fourier-transformed consistent mass matrix; then the lumped mass matrix, M_lump, in the Fourier domain is defined by
\[ M_{lump} = \lim_{k \to 0} M_c \tag{2.1} \]

We will now illustrate our proposition for the mass-lumping procedure via a bar element.

The equation of motion for a uniform elastic bar can be written as
\[ \rho \frac{\partial^2 u}{\partial t^2} = E \frac{\partial^2 u}{\partial x^2} \tag{2.2} \]
where ρ, E, u, x are the density, Young's modulus, the displacement and the coordinate, respectively.

The traditional Fourier analysis begins by seeking a general harmonic wave solution of Eq. (2.2) of the form
\[ u = \hat{u}\, e^{i(kx - \omega t)} \tag{2.3} \]
with ω being the circular frequency, k the corresponding wave number and i = (-1)^(1/2). Substitution of Eq. (2.3) into Eq. (2.2) yields
\[ L(\omega, k) = -\rho\, \omega^2 + E k^2 \tag{2.5} \]

For our subsequent discussions, we reexpress the above equation as:
\[ L(\omega, k) = -\omega^2 M + K, \quad \text{with } M = \rho, \; K = E k^2 \tag{2.6} \]

A corresponding discrete Fourier analysis can be performed when the bar equation (2.2) is approximated by the finite element method, which has been studied by many investigators (Bazant, 1978; Belytschko and Mullen, 1978; Vichnevetsky, 1982; Park and Flaggs, 1984; Celep and Turhan, 1987; Flaggs, 1988). A preoccupation of these studies, however, was to address the effect of internal energy discretizations on the wave propagation characteristics.

In the present study, we will examine the effect of kinetic energy discretizations on the frequency characteristics and deduce from such a study the proposed mass-lumping procedure as well as a synthesis procedure to obtain improved consistent mass matrices. To this end, let us revisit a constant-strain bar element. An elementary finite element implementation gives, for a bar element of a uniform length l, the following discrete equation for an interior node m (e.g., Park and Flaggs, 1984):
\[ \frac{\rho}{6} \left( \ddot{u}_{m-1} + 4\ddot{u}_m + \ddot{u}_{m+1} \right) - \frac{E}{l^2} \left( u_{m-1} - 2u_m + u_{m+1} \right) = 0 \tag{2.7} \]

Substitution of Eq. (2.3) into Eq. (2.7) yields the discrete Fourier operator
\[ L_c(\omega, k) = -\omega^2 M_c + K \tag{2.8} \]
where
\[ M_c = \frac{\rho}{3} \left( 2 + \cos kl \right), \qquad K = E \hat{k}^2 \tag{2.9} \]
in which the discrete wave number, \hat{k}, is defined as
\[ \hat{k}^2 = \frac{2}{l^2} \left( 1 - \cos kl \right) \tag{2.10} \]

For a lumped mass matrix, we have the following discrete Fourier operator:
\[ L_l(\omega, k) = -\omega^2 M_l + K \tag{2.11} \]
where
\[ M_l = \rho \tag{2.12} \]

Test of Proposed Mass-Lumping Procedure (2.1):

Note that M_c can be expanded to read
L((4k))- = 0 (2.4) ... (2. ,3)where the Fourier operator, L(w, k), is given explicity by n
1533
Hence, we have

$$\lim_{k \to 0} \bar M_{cons} = \rho = \bar M_{lump} \qquad (2.14)$$

which proves our proposition (2.1) to be valid (at least for the bar!).

We now address our second related task: a systematic way of constructing consistent mass matrices that can lead to improved vibration analysis. To this end, we note that the characteristic equation that relates the wave number ($k$) to the frequency ($\omega$) is obtained by setting $L(\omega, k) = 0$ for the continuum case:

$$\left(\frac{\omega}{c}\right)^2 = k^2 \qquad (2.15)$$

where the wave speed, $c$, defined as $c = \sqrt{E/\rho}$, is constant.

The characteristic equation, Eq. (2.15), indicates that for the continuum solution, the wave number, $k$, is directly proportional to the frequency, $\omega$, i.e., $k = \omega/c$. With $c$ constant, each Fourier component of a wave group will propagate without dispersion with the same phase velocity.

To examine the effect of kinetic energy discretizations on the accuracy of vibration analysis, we propose the following simple modification for our proposed mass matrix:

$$\bar M_{new} = (1 - \alpha)\,\bar M_{lump} + \alpha\,\bar M_{cons} \qquad (2.16)$$

where $\alpha$ is a constant to be determined. Note that $\alpha = 0$ corresponds to the case of the lumped mass matrix and $\alpha = 1$ to the consistent mass matrix, respectively. Figure 1 illustrates frequency vs. wave number for the continuum bar, the discrete bar with the consistent mass matrix ($\alpha = 1$) and with the lumped mass ($\alpha = 0$), and for an averaged mass matrix ($\alpha = 0.5$). Although not shown in the figure, other values of $\alpha$ can be selected in order to tailor the accuracy requirement for different frequency ranges. For example, $\alpha = (0.5, 0.56829159, 0.89207289)$ gives a most accurate result for small $k$, $k\ell$ around $\pi/2$ and $k\ell$ around $\pi$, respectively. Hence, depending upon the critical accuracy range of interest, one can adjust the mass matrix accordingly.

[Figure 1. Dispersion Curve for Linear Bar. Horizontal axis: normalized wave number ($k\ell$). Curves: continuum (exact), lumped mass ($\alpha = 0$), improved consistent mass ($\alpha = 0.5$), conventional consistent mass ($\alpha = 1$).]

3. Mass-Lumping of a Euler-Bernoulli Beam Element

In the preceding section we have succinctly demonstrated that a discrete Fourier analysis can provide numerical and physical insight into mass lumping and can lead to improved mass matrix approximations. In this section, we will demonstrate that the proposed mass lumping procedure is also applicable to cases that require discrete Fourier matrix operators.

The homogeneous differential equation of motion for the Euler-Bernoulli beam can be expressed as

$$\rho A \frac{\partial^2 w}{\partial t^2} + EI \frac{\partial^4 w}{\partial x^4} = 0 \qquad (3.1)$$

where $A$ is the cross-section area of the beam and $I$ is the moment of inertia. With a general harmonic wave solution of (3.1) of the form

$$w = \hat w\, e^{i(\omega t - kx)} \qquad (3.2)$$

we obtain the Fourier operator for the beam as

$$L(\omega, k) = -\omega^2 \bar M_{exact} + \bar K_{beam} \qquad (3.3)$$

where

$$\bar M_{exact} = \rho A, \qquad \bar K_{beam} = EI k^4 \qquad (3.4)$$

A cubic interpolation of $w$ gives for each beam element of length $\ell$ the following consistent mass and stiffness matrices

$$M_{cons} = \frac{\rho A \ell}{420}\begin{bmatrix} 156 & 22\ell & 54 & -13\ell \\ 22\ell & 4\ell^2 & 13\ell & -3\ell^2 \\ 54 & 13\ell & 156 & -22\ell \\ -13\ell & -3\ell^2 & -22\ell & 4\ell^2 \end{bmatrix} \qquad (3.5)$$

$$K_{beam} = \frac{EI}{\ell^3}\begin{bmatrix} 12 & 6\ell & -12 & 6\ell \\ 6\ell & 4\ell^2 & -6\ell & 2\ell^2 \\ -12 & -6\ell & 12 & -6\ell \\ 6\ell & 2\ell^2 & -6\ell & 4\ell^2 \end{bmatrix} \qquad (3.6)$$

where the elemental nodal degrees of freedom are arranged as

$$u = \lfloor w_1, \theta_1, w_2, \theta_2 \rfloor^T \qquad (3.7)$$

By assembling two interior beam elements and designating their nodes $m-1$, $m$ and $m+1$, respectively, we obtain the following difference equations for the $m$th node:

$$\frac{\rho A \ell}{420}\left\{ 54\ddot w_{m-1} + 312\ddot w_m + 54\ddot w_{m+1} + 13\ell\,(\ddot\theta_{m-1} - \ddot\theta_{m+1}) \right\} + \frac{EI}{\ell^3}\left\{ 12\,(-w_{m-1} + 2w_m - w_{m+1}) + 6\ell\,(-\theta_{m-1} + \theta_{m+1}) \right\} = 0 \qquad (3.8)$$

$$\frac{\rho A \ell}{420}\left\{ 13\ell\,(-\ddot w_{m-1} + \ddot w_{m+1}) + \ell^2(-3\ddot\theta_{m-1} + 8\ddot\theta_m - 3\ddot\theta_{m+1}) \right\} + \frac{EI}{\ell^3}\left\{ 6\ell\,(w_{m-1} - w_{m+1}) + 2\ell^2(\theta_{m-1} + 4\theta_m + \theta_{m+1}) \right\} = 0 \qquad (3.9)$$
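The tuning of $\alpha$ for the bar can be checked numerically. The sketch below is not from the paper: it assumes the discrete bar operators reduce to $\bar K = E\bar k^2$ and $\bar M = \rho\,(1 - \alpha\,\ell^2\bar k^2/6)$ with $\bar k = \sin(k\ell/2)/(\ell/2)$, and solves for the $\alpha$ that makes the discrete frequency equal to the continuum value $\omega = ck$ at a chosen $k\ell$. The function names are hypothetical.

```python
import math

def discrete_frequency(kl, alpha):
    """Normalized discrete bar frequency (omega * l / c) for the blended mass matrix."""
    s2 = math.sin(kl / 2.0) ** 2
    # stiffness term: (l * kbar)^2 = 4 sin^2(kl/2); mass term: 1 - alpha * (2/3) sin^2(kl/2)
    return math.sqrt(4.0 * s2 / (1.0 - (2.0 / 3.0) * alpha * s2))

def alpha_exact_at(kl):
    """Value of alpha for which the discrete frequency matches the continuum one at kl."""
    s2 = math.sin(kl / 2.0) ** 2
    # from 4 s2 / (1 - (2/3) alpha s2) = kl^2
    return 1.5 * (1.0 / s2 - 4.0 / kl ** 2)

alpha_small_k = alpha_exact_at(1.0e-3)        # tends to 0.5 as kl -> 0
alpha_half_pi = alpha_exact_at(math.pi / 2)   # kl around pi/2
alpha_pi = alpha_exact_at(math.pi)            # kl around pi
```

Under these assumptions the three values agree with the ones quoted in the text for small $k$, $k\ell = \pi/2$ and $k\ell = \pi$.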
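The assembly step that leads to the difference equations (3.8) and (3.9) can be reproduced mechanically. The following is a minimal sketch, assuming the standard cubic beam element matrices (3.5) and (3.6) with the common factors $\rho A\ell/420$ and $EI/\ell^3$ left out; the helper names and the element length used for the check are illustrative choices only.

```python
from fractions import Fraction

def beam_element_matrices(l):
    """Scaled consistent mass (x 420/(rho*A*l)) and stiffness (x l**3/(E*I)) matrices."""
    m = [[156, 22 * l, 54, -13 * l],
         [22 * l, 4 * l * l, 13 * l, -3 * l * l],
         [54, 13 * l, 156, -22 * l],
         [-13 * l, -3 * l * l, -22 * l, 4 * l * l]]
    k = [[12, 6 * l, -12, 6 * l],
         [6 * l, 4 * l * l, -6 * l, 2 * l * l],
         [-12, -6 * l, 12, -6 * l],
         [6 * l, 2 * l * l, -6 * l, 4 * l * l]]
    return m, k

def assemble_two(elem):
    """Assemble two identical elements on nodes (m-1, m, m+1); d.o.f. order is w, theta per node."""
    g = [[Fraction(0)] * 6 for _ in range(6)]
    for e in range(2):                      # element e couples global d.o.f. 2e .. 2e+3
        for i in range(4):
            for j in range(4):
                g[2 * e + i][2 * e + j] += elem[i][j]
    return g

l = Fraction(2)                             # arbitrary length, chosen only for this check
m_el, k_el = beam_element_matrices(l)
M, K = assemble_two(m_el), assemble_two(k_el)
w_row_m, th_row_m = M[2], M[3]              # mass rows of the middle node m
w_row_k, th_row_k = K[2], K[3]              # stiffness rows of the middle node m
```

The rows of node $m$ carry the coefficients $54, 312, 54, \pm 13\ell$ and $-3\ell^2, 8\ell^2, -3\ell^2$ on the mass side, and $-12, 24, -12, \mp 6\ell$ and $2\ell^2, 8\ell^2, 2\ell^2$ on the stiffness side, which is the pattern of (3.8) and (3.9).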
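Since $w_i$ and $\theta_i$ are treated independently, we seek a solution

$$\begin{Bmatrix} w_m \\ \theta_m \end{Bmatrix} = \begin{Bmatrix} \hat w \\ \hat\theta \end{Bmatrix} e^{i(\omega t - k x_m)} \qquad (3.10)$$

to Fourier-transform the coupled difference equations (3.8) and (3.9) to obtain

$$\left(-\omega^2 \bar M_{cons} + \bar K_{beam}\right)\begin{Bmatrix} \hat w \\ \hat\theta \end{Bmatrix} = 0 \qquad (3.11)$$

where

$$\bar M_{cons} = \rho A \ell \begin{bmatrix} \bar m_{11} & \bar m_{12} \\ -\bar m_{12} & \ell^2 \bar m_{22} \end{bmatrix} \qquad (3.12)$$

$$\bar K_{beam} = \frac{12EI}{\ell^3} \begin{bmatrix} \ell^2 \bar k^2 & -i\ell \sin k\ell \\ i\ell \sin k\ell & \ell^2\left(1 - \frac{1}{6}\ell^2\bar k^2\right) \end{bmatrix} \qquad (3.13)$$

in which

$$\bar m_{11} = 1 - \frac{9}{70}\,\ell^2 \bar k^2, \qquad \bar m_{12} = \frac{13\, i\, \ell}{210} \sin k\ell, \qquad \bar m_{22} = \frac{1 + 1.5\,\ell^2 \bar k^2}{210} \qquad (3.14)$$

The lumped-mass matrix, according to our proposition (2.1), for the $m$th assembled node thus becomes

$$\bar M_{lump} = \lim_{k \to 0} \bar M_{cons} = \rho A \ell \begin{bmatrix} 1 & 0 \\ 0 & \ell^2/210 \end{bmatrix} \qquad (3.15)$$

which, when translated into the element mass matrix, is equivalent to

$$M_{lump} = \frac{\rho A \ell}{420} \begin{bmatrix} 210 & & & \\ & \ell^2 & & \\ & & 210 & \\ & & & \ell^2 \end{bmatrix} \qquad (3.16)$$

Remark: One of the ad-hoc mass-lumping procedures is to sum up the d.o.f.-by-d.o.f. contributions. When this ad-hoc technique is applied to the element mass matrix (3.5), one obtains:

$$M_{lump} = M^{w}_{lump} + M^{\theta}_{lump} \qquad (3.17)$$

where for the $w$-d.o.f.s we have

$$M^{w}_{lump} = \frac{\rho A \ell}{420} \begin{bmatrix} 210 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 210 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \qquad (3.18)$$

For the $\theta$-d.o.f.s

$$M^{\theta}_{lump} = \frac{\rho A \ell}{420} \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & \ell^2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \ell^2 \end{bmatrix} \qquad (3.19)$$

Hence, for simple elements the ad-hoc d.o.f.-by-d.o.f. row-summation procedure is justified in view of the present proposed mass-lumping procedure (2.1). It should be mentioned that care must be exercised in lumping matrices for higher-order elements.

Now, to address the accuracy associated with the choice of a mass matrix (either $M_{lump}$ or $M_{cons}$, or even their combinations), let us examine the characteristic equation from (3.11), viz.,

$$\det\left(-\omega^2 \bar M_{cons} + \bar K_{beam}\right) = 0 \qquad (3.20)$$

which yields the following frequency vs. wave number equation:

[Equation (3.21) and the definition that follows it are not legible in the source.]

Comparing (3.21) with its differential equation counterpart (3.3), one concludes that, since $\bar m_{11}$ is the translational part, we must have

$$\bar m_{12} = 0, \qquad \bar m_{22} = 0 \qquad (3.22)$$

if (3.21) is to have the following form:

$$-\omega^2 \bar M_{lump} + \bar K_{beam} = 0 \qquad (3.23)$$

Hence, we conclude that for the cubic Euler-Bernoulli beam element to be consistent with the original differential equation, one must employ the following mass matrix (i.e., $\bar m_{12} = 0$, $\bar m_{22} = 0$):

$$M = \frac{\rho A \ell}{420} \begin{bmatrix} 210 - 54\alpha & 0 & 54\alpha & 0 \\ 0 & 0 & 0 & 0 \\ 54\alpha & 0 & 210 - 54\alpha & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \qquad 0 \le \alpha \le 1 \qquad (3.24)$$

Application of (3.24) yields the following characteristic equation:

$$-\omega^2 \rho A \left(1 - \frac{9\alpha}{70}\,\ell^2\bar k^2\right)\left(1 - \frac{1}{6}\,\ell^2\bar k^2\right) + EI\,\bar k^4 = 0 \qquad (3.25)$$

so that we have from (3.3) and (3.25) the following frequency equations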
For continuum case:

$$\omega^2 = \frac{EI}{\rho A}\, k^4 \qquad (3.26)$$

For the consistent matrix (3.23):

$$\omega^2 = \frac{EI}{\rho A}\, \frac{\bar k^4}{\left(1 - \frac{1}{6}\ell^2\bar k^2\right)\left(1 - \frac{9\alpha}{70}\ell^2\bar k^2\right)} \qquad (3.27)$$

Figure 2 shows the frequency vs. the normalized wave number for the mass matrix (3.24), that is, neglecting the rotational contribution to the element mass matrix (see Eq. (3.25)). For this case, the larger the value of $\alpha$, the more accurate the frequency curve becomes.

Of course, as shown by Archer (1963), the use of the consistent mass matrix (3.5) may improve the frequency accuracy over the lumped mass matrix (3.16). In order to gain insight into the role of various mass modeling choices on the accuracy of vibration frequencies and mode shapes, we apply our proposed mass matrix formula (2.16) by combining (3.5) and (3.16) to obtain:

$$M_{new} = \frac{\rho A \ell}{420} \begin{bmatrix} 210 - 54\alpha & 22\ell\alpha & 54\alpha & -13\ell\alpha \\ 22\ell\alpha & \ell^2(1 + 3\alpha) & 13\ell\alpha & -3\ell^2\alpha \\ 54\alpha & 13\ell\alpha & 210 - 54\alpha & -22\ell\alpha \\ -13\ell\alpha & -3\ell^2\alpha & -22\ell\alpha & \ell^2(1 + 3\alpha) \end{bmatrix} \qquad (3.28)$$

Figure 3 illustrates the effect of the mass-matrix averaging based on the above averaged mass matrix (3.28). Application of the above mass matrix yields a characteristic equation that is similar to (3.21) except that the $\bar m_{ij}$ are modified accordingly. As was the case for the bar, $\alpha = (0, 1)$ corresponds to the lumped matrix (3.16) and the consistent matrix (3.5), respectively. It is observed that for $\alpha = 0.25$ the discrete frequency vs. wave number curve follows almost on top of the continuum case.

[Figure 2. Dispersion Curve for Beam with Rotatory Inertia. Horizontal axis: normalized wave number ($k\ell$). Curves: continuum (exact), lumped mass, improved consistent mass, conventional consistent mass ($\alpha = 1$).]

[Figure 3. Dispersion Curve for Beam without Rotatory Inertia. Horizontal axis: normalized wave number ($k\ell$). Curves: continuum (exact), lumped mass, improved consistent mass (including $\alpha = 0.25$), conventional consistent mass ($\alpha = 1$).]

In order to assess the preceding a priori determination of the improved mass matrix based solely on the symbolic analysis, we have performed the vibration analysis of a free-free beam modeled by two elements and compared the present results with the one performed by Archer (1963). The results are summarized in Table 1. It is observed that the conventional consistent mass matrix ($\alpha = 1$) gives an error of about 1% for the first mode, 13% for the second mode, 45% for the third mode and 40% for the fourth mode. In contrast, for an averaged mass matrix ($\alpha = 0.5$), the corresponding errors are 27% for the first mode, 2.5% for the second mode, 1.8% for the third mode and 1.7% for the fourth mode, respectively. It is noted that the symbolic analysis results predict $\alpha = 0.5$ to be most accurate while the finite element solutions indicate $\alpha = 0.25$ to be most accurate. We conjecture that this discrepancy is due to the fact that the finite element solutions are for finite beams whereas the Fourier analysis assumes an infinite beam. Nevertheless, the accuracy prediction based on the discrete Fourier analysis as given in Figure 3 provides a qualitative measure of different mass modeling choices.

Table 1. Frequencies Computed by Different Mass Models for Beam

SOLUTION TYPE     ω1        ω2        ω3        ω4
EXACT             5.5944    15.481    30.26     49.945
α = 0.0           3.4273    28.983    41.428    50.438
α = .25           3.7075    19.101    32.911    50.4
α = .50           4.0924    18.377    30.821    50.794
α = .75           4.6810    15.867    32.474    53.599
α = 1.0           5.6058    17.544    43.870    70.08

Incidentally, the large error of the averaged mass matrix for the first mode is not too much of a concern because, as the number of elements is increased, the first mode quickly
converges. In fact, when five elements are assembled, the error for the first mode is reduced to less than 0.2% with $\alpha = 0.5$ while maintaining high accuracy for the higher modes. On the other hand, the conventional mass matrix ($\alpha = 1$) continues to give large errors for the higher modes.

4. Mass Matrices for Plate Vibration Analysis

So far we have shown that the proposed mass-lumping procedure (2.1) and the improved consistent mass matrix (2.16) lead to substantial improvements in beam vibration analyses. In particular, depending upon one's desire for tailoring the analysis accuracy for certain frequency ranges, one can modify a priori the mass matrix as proposed by (2.16) and manifested in Figs. 1 - 3. For the Lagrange family of plate and shell elements, the proposed lumping procedure (2.1) is equally applicable. In this section we will first examine plate vibrations based on the Reissner-Mindlin plate theory. We will then employ a discrete counterpart of the continuum theory when the plate is approximated by a four-noded plate bending element. We will then compare the continuum and symbolically generated discrete equations in the Fourier domain in order to gain insight into the effect of mass matrices on the accuracy of vibration frequencies. We have found that our symbolic analysis of the discrete plate equations obtained from the finite element plate bending equations provides a qualitative measure of solution accuracy. Numerical experiments corroborate the symbolic analysis results; for the case of the four-noded plate bending element, the mass matrix that yields the best accuracy is determined to be

$$M_{new} = 0.25\, M_{lump} + 0.75\, M_{cons} \qquad (4.1)$$

We now present a detailed analysis and some numerical results.

The Reissner-Mindlin plate equations can be expressed in differential matrix form as (Park and Flaggs, 1985):

$$L(x, y, t)\, \mathbf u = \mathbf f \qquad (4.2)$$

where

$$\mathbf u = \lfloor \theta_x, \theta_y, w \rfloor^T, \qquad \mathbf f = \lfloor 0, 0, q \rfloor^T \qquad (4.3)$$

[Equation (4.4), the symmetric 3 x 3 differential operator $L(x, y, t)$, is not legible in the source.]

in which $\nabla^2$ is defined as

$$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \qquad (4.5)$$

and

$$D = \frac{Eh^3}{12(1 - \nu^2)}, \qquad D_s = \frac{\kappa E h}{2(1 + \nu)} \qquad (4.6)$$

[The remaining stiffness coefficient definitions in (4.6) are not legible in the source.]

in which $\theta$ and $w$ are the rotations and the transverse displacement; $q$ is the transverse load per unit area; $E$, $h$, $\nu$ and $\kappa$ are Young's modulus, plate thickness, Poisson's ratio and the shear correction factor; $I$ and $m$ are the rotatory inertia and the plate mass, with $x$, $y$ and $t$ denoting the Cartesian coordinates and time, respectively.

The Fourier-transformed matrix operator, $L(k_x, k_y, \omega)$, can be obtained for the plate equations by seeking a harmonic solution of the form

$$\mathbf u(x, y, t) = \mathbf u(x_0, y_0, t_0) \exp[j(\omega t - k_x x - k_y y)] \qquad (4.7)$$

where $\omega$ is the circular frequency and $(k_x, k_y)$ are the $(x, y)$-direction wave numbers. Substituting (4.7) into (4.4) yields

[Equation (4.8), the Fourier-transformed continuum operator $L(k_x, k_y, \omega)$, is not legible in the source.]

with the corresponding uncoupled continuum Fourier operator given by

[Equation (4.9) is not legible in the source.]

where the Fourier-transformed $\nabla^2$ operator is defined by

$$\bar\nabla^2 = -(k_x^2 + k_y^2) \qquad (4.10)$$

The Fourier-transformed continuum matrix operator, $L(k_x, k_y, \omega)$, given by (4.8) and the uncoupled continuum operator given by (4.9) will serve as our reference equations with which the corresponding discrete operators and characteristic equations will be compared in order to assess the effect of mass lumping on plate vibrations.

The general discrete Fourier operator that approximates the continuum case by the four-noded plate bending element can be shown to be (Park and Flaggs, 1985):
[Equation (4.11), the discrete Fourier operator for the four-noded plate bending element, is not legible in the source.]

where the discrete Fourier numbers, $\bar k_x$ and $\bar k_y$, and the so-called directional averaging operators are given by

$$\bar k_x = \frac{\sin(k_x \ell/2)}{\ell/2}, \qquad \bar k_y = \frac{\sin(k_y \ell/2)}{\ell/2}$$

[the definitions of the directional averaging operators are not legible in the source] and $\alpha$ is a coefficient defined in the mass matrix modeling formula (2.16).

The relation between the frequency and the wave number for the continuum and the discrete cases can be obtained by requiring the determinants of (4.8) and (4.11) to be zero. Figure 4 shows the normalized frequency vs. the wave number ($k$) for the continuum and the discrete cases with various mass matrix choices based on (2.16). It is noted that the two cases correspond to an infinite plate, or so-called interior solutions. Judging from Fig. 4, one may conclude that the choice of $\alpha = 0.5$ (the average of the lumped and the consistent matrices) should perform best for plate vibration analysis. For plates with finite dimensions, as we shall see, the influence of boundary edges must be taken into consideration in utilizing the dispersion curve for selecting an appropriate choice of mass matrix.

Table 2 summarizes the vibration analysis results with different mass matrices for a plate with free edges. In computing the errors in frequency computations by the four-noded element, we have assumed the frequencies given in (Leissa, 1969) to be the converged answers. The finite element analysis results indicate that the discrete frequency vs. wave number curve shown in Fig. 4 overestimates the frequency error from the above. We conjecture that this overestimation manifested in the present discrete Fourier analysis may have been due to the failure of the four-node element to satisfy rigorously the free-edge conditions.

Table 2. Frequency Errors Computed by Different Mass Models for Free-Free Plate
[Columns: mass type ($\alpha$ = 0.0 (lumped), .25, .50, .75, 1.0 (consistent)), mode number, and error for three successively refined meshes. The tabulated entries are not legible in the source.]

The mode-by-mode convergence characteristics for the first three modes are shown in Figs. 5-7. With the exception of the first mode with a (2 x 2)-mesh, it is observed that the choice of $\alpha = 3/4$ consistently yields the most accurate results.

[Figure 4. Dispersion Curve for Plate ($\nu$ = 0.343). Horizontal axis: normalized wave number ($k\ell$). Curves: continuum, lumped mass, improved consistent mass, consistent mass ($\alpha = 1$).]

[Figure 5. Error in First Mode for Different Mass Modeling. Horizontal axis: number of elements along side.]
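The blending rule (2.16), which underlies both (3.28) and (4.1), amounts to an entry-by-entry weighted sum. Since the plate element matrices are not reproduced here, the sketch below illustrates it with the beam matrices (3.5) and (3.16), scaled by $420/(\rho A \ell)$; the function name `blend_mass` and the unit element length are assumptions made for the example.

```python
from fractions import Fraction

def blend_mass(m_lump, m_cons, alpha):
    """Eq. (2.16): (1 - alpha) * M_lump + alpha * M_cons, applied entry by entry."""
    return [[(1 - alpha) * a + alpha * b for a, b in zip(row_l, row_c)]
            for row_l, row_c in zip(m_lump, m_cons)]

l = Fraction(1)                              # element length, arbitrary for this check
m_cons = [[156, 22 * l, 54, -13 * l],        # Eq. (3.5) without the factor rho*A*l/420
          [22 * l, 4 * l * l, 13 * l, -3 * l * l],
          [54, 13 * l, 156, -22 * l],
          [-13 * l, -3 * l * l, -22 * l, 4 * l * l]]
m_lump = [[210, 0, 0, 0],                    # Eq. (3.16) without the factor rho*A*l/420
          [0, l * l, 0, 0],
          [0, 0, 210, 0],
          [0, 0, 0, l * l]]

alpha = Fraction(3, 4)                       # the weighting reported as best for finite plates
m_new = blend_mass(m_lump, m_cons, alpha)

# total translational mass carried by the w-w entries, in units of rho*A*l/420
translational_total = m_new[0][0] + m_new[0][2] + m_new[2][0] + m_new[2][2]
```

Whatever the value of $\alpha$, the translational entries still sum to 420 in these units, so the blend redistributes mass between nodes without changing the element's total mass.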
[Figure 6. Error in Second Mode for Different Mass Modeling. Horizontal axis: number of elements along side.]

[Figure 7. Error in Third Mode for Different Mass Modeling. Horizontal axis: number of elements along side.]

5. Discussions

In the present paper a systematic technique of obtaining the lumped mass matrices for vibration and transient analyses has been proposed. The technique is based on the symbolic discrete Fourier analysis and reduces to the known mass lumping techniques for simple elements.

In order to further increase the accuracy of frequency computations, an improved form of mass matrix is proposed, which is a combination of the lumped and consistent mass matrices. Numerical results obtained for a plate with free edges indicate that substantial improvements in intermediate frequency computations can be realized if one judiciously employs a combined mass matrix.

So far the present work has focused on modifying the mass matrices in order to improve the frequency accuracy. Our future work will focus on the combined tailoring of both the mass and stiffness matrices to capture intermediate frequency components more accurately. It should be noted that the accuracy of intermediate frequency components based on the finite element methods is increasingly important in the areas of control of flexible structures, underwater acoustics and wave propagation through composites.

Acknowledgements

The work reported herein was partially supported by Naval Research Laboratory under Contract N00014-87-K-2118. We thank Ms. Louise Schuetz of NRL for her interest and encouragement. We also thank Prof. Charbel Farhat for his assistance in our symbolic eigenvalue analysis using MACSYMA.

References

1. Archer, J. S., (1963), "Consistent Mass Matrix for Distributed Mass Systems," J. of the Structural Division, ST 4, pp. 161-178.
2. Bazant, Z. P., (1978), "Spurious Reflection of Elastic Waves in Non-uniform Finite Element Grids," Computer Methods in Applied Mechanics and Engineering, 16, pp. 91-100.
3. Bazant, Z. P. and Celep, Z., (1982), "Spurious Reflection of Elastic Waves in Nonuniform Meshes of Constant and Linear Strain Elements," Comp. and Struc., 15, No. 4, pp. 451-459.
4. Belytschko, T. and Mullen, R., (1978), "On Dispersive Properties of Finite Element Solution," Modern Problems in Wave Propagation, Edited by J. Miklowitz and J. D. Achenbach, John Wiley and Sons, New York, pp. 67-82.
5. Celep, Z. and Turhan, D., (1987), "Transient Wave Propagation in Constant and Linear Strain Elements," J. Sound and Vibration, 116(1), pp. 15-23.
6. Flaggs, D. L., (1988), Symbolic Analysis of the Finite Element Method in Structural Mechanics, PhD Thesis, Stanford University.
7. Fried, I. and Malkus, D. S., (1975), "Finite Element Mass Matrix Lumping by Numerical Integration with No Convergence Rate Loss," Int. J. Solids Structures, 11, pp. 461-466.
8. Hinton, E., Rock, T. and Zienkiewicz, O. C., (1976), "A Note on Mass Lumping and Related Processes in the Finite Element Method," Earthquake Eng. and Structural Dynam., 4, pp. 245-249.
9. Krieg, R. D. and Key, S. W., (1973), "Transient Shell Response by Numerical Time Integration," Int. J. Numerical Methods in Eng., 7, pp. 273-286.
10. Leckie, F. A. and Lindberg, G. M., (1963), "The Effect of Lumped Parameters on Beam Frequencies," Aeronaut. Quarterly, 14, pp. 224-240.
11. Leissa, A. W., (1969), Vibration of Plates, NASA SP-160, Washington, D. C., pp. 87-114.
12. Park, K. C. and Flaggs, D. L., (1984), "A Fourier Analysis of Spurious Mechanism and Locking in the Finite Element Method," Comp. Meth. in Appl. Mech. and Engng., 46, pp. 68-81.
13. Park, K. C. and Flaggs, D. L., (1985), "A Symbolic Fourier Synthesis of a One-Point Integrated Quadrilateral Plate Element," Comp. Meth. Appl. Mech. Engr., 48, pp. 203-236.
14. Rock, T. and Hinton, E., (1974), "Free Vibration and Transient Response of Thick and Thin Plates Using the Finite Element Method," Int. J. Earthq. Engng. Struct. Dyn., 3, pp. 51-63.
15. Tong, P., Pian, T. H. H. and Buciarelli, L. L., (1971), "Mode Shapes and Frequencies by the Finite Element Method Using Consistent and Lumped Mass," J. Comp. Struct., 1, pp. 623-638.
16. Vichnevetsky, R. and Bowles, J. B., (1982), Fourier Analysis of Numerical Approximations of Hyperbolic Equations, SIAM Studies in Applied Mathematics, Philadelphia, Pa.
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, VOL. 30, 27-55 (1990)
TRANSIENT FINITE ELEMENT COMPUTATIONS ON 65536 PROCESSORS: THE CONNECTION MACHINE
C. FARHAT, N. SOBH AND K. C. PARK
Department of Aerospace Engineering Sciences and Center for Space Structures and Controls, University of Colorado at Boulder, Boulder, CO 80309-0429, U.S.A.
SUMMARY
This paper reports on our experience in solving large-scale finite element transient problems on the Connection Machine. We begin with an overview of this massively parallel processor and emphasize the features which are most relevant to finite element computations. These include virtual processors, parallel disk I/O and parallel scientific visualization capabilities. We introduce a distributed data structure and discuss a strategy for mapping thousands of processors onto a discretized structure. The combination of the parallel data structure with the virtual processor mapping algorithm is shown to play a pivotal role in efficiently achieving massively parallel explicit computations on irregular and hybrid two- and three-dimensional finite element meshes. The finite element kernels written in C*/Paris have run with success to solve several examples of linear and non-linear dynamic simulations of large problem sizes. From these example runs, we have been able to assess in detail their performance on the Connection Machine. We show that mesh irregularities induce an MIMD (Multiple Instruction Multiple Data) style of programming which impacts negatively the performance of this SIMD (Single Instruction Multiple Data) machine. Finally, we address some important theoretical and implementational issues that will materially advance the application ranges of finite element computations on this highly parallel processor.
1. INTRODUCTION
Parallel computers are having a profound impact on computational mechanics. This is reflected by the continuously increasing number of publications on finite elements and parallel processing. Not only have some computational strategies been re-designed for implementation on commercially available multiprocessors, but also some innovative algorithms have been spurred by the advent of these new machines. However, many of the reported parallel finite element simulations have been on systems with a few processors. Examples of these systems are Intel's iPSC with 32 processors (reported by Farhat and Wilson1), JPL/Caltech's hypercube with 32 processors,2 Alliant's FX8 model with 8 processors3,4 and CRAY's systems with up to 4 processors.5 (For more complete lists of references on this topic see White and Abel6 and Noor.7) While great speed-ups were measured on these coarse to medium grain machines, Farhat8 has shown that traditional vector supercomputers could not be outperformed in finite element simulations (except of course on systems which connect more than one vector superprocessor, such as the CRAY X-MP and CRAY-2 systems, each of which has 4).
Recently, massively parallel machines have demonstrated their potential to be the fastest supercomputers, a trend that may accelerate in the future. While solving the shallow water equations, McBryan has reported that the Connection Machine (CM_2 in the sequel) (65 536 processors) was three times faster than the four-processor CRAY X-MP.9 Gustafson et al. have developed highly parallel solutions for baffled surface wave equations, unstable fluid flow and beam strain analysis, and have reported performances on NCUBE's 1024-processor hypercube which are close to those of vector supercomputers.10

0029-5981/90/090027-29$14.50                Received 11 March 1989
© 1990 by John Wiley & Sons, Ltd.           Revised 12 September 1989
The objective of the present study has been: first, to evaluate the multiprocessing features of the CM_2 that are relevant to finite element computations; second, to develop a suitable finite element data structure which exploits the system architecture; third, to implement a decomposition/mapping procedure that matches as far as possible the layout of the processors to the finite element meshes; and fourth, to assess those implications of finite element analysis on the CM_2 that should be considered in the design of future massively parallel processors. Hence, we focus primarily on implementational issues that are critical for the full exploration of the multiprocessing capabilities of the CM_2, and only secondarily on solution algorithms, as far as they impact the present study on implementational issues.
The finite element equations of motion for structural systems can be expressed as

$$M\ddot d + F^{int}(d, \dot d) = F^{ext} \qquad (1)$$

where $M$ denotes the positive definite lumped mass matrix, $F^{int}$ and $F^{ext}$ denote the internal and external force vectors, and $\ddot d$, $\dot d$ and $d$ denote respectively the acceleration, velocity and displacement vectors. In the linear case, the internal force vector becomes

$$F^{int} = D\dot d + Kd \qquad (2)$$

where $D$ and $K$ are the damping and stiffness matrices respectively, which are positive semidefinite. In this work, any damping that is present is assumed to be proportional to the mass and stiffness.
The algorithmic nature of a candidate solution method for the structural dynamics equation (1) can significantly influence the software requirements, data communications and arithmetic efficiency. As our main focus is on implementational issues rather than algorithmic ones, we have decided on a simple explicit time integration procedure. Hence, we choose to integrate equation (1) with the fixed step explicit central difference algorithm because (a) it is inherently parallel, and (b) it has the largest undamped stability limit among second-order accurate explicit linear multistep algorithms, as has been demonstrated by Krieg.11 In our context, it is expressed as

$$\dot d^{\,n+1/2} = \dot d^{\,n-1/2} + h M^{-1}\left(F^{ext}(t^n) - F^{int}(d^n, \dot d^{\,n-1/2})\right)$$
$$d^{\,n+1} = d^{\,n} + h\, \dot d^{\,n+1/2} \qquad (3)$$

where $h$ is the fixed time step and the superscript $n$ indicates the value at the discrete time $t^n$.

The remainder of this paper deals with the massively parallel solution of (1) using (3), and is organized as follows. In Section 2, we give an overview of the CM_2 hardware configuration and emphasize those features which are pertinent to finite element computations. In particular, we address issues that are related to the processor memory size, to the SIMD architecture and to the fast interprocessor communication package, the NEWS grid. In Section 3, we discuss the floating point arithmetic performance of the CM_2 and highlight its current dependence on the selected language compiler. Algebraic manipulations coded in *Lisp are shown to be three times as fast as when written in C*. A general purpose finite element distributed data structure is presented in Section 4. Designed originally to handle massively parallel finite element explicit computations on irregular and hybrid meshes, this parallel data structure is also very efficient for parallel I/O manipulations and parallel graphic animation. Since the often-encountered mesh irregularities inhibit the use of the NEWS grid communication package, we discuss in Section 5 an alternative decomposition/mapping strategy. The decomposition technique is designed to minimize both the amount of communication between different chips and the amount of wire contention within a chip. The mapping algorithm attempts to reduce the distance that information must travel. Section 6 summarizes the overall organization of the massively parallel transient simulation. In Section 7, our parallel data structure and processor mapping are applied to (3) for the solution of various large-scale transient problems. Measured performances are analysed in detail. Mesh irregularities are shown to be the source of several factors which considerably slow down the machine. Finally, in Section 8, we address some important theoretical and implementational issues that will materially advance the application ranges of finite element computations on the CM_2. In particular, we note that time integration numerical algorithms such as explicit finite differences and equation solvers such as the preconditioned conjugate gradient are implemented using the same parallel data structure and mapping algorithm which are presented in this paper. We compare the substructuring technique and the virtual processor approach, and comment on the implications of implicit algorithms for the effective use of the CM_2.
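The explicit central difference update (3) adopted above can be sketched serially as follows. This is only an illustration under stated assumptions, not the authors' C*/Paris kernel: the mass matrix is stored as a list of diagonal (lumped) entries, `f_int` and `f_ext` are hypothetical callables returning force lists, and the internal force is evaluated with the half-step-lagged velocity. The test problem is a single undamped oscillator whose exact solution is $\cos t$.

```python
import math

def central_difference(mass, f_int, f_ext, d0, v_half, h, n_steps):
    """Advance M*a + F_int(d, v) = F_ext(t) with a diagonal mass; v_half is the velocity at t = -h/2."""
    d, v = list(d0), list(v_half)
    for n in range(n_steps):
        # residual force at time t^n, using the lagged half-step velocity
        residual = [fe - fi for fe, fi in zip(f_ext(n * h), f_int(d, v))]
        # velocity update: v^{n+1/2} = v^{n-1/2} + h * M^{-1} * residual
        v = [vj + h * rj / mj for vj, rj, mj in zip(v, residual, mass)]
        # displacement update: d^{n+1} = d^n + h * v^{n+1/2}
        d = [dj + h * vj for dj, vj in zip(d, v)]
    return d, v

h = 1.0e-3
d_end, v_end = central_difference(
    mass=[1.0],
    f_int=lambda d, v: [1.0 * d[0]],   # linear spring, unit stiffness, no damping
    f_ext=lambda t: [0.0],
    d0=[1.0],
    v_half=[0.5 * h],                  # approximates the velocity at t = -h/2 for d = cos(t)
    h=h,
    n_steps=1000,                      # integrates up to t = 1
)
```

Each degree of freedom is updated independently once the force residual is known, which is the sense in which the scheme is inherently parallel when the mass matrix is lumped.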
2. THE CONNECTION MACHINE HARDWARE ARCHITECTURE
Here we present an overview of the CM_2 system organization and discuss issues that are pertinent to massively parallel finite element computations. See Hillis12 for an in-depth discussion on the rationale behind the CM_1 (a previous model of the Connection Machine), the Technical Summary of Thinking Machines Corporation13 for further architectural information and McBryan9 for initial studies of scientific computations on the CM_1. For the sake of clarity, we summarize the architectural features before discussing their impact on finite element simulations.
2.1 System organization
2.1.1. CM-2: The parallel processing unit. The CM-2 is a cube 1.5 m on a side, made of up to eight subcubes (Plate 1). Each subcube contains 512 chips and every chip includes 16 bit-serial processors which are connected by a switch. Each individual processor has 64 Kbits (8 Kbytes) of bit-addressable local memory and an arithmetic-logic unit (ALU) that can operate on variable-length operands. Every two chips may share an optional Weitek floating point accelerator chip. A fully configured CM-2 thus has 4096 ($2^{12}$) chips, 2048 floating point accelerator chips, 65 536 processors and 512 Mbytes of memory. The chips are arranged in a 12-dimensional hypercube. A chip i is directly connected to 12 other chips j, with the binary representations of i and j differing only by 1 bit. The CM-2 system provides two forms of communication between the processors.
• A general mechanism known as the router, which allows any processor to communicate with any other processor. Each CM-2 chip contains one router node i which serves the 16 processors on the chip, numbered 16i to 16i + 15. The router nodes on all the chips are wired together in a 12-dimensional Boolean cube and together form the complete router network (Figure 1). For example, suppose that processor 117 (processor 5 on router node 7) has a message to send to processor 361 (processor 9 on router node 22). Since $22 = 7 + 2^4 - 2^0$, router 7 forwards the message to router 6 ($6 = 7 - 2^0$), which forwards it to router 22 ($22 = 6 + 2^4$), which delivers the message to processor 361.
• A more structured and somewhat faster communication mechanism called the NEWS grid. Each processor is wired to its four nearest neighbours in a two-dimensional rectangular grid (Figure 2). Communication on the NEWS grid is extremely fast and recommended whenever it is possible.
Figure 1. The router network
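The router addressing described above can be illustrated with a short sketch. The Python below is not CM-2 code: the function names are hypothetical, and the order in which the differing address bits are corrected (lowest dimension first) is an assumption, chosen because it reproduces the route from router 7 to router 22 quoted in the text.

```python
PROCESSORS_PER_CHIP = 16
HYPERCUBE_DIMENSION = 12


def router_node(processor):
    """Index of the router node (chip) that serves a given processor."""
    return processor // PROCESSORS_PER_CHIP


def local_index(processor):
    """Position (0 to 15) of a processor on its chip."""
    return processor % PROCESSORS_PER_CHIP


def hypercube_route(src, dst, dims=HYPERCUBE_DIMENSION):
    """Router nodes visited when one differing address bit is fixed per hop.

    Assumption: bits are corrected from the lowest dimension upwards.
    """
    path = [src]
    node = src
    for bit in range(dims):
        mask = 1 << bit
        if (node ^ dst) & mask:
            node ^= mask  # cross the wire of this dimension
            path.append(node)
    return path


route = hypercube_route(router_node(117), router_node(361))
hops = len(route) - 1  # equals the Hamming distance between the two chip numbers
```

Here `route` is `[7, 6, 22]`, so the message crosses two wires. The number of hops between two chips is the number of bits in which their indices differ, which is the routing distance that a mapping strategy tries to keep small.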
Figure 2. The NEWS grid
An important practical feature of the CM-2 is the support for virtual processors. When the CM-2 is initialized for a run, the number of virtual processors (vp in the sequel) may be specified. If it exceeds the number of available physical processors, then the local memory of each processor is split up into a number of regions equal to the ratio between the number of vps and the number of physical processors. Automatically, for every Paris (PARallel Instruction Set) instruction, the processors are time-sliced among the regions. If a physical processor is simulating N vps, each Paris instruction is decoded by the sequencer (as explained below) only once for N executions. This results in an enhanced user performance. Also, the use of a vp ratio greater than 1 allows the pipelining of floating point operations in the Weitek chips, which provides an additional enhancement to machine performance. The system organization of a CM-2 is shown in Figure 3.
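As a rough illustration of how virtual processors share local memory, the sketch below splits the 8 Kbytes of a physical processor among its vps and finds the largest admissible vp ratio for a given per-vp memory need. It assumes that the vp ratio must be a power of two and that it is rounded down so that every vp still fits in memory; the function names are hypothetical.

```python
LOCAL_MEMORY_BYTES = 8 * 1024  # 64 Kbits per physical processor


def memory_per_vp(vp_ratio, local_memory=LOCAL_MEMORY_BYTES):
    """Bytes available to each virtual processor at a given vp ratio."""
    return local_memory // vp_ratio


def highest_vp_ratio(bytes_per_vp, local_memory=LOCAL_MEMORY_BYTES):
    """Largest power of two r such that r vps of bytes_per_vp fit in memory."""
    if not 0 < bytes_per_vp <= local_memory:
        raise ValueError("requirement must be positive and fit in local memory")
    ratio = local_memory // bytes_per_vp
    return 1 << (ratio.bit_length() - 1)  # round down to a power of two
```

For instance, an application needing 2048 bytes per vp could run at a vp ratio of 4, while one needing 3000 bytes would be limited to a vp ratio of 2 under these assumptions.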
The CM-2 is an SIMD machine. All processors must execute identical instructions, although some processors may choose to ignore any instruction. Consequently, an instruction which involves a nested binary branch can see its execution time increased by a factor of two. The SIMD nature of the CM-2 has some disadvantages in finite element computations, as will be shown.
2.2. Impact on finite element computations
It is well known that the solution algorithm (3) can be implemented using only element-level computations. Hence, if each vp of the CM-2 is mapped onto one finite element, equation (1) can be efficiently integrated in parallel. The rationale behind this processor-to-element assignment
Plate 2. Discretization and decomposition of a tapered beam
Plate 3. Irregular cell
Figure 3. System organization of a CM-2 [block diagram: front ends 0 to 3 (DEC VAX or Symbolics) with bus interfaces, sequencers, Connection Machine processors, and the Connection Machine I/O system (Data Vaults, graphic display, network)]
will be analysed in Sections 4 and 8. Here, we discuss the direct impact of the CM-2 hardware on such a decision.
2.2.1. The local memory and element level computations. Consider the 9-node curved shell element shown in Figure 4. Three displacements and two rotations are attributed to each node, which amounts to a total of 45 degrees of freedom per element. Consequently, the symmetric part of the elemental stiffness matrix, $K^{(e)}$, contains 45(45 + 1)/2 = 1035 words. If double precision is used, the storage of $K^{(e)}$ amounts to 1035 × 64 = 66 240 bits, which exceeds the 65 536 bits that are available on a single CM-2 processor. On the other hand, if single precision is used, the storage of $K^{(e)}$ requires 33 120 bits, so that 32 416 bits are left for the storage of the vectors $d^{(e)}$ and $\dot{d}^{(e)}$, the elemental lumped mass vector $M^{(e)}$ and the forces $F^{(e)}$ and $P^{(e)}$. However, even in the latter
Figure 4. A 9-node shell element
Figure 5. A two-step NEWS mechanism on a regular mesh
case, only a vp ratio of 1 can be used. This limits the size of the finite element mesh to the maximum number of processors available on the CM-2 at hand. Also, it inhibits further performance enhancement, as outlined in Section 2.1.
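The storage count for the 9-node shell element can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text; the function names are hypothetical.

```python
PROCESSOR_MEMORY_BITS = 64 * 1024  # 65 536 bits of local memory per processor


def symmetric_words(dofs):
    """Number of entries in the symmetric part of a dofs x dofs matrix."""
    return dofs * (dofs + 1) // 2


def stiffness_bits(nodes, dofs_per_node, bits_per_word):
    """Storage, in bits, of the symmetric elemental stiffness matrix."""
    return symmetric_words(nodes * dofs_per_node) * bits_per_word


double_precision = stiffness_bits(9, 5, 64)   # 66 240 bits: does not fit
single_precision = stiffness_bits(9, 5, 32)   # 33 120 bits: fits
left_over = PROCESSOR_MEMORY_BITS - single_precision  # 32 416 bits remain
```

Only the single precision matrix fits in one processor, and it already uses more than half of the local memory, which is why the vp ratio cannot exceed 1 when the stiffness matrix is stored.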
Fortunately, in our case the above storage requirements can be considerably decreased. The nature of explicit computations is such that $F^{int}(d^n)$ can be directly computed from the displacements at $t^n$ and the stress-strain constitutive equation. As a result, the solution process defined in (3) involves only vector quantities which do not require a large amount of storage, so that vp ratios between 1 and 4 are possible. However, the reader should keep in mind that the current local memory size of a CM-2 processor may penalize sophisticated high order elements and implicit finite element algorithms in general. This restriction is not encountered on other commercially available hypercubes such as iPSC, NCUBE and AMETEK, among others.
2.2.2. The NEWS grid and finite element patches. Consider the regular finite element mesh shown in Figure 5. Except on the boundaries, each element is connected in the same pattern to exactly eight other elements. Consequently, during the explicit time integration algorithm, each processor communicates with its neighbours in the same manner. Interprocessor communication can be performed with a two-step NEWS mechanism (Figure 5). However, the beauty of the finite element method resides in the fact that it solves models with irregular meshes. Typically, a finite
Figure 6. Transition zones
element mesh consists of several patches which are connected together using irregular transition regions (Figure 6). For these often encountered cases, the NEWS grid becomes impractical. Rather, the router has to be utilized. In Section 4, we describe how a distributed data structure can guide the router during this process.
2.2.3. SIMD hardware vs. MIMD finite element computations. Typical finite element meshes comprise more than one type of element. Consider the case where a discretized region is modelled with shell elements that are stiffened with beam elements. Clearly, the instructions associated with the shell elements differ from those associated with the beam elements. Consequently, the vps which are assigned to shell elements and the vps which are assigned to beam elements cannot execute their segments of code in parallel; for example, the beam processors have to execute first, then the shell processors. If $T_b$ and $T_s$ denote the execution times associated with the instructions for a beam and a shell element respectively, the total elapsed parallel time for a single instruction over the set (beams + shells) on an SIMD multiprocessor is $T_b + T_s$. On an MIMD multiprocessor, this elapsed parallel time is $\max(T_b, T_s)$. Similar situations arise when during the loading some elements turn out to be materially non-linear and some remain linear. In this case, one should always compute the linear component of the response (the elastic stiffness for example) before attempting to test the yielding criterion. However, in spite of these disadvantages, SIMD programs can still be attractive, because they tend to be easier to debug and rarely suffer from the synchronization errors which are typical of MIMD codes.
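The difference between the two elapsed times can be made concrete with a small model. The sketch below is only an illustration: the per-type execution times are made-up values, and the function names are hypothetical.

```python
def simd_elapsed(time_per_type):
    """SIMD: element types are processed one after the other, so times add up."""
    return sum(time_per_type.values())


def mimd_elapsed(time_per_type):
    """MIMD: element types run concurrently, so the slowest one dominates."""
    return max(time_per_type.values())


# Hypothetical execution times (arbitrary units) for one parallel instruction.
times = {"beam": 1.0, "shell": 3.0}
simd_time = simd_elapsed(times)  # T_b + T_s
mimd_time = mimd_elapsed(times)  # max(T_b, T_s)
```

With these illustrative values the SIMD time is 4.0 units against 3.0 units on an MIMD machine; the penalty grows as the element types become more evenly balanced in cost.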
2.2.4. Parallel I/O in finite element computations. At each time step, the computed displacements, velocities, accelerations as well as strains and stresses need to be stored on disks. This represents a significant amount of I/O traffic. It has been our experience that the CM-2 Data Vault system is efficient at reducing the corresponding elapsed time (see Section 7).
2.2.5. Real-time graphics animations. The massively parallel real-time animation of the mesh deformations is a direct consequence of the availability of the Frame Buffer and of the decision to assign a vp to a finite element. At each time step, after the node displacements are found, all of the vps concurrently draw the outline of their assigned elements on the graphics screen. The result is a real-time finite element animation.
3. BENCHMARKING THE CM-2
At the time of writing this paper, the CM-2 supports three high level languages: C* (pronounced see-star), *Lisp (pronounced star-lisp) and CM-Lisp (pronounced see-m-lisp). The first two are extensions of C and Lisp respectively. Paris is, in effect, the assembly language of this parallel processor.
In this section we comment on the results of a set of timing experiments that were carried out on the CM-2 of the Center for Applied Parallel Processing (CAPP), at the University of Colorado, Boulder. Since only one eighth of a cube was available on this system, all results were obtained using 8192 processors. McBryan⁹ has shown that all results demonstrated on subcubes of the CM-2 scale essentially linearly to the 65 536 processor system. Consequently, throughout this paper, megaflop rates are reported after they are linearly scaled to the full configuration. These experiments provided us with:
• a reference performance for the evaluation of our approach to massively parallel finite element explicit computations
• the influence of the vp ratio and that of the high level language compiler on attainable performances. At this point, we remind the reader that, if an application requires an amount of local memory (per processor) m, the highest vp ratio possible is equal to the closest power of two to the ratio between the maximum amount of local memory available on the machine (currently 8 Kbytes), and m.
Table I reports the megaflop rates for some scientific computations on the CM-2 at different vp ratios. All statements were written in C*. Each statement is performed by each processor on its variables. All variables were declared parallel (local) and float (single precision), except variable dp which was declared mono (serial) float, and variable i which was declared mono integer. Timings were measured using the cmtimer routines. Each '+' operation or '*' operation was counted as one flop.
Based on these results, we have observed the following:
1. Floating point performance is enhanced at higher vp ratios. This is due to the fact that, for vp ratios greater than one, computations in the Weitek chip are pipelined.
2. Vector saxpys are not slower than scalar ones. This is because memory addresses are computed on the front end. The additional speed noticed for vector saxpys is thought to be due to the overlapping of addressing and floating point computations.
3. C* appears to handle poly (parallel) assignments poorly. This can be seen by comparing the performances of the dot product and the vector multiply. Each of these two vector
Table I. Megaflop rates using C*
Parallel processor = CM-2, Language = C*, Variable = float

Statement          vp ratio
                   1      2      4      8      16     32     64     128    256
y[i] += α*x[i]     740    808    848    850    880    -      -      -      -
y = y + α*x        569    654    699    728    743    761    778    791    800
z = x*y            409    485    535    569    579    585    600    610    623
dp += x*y          202    359    583    839    1075   1240   1348   1400   1500
operations requires one floating point operation per processor. In addition, the dot product requires a reduction (accumulation phase) which necessitates communication. However, at high vp ratios, the dot product is twice as fast as the vector multiply! (At low vp ratios, the amount of floating point computations is not large enough to amortize the price of communication.) Since the dot product does not store any value in the processor memory and the vector multiply stores the result of x * y back into z, this leads us to believe that the C* compiler generates code which is very inefficient at handling assignments. This also explains why the saxpy exhibits a higher megaflop rate than the vector multiply: it has twice as many floating point computations for one assignment.
The same computations were repeated using *Lisp. The comparison of both sets of timings for the maximum vp ratio demonstrates a formidable superiority of the *Lisp compiler (see Figure 7). This is partly due to the fact that it has been used longer on the CM-2 than C*. In spite of the proven superior efficiency of *Lisp over C*, we have chosen to implement our finite element code using C* because of our familiarity with C.
4. FINITE ELEMENT PARALLEL DATA STRUCTURES
Consider again the explicit central difference algorithm:
$$\dot{d}^{\,n+1/2} = \dot{d}^{\,n-1/2} + h\,M^{-1}\left(F^{ext}(t^n) - F^{int}(d^n, \dot{d}^n)\right)$$
$$d^{\,n+1} = d^n + h\,\dot{d}^{\,n+1/2}$$
The global mass matrix M is assembled once. At each time step $t^n$, the computations are dominated by the evaluation of the internal forces:
$$F^{int} = \int [LS]^T \sigma \, d\Omega^{(e)}$$
where $\sigma$ is the stress vector, S are the shape functions, L is a partial derivative operator and $\Omega^{(e)}$ is the area of the eth finite element. Clearly, the parallel computation of $F^{int}$ is best done element-by-element. Thus, equation (1) can be efficiently integrated in parallel if the CM-2 virtual processors are mapped onto the elements of the mesh. This is a departure from the grid point massively parallel computations advocated by Thinking Machines Corporation for the CM-2.¹³
First, all processors compute concurrently the local forces $F^{ext(e)}(t^n)$ and $F^{int(e)}(d^n, \dot{d}^n)$. Next, these contributions are accumulated through communications among processors that are mapped onto neighbouring elements.
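The two phases just described, element-level force evaluation followed by accumulation at shared nodes, can be sketched serially as follows. This is a minimal illustration and not the authors' C* code: it uses hypothetical 2-node spring elements, ignores damping so that the internal force depends on displacements only, and writes the half-step velocity as `v_half`.

```python
def element_internal_force(d_i, d_j, stiffness):
    """Nodal internal forces of a 2-node spring element (a hypothetical
    stand-in for the integral of [LS]^T sigma over one element)."""
    stretch = d_j - d_i
    return (-stiffness * stretch, stiffness * stretch)


def central_difference_step(d, v_half, lumped_mass, f_ext, elements, h):
    """Advance one step of the explicit central difference scheme.

    elements is a list of (node_i, node_j, stiffness) triples.
    """
    # Element-level phase: on the CM-2 each vp would do this for its own element.
    local = [element_internal_force(d[i], d[j], k) for i, j, k in elements]

    # Accumulation phase: contributions at shared nodes are summed.
    f_int = [0.0] * len(d)
    for (i, j, _), (f_i, f_j) in zip(elements, local):
        f_int[i] += f_i
        f_int[j] += f_j

    # v^{n+1/2} = v^{n-1/2} + h M^{-1} (F_ext - F_int);  d^{n+1} = d^n + h v^{n+1/2}
    v_new = [v + h * (fe - fi) / m
             for v, fe, fi, m in zip(v_half, f_ext, f_int, lumped_mass)]
    d_new = [x + h * v for x, v in zip(d, v_new)]
    return d_new, v_new
```

Because the mass matrix is lumped (diagonal), the update needs no equation solving, and the only coupling between elements is the summation of force contributions at shared nodes.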
In this section, we describe the finite element data structures which we have selected to drive the massively parallel computations on the CM-2. These are element oriented, while similar data structures proposed for other hypercubes are subdomain oriented (see Farhat et al.¹⁴ and Fox et al.¹⁵). In Section 8, we give further comments on this difference. We group these data structures into two sets.
The first set of data structures deals with element-level parallel computations. To be able to perform its assigned element-level computations locally, that is, without interacting with the front-end machine, each processor must store in its own memory its element type (truss, beam, shell, ..., number of Gauss points, ...), its element material properties (density, parameters and coefficients for constitutive equations, damping characteristics, thickness, ...), its nodal geometry (nodal co-ordinates, number of nodes per element) and its boundary conditions (fixed, free degrees of freedom at each node, prescribed forces at each node).
This information is compacted in one-dimensional arrays. In addition, each processor must also store in its memory a set of scalars corresponding to computational parameters such as the fixed time step h, and a scalar or one-dimensional buffer for the temporary storage of messages to be passed to neighbouring processors.
The second set of data structures provides the router with the mechanism for parallel interprocessor communication. The inability of the NEWS grid to handle irregular communication patterns has been addressed in Section 2.2. Let p denote a virtual processor and $e_p$ its assigned finite element. In order to exchange $F^{int(e)}(d^n)$ and $F^{ext(e)}(t^n)$, virtual processor p must be able to identify at run time:
• the set of processors mapped onto elements adjacent to $e_p$
• the nodes that $e_p$ shares with these elements
• at each shared node, the degrees of freedom which need to be assembled.
This particular information is vital for meshes with different types of elements. It guarantees that, for example, a moment is not accumulated with a force, or that a force in the x direction is not accumulated with a force in the y direction.
If the above information is gathered in a global form on the front-end machine, most of the execution time which elapses during the accumulation phase would be due to message-passing between the CM-2 processors and the front-end computer. On the other hand, if this information is decentralized, that is, if the memory of processor p is loaded only with the subset of that information which is relevant to the connectivity of $e_p$, the accumulation phase can be performed without any message-passing between the CM-2 and the front-end computer. Consequently, prior to any computation, the memory of processor p is loaded with the following one-dimensional arrays:
Proc-att-to-node  For each node connected to $e_p$, it contains the identification of the processors that are mapped onto elements which are also connected to this node. These are stored in a stacked fashion.
Pointer  This is a pointer array. It stores in position i the location in Proc-att-to-node of the list of vps that are attached to the node in the ith local position.
Location  For each entry in Proc-att-to-node, this array specifies the local position of the shared node in the processor that is mapped onto an element adjacent to $e_p$.
The above arrays are set up by the dedicated finite element mesh analyser which was presented by Farhat et al.¹⁴ They require about 80 integer words per processor. Clearly, this is a very small overhead. The mechanism of these arrays is depicted in Figure 8 for element 1. The mesh patch is composed of shell and beam elements.
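To make the role of the three arrays concrete, the sketch below builds them for one processor from a plain element connectivity list. It is an illustration under stated assumptions, not the mesh analyser cited above: indices are 0-based, the pointer array carries one extra closing entry so that each node's list is delimited by two consecutive pointers, and the function name and the small test mesh are hypothetical.

```python
def build_comm_arrays(elements, p):
    """Communication arrays for processor p.

    elements[q] is the tuple of global node numbers of the element that is
    mapped onto processor q (one element per virtual processor).
    """
    proc_att_to_node = []  # stacked lists of neighbouring processors
    pointer = []           # start of each local node's list in the stack
    location = []          # local position of the shared node on the neighbour
    for node in elements[p]:
        pointer.append(len(proc_att_to_node))
        for q, nodes_q in enumerate(elements):
            if q != p and node in nodes_q:
                proc_att_to_node.append(q)
                location.append(nodes_q.index(node))
    pointer.append(len(proc_att_to_node))  # closing entry (assumed layout)
    return proc_att_to_node, pointer, location


# Hypothetical hybrid patch: two 4-node elements sharing an edge, plus a
# 2-node truss attached along one side of the second element.
mesh = [(0, 1, 4, 3), (1, 2, 5, 4), (4, 5)]
procs, ptr, loc = build_comm_arrays(mesh, 0)
```

For processor 0 this gives `procs == [1, 1, 2]`, `ptr == [0, 0, 1, 3, 3]` and `loc == [0, 3, 0]`: its second node is shared with processor 1, and its third node with processors 1 and 2. During accumulation, a processor can therefore address each neighbour and the exact nodal slot on that neighbour without consulting the front end.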
There is, however, one penalty associated with assigning one element to each vp. The nodes which are common to several elements are duplicated in their corresponding processors. As a result, about 11 per cent of the total memory available on the CM-2 is wasted. This is a small price for the highly parallel computations that are achieved. Given the low cost of memory nowadays, this seems a worthwhile trade-off. Moreover, this assignment allows I/O manipulations and graphic post-processing to be trivially parallelized. At each time step, after the nodal displacements are found, all of the processors draw concurrently the outline of their assigned elements on the Frame Buffer and send back the results to the front end in parallel.
5. THE DECOMPOSITION/MAPPING STRATEGY
Since the mesh irregularities inhibit the exploitation of the NEWS grid, we rely on the data structures of Section 4 to guide the router during interprocessor communication. However, there
Element 1:  Proc-att-to-node [2, 3, 3, 2]   Pointer [1, 2, 3, 5]   Location [1, 2, 1, 2]
Figure 8. A distributed data structure for interprocessor communication
is still one additional problem to resolve. Efficiency in massively parallel computations requires the minimization of both the distance that information must travel and, more importantly, the 'hammering' on the router. In the case of finite element computations, this implies that adjacent elements must be assigned, as much as possible, to directly connected processors, and contention for the wire connecting neighbouring chips must be reduced. This defines the mapping problem, that is, which hardware processor is to be mapped onto which finite element of a given mesh.
Farhat¹⁶ developed a heuristic algorithm for mapping massively parallel processors onto finite element graphs and presented some analytical results for corresponding efficiency improvement. Basically, the algorithm searches iteratively for a better mapping candidate through a two-step procedure for the minimization of the communication costs associated with a specific parallel processor topology. Because it seeks a very fast solution for a machine with thousands of processors, this algorithm does not guarantee 'the' optimal mapping. However, it has produced very encouraging results on a variety of non-uniform two- and three-dimensional meshes.
In this work, we adapt the mapping algorithm of Reference 16 to our target parallel processor, the CM-2. The 65 536 processors of this machine are packaged into 4096 16-processor chips, each having its own router node. The 4096 router nodes are arranged in a hypercube of dimension 12. To cope with this topology, we proceed in two steps. First, we decompose the given mesh into 4096 submeshes, each containing 16 connected finite elements. Next, we apply the mapper given
in Reference 16 to identify which hardware chip is to be mapped onto which submesh. Finally, within each submesh, the elements are numbered randomly between the chip number and the chip number + 15.
Given a finite element mesh, there are several ways to decompose it into 16-element submeshes (see for example Farhat¹⁷ and Malone¹⁸). Here, each submesh is to be assigned to one chip of the CM-2. In Figures 9, 10 and 11, we show a discretized square domain and two different decompositions of it, D1 and D2.
Both decompositions yield 16 submeshes, each with 16 adjacent elements. Decomposition D1 was designed to minimize the communication bandwidth, that is, the maximum number of
Figure 9. Domain to be decomposed
Figure 10. Decomposition D1: bandwidth minimization
Figure 11. Decomposition D2: interface minimization
different chips with which any chip needs to communicate. It can be seen (Figure 12) that for D1 the bandwidth equals 2, while for D2 it equals 8.
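The bandwidth of a decomposition, and the number of interface nodes discussed later in this section, can be computed directly from the element-to-chip assignment. The sketch below assumes that the square domain is a 16 × 16 grid of 4-node elements, that D1 cuts it into 16 strips of 1 × 16 elements and that D2 cuts it into 16 blocks of 4 × 4 elements; these shapes are inferred from the figure captions and the quoted counts, and all names are hypothetical.

```python
def decomposition_metrics(n, chip_of):
    """Return (bandwidth, interface_nodes) for an n x n mesh of 4-node elements.

    chip_of(i, j) gives the chip that owns the element in column i, row j.
    """
    neighbours = {}
    interface_nodes = 0
    for x in range(n + 1):
        for y in range(n + 1):
            # Chips owning the (up to four) elements attached to node (x, y).
            chips = {chip_of(i, j)
                     for i in (x - 1, x) for j in (y - 1, y)
                     if 0 <= i < n and 0 <= j < n}
            if len(chips) > 1:
                interface_nodes += 1
            for c in chips:
                neighbours.setdefault(c, set()).update(chips - {c})
    bandwidth = max(len(others) for others in neighbours.values())
    return bandwidth, interface_nodes


def strips(i, j):
    """Assumed D1: sixteen vertical strips of 1 x 16 elements."""
    return i


def blocks(i, j):
    """Assumed D2: sixteen square blocks of 4 x 4 elements."""
    return (i // 4) + 4 * (j // 4)


d1 = decomposition_metrics(16, strips)
d2 = decomposition_metrics(16, blocks)
```

Under these assumptions `d1` is `(2, 255)` and `d2` is `(8, 93)`, which matches the bandwidths and interface node counts quoted in the text: the strips minimize the number of neighbouring chips, while the blocks minimize the number of shared nodes.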
It should be remarked that, if the substructuring approach¹⁴,¹⁵ had been chosen, that is, assigning a subdomain to a physical processor, D1 would have been more efficient than D2. For this decomposition, each chip would buffer the contributions of its interface nodes and send only
Figure 12(a). Interchip communication pattern for D1
Figure 12(b). Interchip communication pattern for D2
two messages, one to the chip at its left and another to the chip at its right. The decomposition D2 requires the same chip to send up to 8 buffered messages. These messages would eventually be shorter, but would still render D2 more expensive because of message start-up costs. However, we have opted for a virtual processor approach, that is, assigning one element to a virtual processor, for reasons that are given in Section 8. For this case, processors exchange information one node at a time, so that the number of interface nodes associated with a decomposition is more important than its bandwidth. The reader can confirm that decomposition D1 delivers 255 interface nodes, while D2 delivers only 93. Indeed, there is another equally, if not more, important reason why D2 is better for the CM-2 than D1. In the case of D1, all of the 16 processors of any chip communicate simultaneously with a set of processors which are on the same neighbouring chip (Figure 13). This generates a significant amount of contention for the single wire that connects these two chips. In the case of D2, however, one can observe (Figure 14) that:
• for each chip, only 12 out of the 16 processors communicate with processors on another chip
• only 3 processors out of these 12 communicate simultaneously with the same neighbouring chip, so that much less contention occurs for the wire connecting the two chips. We recall that each chip is connected with up to 12 other ones using 12 different wires which can operate in parallel.
The decomposition D2 was obtained using a general purpose finite element decomposer presented by the first author in Reference 17. We advocate its use in conjunction with the mapper given in Reference 16 for massively parallel computations on the CM-2. The efficiency improvement potential of this preprocessing phase is demonstrated with the following finite element wave propagation problem. Plate 2 shows the discretization of a tapered cantilever beam. The beam is
Figure 13. Wire contention induced by decomposition D1
Figure 14. Wire traffic for decomposition D2
modelled with 4-node isoparametric elements and linearly elastic plane stress constitutive equations. It is fixed at one end and subjected at the tip of the other to an impact point loading. The wave propagation nature of the problem dictates that the meshing technique create elements which are, as far as possible, of equal size. Since the beam is tapered, transition zones with irregular elements had to be introduced. Other mesh irregularities are due to the presence of a region with a hole. The complete mesh contains 8192 elements, which corresponds to an 8K CM-2. The use of a naive mapping (element i into processor i - 1) would have resulted in a maximum routing distance between adjacent elements equal to 9. Our decomposer/mapper reduces this distance to 5. If EFF denotes the efficiency (speed-up per processor) of the parallel computations using a naive mapping, and f is the factor by which the decomposer/mapper reduces the maximum routing distance between adjacent elements, the theoretical improved efficiency¹⁶ is given by
$$\mathrm{EFF}^* = \frac{f\,\mathrm{EFF}}{1 + (f - 1)\,\mathrm{EFF}} \qquad (5)$$
For this problem, we have measured an efficiency EFF = 40 per cent on an 8K CM-2. Since f = 9/5, the predicted improved efficiency is EFF* = 54 per cent. A second run of the problem using the decomposer/mapper has revealed a measured improved efficiency EFF* = 60 per cent. The discrepancy between the predicted and measured improved efficiencies is due to the fact that (5) does not account for the wire contention problem.
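The predicted value can be reproduced with a simple model. The sketch below assumes that the time not spent on useful computation is entirely communication and that this communication time is divided by f when the maximum routing distance is reduced by the factor f; this is one plausible reading that matches the quoted numbers, not necessarily the exact derivation of Reference 16, and the function name is hypothetical.

```python
def improved_efficiency(eff, f):
    """Efficiency after communication time is reduced by a factor f.

    eff is the efficiency with the naive mapping: useful time / total time.
    """
    useful = eff                # normalized computation time
    communication = 1.0 - eff   # normalized communication time (assumed)
    return useful / (useful + communication / f)


predicted = improved_efficiency(0.40, 9.0 / 5.0)
```

With EFF = 0.40 and f = 9/5 the model gives about 0.545, in line with the predicted improved efficiency of 54 per cent quoted above.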
6. FLOWCHART OF THE MASSIVELY PARALLEL TRANSIENT SIMULATION
The overall organization of the solution on the CM-2 of a transient dynamic problem using the explicit central difference algorithm is depicted in Figure 15. It consists of four phases, namely: mesh preprocessing, data loading, number crunching and data unloading.
A conservative stable time step for the central difference algorithm is given by
$$h \le \frac{2}{\omega_{\max}} \qquad (6)$$
where $\omega_{\max}$ is the maximum element frequency of the undamped dynamic problem. Belytschko has pointed out that it is in fact usually not practical to compute the maximum eigenvalues of the element directly, for this would increase the cost of computation considerably.¹⁹ Instead, formulas for upper bounds on $\omega_{\max}$ have been recommended. However, on massively parallel
Read Input File (Front End)
Decompose Mesh and Form Parallel Data Structure (Front End)
Load Parallel Data Structure (Front End -> CM-2)
Compute Lumped Mass Matrix (CM-2)
Compute Critical Time Step (CM-2)
Loop on Time Steps (Front End)
    Compute Internal and External Local Forces (CM-2)
    Assemble Global Forces (Interprocessor Communication)
    Compute Velocities, Displacements, Strains and Stresses (CM-2)
    Visualize Results (CM-2 -> Frame Buffer)
    Archive Results (CM-2 -> Data Vault)
Figure 15. Solution of a transient problem on the CM-2
Figure 16. Interprocessor communication for a hybrid patch
processors such as the CM-2, the parallelism inherent in the computation of $\omega_{\max}$ is such that this frequency is obtained at the cost of the frequency of one single element.
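The time step selection can be sketched as a per-element frequency evaluation followed by a global maximum. The example below is an illustration only: it uses the well-known result that a 2-node linear rod element with lumped mass has a maximum frequency of 2c/L, where c is the bar wave speed, and the material values and element lengths are hypothetical. On the CM-2 the per-element evaluation would run concurrently and the maximum would be a global reduction.

```python
import math


def rod_element_frequency(length, youngs_modulus, density):
    """Maximum frequency of a 2-node linear rod element with lumped mass."""
    wave_speed = math.sqrt(youngs_modulus / density)
    return 2.0 * wave_speed / length


def stable_time_step(element_frequencies):
    """Critical time step h = 2 / omega_max of the central difference scheme."""
    return 2.0 / max(element_frequencies)


# Hypothetical aluminium-like rod elements of different lengths (SI units).
lengths = [0.10, 0.05, 0.20]
frequencies = [rod_element_frequency(L, 70e9, 2700.0) for L in lengths]
h = stable_time_step(frequencies)
```

For rod elements the resulting step is the time a wave needs to cross the shortest element, so a single small element in a transition zone governs the step for the whole mesh.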
The interprocessor communication mechanism for a mesh with more than one type of element is illustrated in Figure 16. For the example shown, the 4-node elements are activated first. They communicate in four steps, one node at a time. Next, the 4-node elements are de-activated and the truss elements are selected. These communicate in two steps. As explained in Section 2.2, the serialization between different types of elements is due to the SIMD nature of the CM-2.
7. EXAMPLES
In this section, we apply our approach to massively parallel finite element explicit computations to the solution of various transient problems on an 8K CM-2 with Weitek accelerators. We analyse performance results in detail. We assess the efficiency of our decomposition/mapping strategy at reducing communication time. We highlight the impact on machine performance of variations in mesh topology, finite element modelling and problem non-linearities. We also report on the performance of the Data Vault system for problems that are I/O bound.
For each example, two simulations were carried out. The first one assumed a linear elastic material. In the second simulation, the material was assumed to have an elastoplastic behaviour governed by a von Mises yield condition.
7.1. E1: Transient response of a cracked aluminium plate
The quarter of a mesh in Figure 17 was generated to study the dynamic response of a cracked aluminium plate under a uniform time varying loading. The full mesh contained a total of 4008 plane stress elements and 4073 nodes. Mesh irregularities were induced by transition zones. The NEWS grid could not be used.
7.2. E2: Wave propagation in a three-dimensional bar
The second example considered was the impact of a metallic ball on an unsupported glassy bar. The bar was discretized using 8160 brick elements (Figure 18). The finite element mesh contained 13 500 nodes and 40 500 degrees of freedom. Given the regularity of the discretization, the NEWS grid was used for interprocessor communication. This example was also re-run using the router for performance comparison.
7.3. E3: Shuttle docking induced vibrations in a space station
This dynamic analysis was carried out to investigate the vibrations of a space station model assembled from 5-m erectable struts. These vibrations were assumed to be induced by a shuttle docking. The finite element model (Figure 19) comprised 7584 three-dimensional truss elements and 2304 nodes. It was generated by aligning identical cells along various axes. However, each cell by itself was irregular and did not allow the use of the NEWS grid.
7.4. E4: Three-dimensional glassy bar on an elastic foundation
The wave propagation example problem E2 was repeated with different boundary conditions. The glassy bar was assumed to be supported by a layer of foam. The mesh was comprised of
Figure 17. A quarter of a mesh for a cracked plate
Figure 18. Finite element discretization of a glassy bar
Figure 19. A space station model
a total of 8164 elements (which is very close to the number of elements in the former mesh), of which 1636 truss elements were used to model an elastic foundation.
7.5. Performance results and analysis
The large majority of the code segments was written in C*. Occasionally we have used Paris functions to speed up some manipulations. Floating-point arithmetic was performed in single precision (32 bit words). Measured performance results are gathered in Tables II, III, IV, V and VI. The reported Mflops rates account for every integer and floating point operation, whether used for addressing or number crunching. Only example E2 could make use of the NEWS grid. However, all timings, except those given in Table VI, correspond to runs where communication was carried through the router. Execution times are given in seconds and correspond to a sample of 2000 time integration steps and a vp ratio equal to 1.
Table II. Overall measured performance for various transient finite element computations
Example             Mesh pre-processing   Data loading in the CM-2   Equation of motion solving   Sustained Mflops
E1-elastic          1.04 sec              5.47 sec                   861 sec                      400
E1-elastoplastic    1.04 sec              5.47 sec                   1033 sec                     480
E2-elastic          1.98 sec              31.78 sec                  4139 sec                     392
E2-elastoplastic    1.98 sec              31.78 sec                  4718 sec                     440
E3-elastic          1.28 sec              13.56 sec                  887 sec                      254
E3-elastoplastic    1.28 sec              13.56 sec                  896 sec                      256
E4-elastic          2.11 sec              33.00 sec                  4770 sec                     340
E4-elastoplastic    2.11 sec              33.00 sec                  5440 sec                     386
Table III. Data Vault system performance
Example   Solving equation of motion   Unloading results on front end   Unloading results on Data Vault
E1        861 sec                      5340 sec                         3.81 sec
E2        4139 sec                     16400 sec                        12.61 sec
E3        887 sec                      9500 sec                         7.04 sec
Table IV. Computation vs. communication
Example   Solving equation of motion   Computation time   Communication time
E1        861 sec                      460 sec            401 sec
E2        4139 sec                     1959 sec           2180 sec
E3        887 sec                      260 sec            627 sec
E4        4770 sec                     2340 sec           2430 sec
Table V. True communication time
Example   Computation time   Effective communication time   Software overhead
E1        460 sec            81 sec                         320 sec
E2        1959 sec           1380 sec                       1280 sec
E3        260 sec            146 sec                        481 sec
Table VI. Router vs. NEWS grid
Example   Computation time   Communication time using the NEWS grid   Communication time using the router
E2        4139 sec           560 sec                                  2660 sec
The mesh pre-processing phase corresponds to the decomposition of the finite element mesh, as explained in Section 5. It also includes the setup of the finite element parallel data structure, which is then distributed across the processors. Both of these phases are shown to require very little computer time. It can also be observed that, in the worst case, the non-linear computations consume only about 15 per cent additional time. This is due to the explicit nature of the radial return mapping algorithm that was used. Because of 'what you see is what you get', the reported Mflop rates should be compared to those measured in Section 3 and not to the theoretical peak performance of the machine. It should also be noted that our C* code still leaves room for further optimizations.
For examples E1, E2 and E3, the computed displacements, strains and stresses were archived on secondary storage after each time integration step. Two solutions were compared. In the first case, these results were brought back to the front end and stored in appropriate disk files. For that case, the measurements given in Table III demonstrate that the amount of I/O involved dominated the total simulation time. In the second case, the results were transferred in parallel directly to a Data Vault system. The speed-up provided by the Data Vault is shown to be of the order of 1400! This parallel I/O capability is what was most lacking on earlier hypercubes.8
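As a quick arithmetic check of the speed-up quoted above, the following Python sketch divides the front-end unloading times by the Data Vault unloading times, using the values listed in Table III. The dictionary layout and the function name are illustrative choices and are not part of the original code.

```python
# Unloading times in seconds for 2000 time integration steps (values from Table III).
UNLOADING_TIMES = {
    "E1": {"front_end": 5340.0, "data_vault": 3.81},
    "E2": {"front_end": 16400.0, "data_vault": 12.61},
    "E3": {"front_end": 9500.0, "data_vault": 7.04},
}


def data_vault_speedup(times):
    """Return the ratio of front-end I/O time to Data Vault I/O time."""
    return times["front_end"] / times["data_vault"]


if __name__ == "__main__":
    for example, times in UNLOADING_TIMES.items():
        print(f"{example}: speed-up of about {data_vault_speedup(times):.0f}")
```

For the three examples the ratios fall between roughly 1300 and 1400, which agrees with the order of magnitude stated in the text.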
If $T_p$ and $T_m$ are respectively the parallel computation time and the parallel communication time, and $N_p$ is the number of available processors on a given parallel machine, the achieved efficiency (speed-up per processor) can be expressed as
$$\mathrm{EFF} = \frac{1}{N_p}\,\frac{N_p T_p}{T_p + T_m} = \frac{1}{1 + T_m / T_p}$$
The results given in Table IV indicate that efficiencies of 53, 47, 29 and 49 per cent are achieved respectively for examples E1, E2, E3 and E4. If one refers to the performance results of Section 3, it can be seen that the sustained Mflop rates reported in Table II are consistent with these efficiencies. At first glance, these efficiency results appear to be very pessimistic. However, they are well above the 10 per cent often obtained on current vector supercomputers.20 The reader can observe that the timing results for example E4 are very close to the cumulative timings of examples E2 and E3, which illustrates the impact of the SIMD nature of the CM-2 on the MIMD nature of finite element computations. It should also be noted that, while the communication time is fixed for a given mesh, the computation time increases with the complexity of the analysis. Thus, highly non-linear formulations which include large deformations are expected to yield higher efficiencies than those deduced from Table IV.
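The efficiencies quoted above can be recovered from Table IV by dividing the computation time by the sum of the computation and communication times. The short Python sketch below does this; the variable and function names are assumptions made for illustration.

```python
# (computation time, communication time) in seconds, taken from Table IV.
TABLE_IV = {
    "E1": (460.0, 401.0),
    "E2": (1959.0, 2180.0),
    "E3": (260.0, 627.0),
    "E4": (2340.0, 2430.0),
}


def efficiency(computation_time, communication_time):
    """Fraction of the parallel run time spent on useful computation."""
    return 1.0 / (1.0 + communication_time / computation_time)


def efficiencies_per_cent(table):
    """Map each example to its efficiency, rounded to the nearest per cent."""
    return {name: round(100.0 * efficiency(*times)) for name, times in table.items()}
```

Calling `efficiencies_per_cent(TABLE_IV)` gives 53, 47, 29 and 49 per cent for E1 to E4, the values reported in the text.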
At this point, we give further details regarding interprocessor communication in the context of finite element explicit computations. As outlined in Section 5, the finite elements of a mesh exchange their local contributions one node at a time. For a given finite element, this information exchange procedure is organized around two nested loops. The outer loop is carried over the nodes that are connected to this element. The inner loop is carried over the neighbouring elements that are attached to each local node. Using a C notation, this is written as:
    for (node = 1; node <= my_nodes; node++)                    /* (7) */
    {   start = pointer[node]; stop = pointer[node + 1] - 1;    /* neighbours attached to this local node */
        for (position = start; position <= stop; position++)    /* (8) */
        {   neighbor = proc_att_to_node[position];              /* identification of the neighbouring element */
            exchange(variable, myself, neighbor);
        }
    }
where my_nodes is the total number of nodes that are connected to a given finite element and proc_att_to_node is the array containing the identification of the neighbouring elements. Clearly, these variables are element dependent. The total number of communications to be performed by one processor is determined by the product P_c = d * (pointer[my_nodes + 1] - 1), which is both element and mesh dependent. The CM-2 being an SIMD machine, the communication time is determined by the largest value of this product over all elements. For a regular mesh composed of three-dimensional truss elements (d = 3) or 4-node plane elements (d = 2), every node is attached to 4 elements, so that 24 communication instructions per time integration step are required for the truss element and 32 for the 4-node plane element. However, Table IV indicates that the space station example exhibits a longer communication time than does the aluminium plate problem. The reason is that in the mesh of example E3, some truss elements are connected to 12 other elements. Because of the SIMD nature of the CM-2, the element with the highest degree of connectivity determines the communication time. For a regular mesh with 8-node solid elements (d = 3) each time integration step is followed by 192 communication steps, since each node can be attached to up to eight different elements. This is reflected in Table IV where example E2 is shown to possess by far the longest communication time (2180 sec). In summary, the amount of communication involved in finite element explicit computations on the CM-2 is determined by the element topology and order, and the mesh irregularities. Because only d nodal quantities are exchanged at a time among the CM-2 processors, three-dimensional and high order elements substantially increase the communication time. Mesh irregularities also adversely affect the amount of communication because of the SIMD nature of the CM-2. It is interesting to note that elements which transmit physical information across edges and faces, such as those proposed by De Veubeke,21 would require much less communication than traditional elements. These elements should be revisited for computations on massively parallel processors such as the CM-2.
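The communication counts quoted above (24, 32 and 192) can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the pointer array is built 1-based to mirror the C notation, `elements_per_node` (the number of elements attached to each local node) and the function names are hypothetical, and the irregular mesh at the end is made up for illustration.

```python
def build_pointer(elements_per_node):
    """Build the 1-based pointer array used by the exchange loops.

    pointer[node] is the first position, in the neighbour list, of the
    elements attached to local node `node`; index 0 is unused padding.
    """
    pointer = [0, 1]
    for count in elements_per_node:
        pointer.append(pointer[-1] + count)
    return pointer


def communications_per_step(d, elements_per_node):
    """Number of exchanges one element performs per time integration step."""
    my_nodes = len(elements_per_node)
    pointer = build_pointer(elements_per_node)
    return d * (pointer[my_nodes + 1] - 1)


def simd_communications_per_step(d, mesh):
    """On an SIMD machine every processor waits for the busiest element."""
    return max(communications_per_step(d, element) for element in mesh)


# Regular meshes: every local node is attached to the same number of elements.
truss = communications_per_step(3, [4, 4])     # 2-node truss element, d = 3
plane = communications_per_step(2, [4] * 4)    # 4-node plane element, d = 2
solid = communications_per_step(3, [8] * 8)    # 8-node solid element, d = 3

# Hypothetical irregular truss mesh: one element has a node shared by 12 elements.
irregular = simd_communications_per_step(3, [[4, 4], [4, 4], [12, 4]])
```

In the made-up irregular case, the single highly connected element raises the count for every processor from 24 to 48, which is the SIMD penalty described in the text.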
An in-depth investigation of the communication phase was carried out. It was found that most of the communication time was spent in the header of loop (8). This loop header involves the quantities start and stop, which differ from one processor to another in the presence of mesh irregularities and different element types. Consequently, the front-end computer has to process and manage several different loops rather than a unique one, which is not very efficient on an SIMD machine. The time associated with the headers of loops (7) and (8) is referred to as software overhead in Table V. The true time that is spent in effective communication among the processors is shown to be only a fraction of the overall communication time (see Table V).
Because it was designed to handle arbitrary meshes, our C* code did not make use of the NEWS grid package. However, a special module that incorporated calls to the NEWS grid was written specifically for the regular mesh of example E2. Execution times for this example using both the NEWS grid and the router are shown in Table VI. Clearly, a high price is paid for the handling of possible mesh irregularities.
However, the irregular pattern of communication is fixed in time. Thus, a considerable improvement can be achieved if this pattern is evaluated at the first time step, then somehow stored in the CM-2 for use during subsequent time steps. We believe that this is an issue that massively parallel computer architects should investigate.
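The idea of evaluating the communication pattern once and reusing it can be illustrated, on a serial machine, by flattening the two nested loops into a single precomputed list of exchanges. The Python sketch below is only an analogy under stated assumptions: `build_schedule`, `run_time_steps` and the data layout are hypothetical, and nothing here models how such a pattern would actually be stored in the Connection Machine hardware.

```python
def build_schedule(neighbours_per_node):
    """Flatten the nested node/neighbour loops into one list of exchanges.

    neighbours_per_node[element] is a list, one entry per local node, of the
    neighbouring elements attached to that node (a hypothetical layout).
    """
    schedule = []
    for element, nodes in enumerate(neighbours_per_node):
        for local_node, neighbours in enumerate(nodes):
            for neighbour in neighbours:
                schedule.append((element, local_node, neighbour))
    return schedule


def run_time_steps(neighbours_per_node, n_steps):
    """Count exchanges over n_steps, building the pattern only once."""
    schedule = build_schedule(neighbours_per_node)  # evaluated before the first step
    exchanges = 0
    for _ in range(n_steps):
        for _element, _local_node, _neighbour in schedule:  # same pattern reused
            exchanges += 1  # stand-in for exchange(variable, myself, neighbor)
    return exchanges
```

With a fixed schedule, every time step walks one uniform list instead of re-evaluating per-element loop bounds, which is the kind of saving the paragraph above argues for.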
Figure 20. The decomposer/mapper performance (execution times in seconds over 2000 time integration steps for examples E1, E2 and E3)
In order to assess the performance of the decomposer/mapper module, examples E1, E2 and E3 were re-run with the naive shifted identity mapping (element i in processor i - 1). Figure 20 demonstrates that, relative to this naive mapping, the decomposer/mapper reduces the true communication time by as much as 60 per cent. Unfortunately, the total execution time is reduced only by between 10 and 17 per cent because of the communication software overhead associated with mesh irregularities.
8. CONCLUDING REMARKS
We have reported herein on our experience in performing transient finite element computations on the CM-2. We have presented the architectural features of this parallel processor and discussed their impact on finite element computational strategies. In particular, those features which distinguish the CM-2 from earlier hypercubes have been emphasized. These include the virtual processor concept and the fast parallel I/O capabilities. The processor memory size of 64 Kbits has been shown to penalize high order elements. We have also described and discussed a domain decomposition strategy and a mapping algorithm which are suitable for massively parallel processors such as the Connection Machine. The main idea behind the decomposition technique is the minimization of both the amount of wire contention within a chip, and the amount of communications between different chips. A given finite element mesh is partitioned into 16-element subdomains which correspond to the 16-processor chips of the Connection Machine. This partitioning is carried out in a way that minimizes the number of nodes at the interface between the subdomains. As a result, only those processors which are mapped onto finite elements at the periphery of a subdomain communicate with processors packaged on different chips. Moreover, this partitioning is such that the connectivity bandwidth of the resulting subdomains is large enough to allow an efficient use of the interchip wires. The mapping algorithm attempts to reduce the distance information has to travel through the communication network. In essence, the algorithm searches iteratively for an optimal mapping through a two-step minimization of the communication costs associated with a candidate mapping. Various issues related to the single instruction multiple data stream nature of the CM-2 and pertinent to computational mechanics have been addressed. Measured performance results for realistic two- and three-dimensional transient problems have been reported. Three-dimensional and high order elements have been shown to induce longer communication times. Mesh irregularities have been shown to slow down the computation speed in many ways. The Data Vault has been demonstrated to be very effective at reducing the I/O time.
Now, we briefly highlight some additional implementational and theoretical issues that we hope will materially advance the application ranges of finite element computations on this highly parallel processor.
8.1. Virtual processor ratio vs. substructuring
In this work, we have assigned when possible more than one finite element to a single processor using the virtual processor feature of the CM-2. However, another way to obtain the same result is to assign a substructure to an individual processor. From a numerical point of view, both approaches are equivalent. However, these two distinct approaches differ in their implementations and may perform differently. The substructure approach requires each processor to work with both external and internal data structures. The set of external data structures stores information about substructure interconnections. These are similar to the ones described in this paper. The set of internal data structures stores the connectivity table of the elements within a substructure. The computations within each substructure are carried out by looping over the elements of that substructure. The advantage of this approach is a saving in storage since the substructure internal nodes are uniquely defined, and a faster computation of the results associated with these nodes. Moreover, the global results at the internal nodes can be accumulated without any explicit call to a message-passing function. The global quantities at the boundary nodes are accumulated using the router and the external data structures. However, the substructuring approach requires that the sequencer broadcast the same instruction several times, once for each element of the substructure, which increases the overall wall clock execution time. Moreover, this approach does not allow the Weitek chip to pipeline the computations over the elements of the substructure.
On the other hand, the virtual processor approach requires that each element communicate explicitly with its neighbours, even if these are assigned to the same processor. Of course, this communication is virtual since it is within the processor itself and generates minimal additional overhead. On the positive side, the virtual processor approach utilizes only one type of data structure and exploits the pipelining capabilities of the Weitek chip. The latter feature significantly enhances overall performance, as demonstrated in Section 3. Consequently, we advocate the use of the virtual processor ratio rather than the substructuring technique, especially if the processor memory size is to be increased in the future.
8.2. Implicit algorithms and the CM-2
In this report, no attempt has been made to design a novel parallel algorithm for the solution of the differential equation of motion. We have selected the central difference algorithm because of its inherent parallelism, which allowed us to focus on implementational issues and to fully explore the multiprocessing capabilities of the CM-2. Our experience suggests that a whole class of explicit and semi-implicit dynamic and static algorithms can be implemented on the CM-2 in a very similar way. Among others, we cite the EBE algorithms,22 the EBE preconditioners23 and the Jacobi preconditioned conjugate gradient algorithm.24 However, the solution of some static and transient problems may necessitate the use of an implicit algorithm, which usually implies the solution of a set of simultaneous banded equations. If the global symmetric stiffness matrix K is banded, with semi-bandwidth b, then it is well known (see for example Ortega and Voigt25) that Gaussian elimination methods for solving Kd = F allow at each step on the order of $b^2/2$ pairs of (+, ×) to be processed concurrently, but require significant communication because the b entries of the pivot column must be made available to all other processors. Several parallel algorithms based on these elimination methods were designed for finite element applications and were implemented on earlier hypercubes (see for example, Farhat and Wilson26 and Utku et al.27). Typically, a processor was assigned to a set of matrix columns. Results from our previous experience with the early version of Intel's iPSC suggest that direct solvers are feasible on hypercubes only when the number of available processors, $N_p$, is much smaller than the bandwidth b of the given finite element problem, so that communications do not dominate computations. On the iPSC-1, a message that was sent from one extreme corner of a 5-dimensional cube to the other would result in an elapsed time 475 times longer than the time to perform a floating point multiplication (see Rudell28). However, on a 10-dimensional subcube of the CM-2 we have measured the ratio of a broadcast to a floating point computation to be only about 2.87. This observation suggests that, for problems with b > 360, a processor could be mapped onto a few matrix entries and a parallel direct solver could be feasible on the CM-2. For problems with smaller bandwidth, direct solvers which operate on more than one pivot at a time29,30 should also be investigated for implementation on massively parallel processors.
There is an additional issue which has to be examined before attempting to solve finite element equations on the CM-2 with a parallel direct solver. This issue is related to the balance on massively parallel processors between the number of available processors, $N_p$, and the processor memory size. Let $M_n$ denote a two-dimensional regular n by n finite element mesh, where n is the number of elements along one side. If d is the number of degrees of freedom at a given node, the semi-bandwidth of $M_n$ is $b = d(n + 3)$ and the total number of mathematical unknowns is $N = d(n + 1)^2$. For this mesh, the storage cost of K amounts to $Nb = d^2 (n + 3)(n + 1)^2$ words. The total amount of storage available on the CM-2 is $S = N_p m_p$, where $N_p$ is the number of available processors and $m_p$ = 8 Kbytes is the current size of the processor memory. Let $N_E = n^2$ be the maximum number of elements for which $M_n$ has a banded stiffness matrix that can be factored in-core on the CM-2. Table VII gives the values of $N_E$ for different values of d and for the case of a fully configured Connection Machine ($N_p$ = 65 536). Values of $N_E$ are shown for both single precision (32 bit words) and double precision (64 bit words) floating point arithmetic.
Clearly, except for the case where d = 2 and floating point arithmetic is done in single precision, $N_E$ is smaller than $N_p$. Similarly, the case where $M_n$ is an n by n by n three-dimensional regular mesh is assessed in Table VIII for various values of d.
For this case, $N_E$ is much smaller than $N_p$, even for d = 2 and for single precision floating point arithmetic. For d = 6 (some shell elements), only 8000 elements (4000 elements) can be included in $M_n$ when computations are carried out using single precision (double precision) floating point arithmetic.
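The two-dimensional count can be sketched in a few lines of Python: find the largest n for which d^2 (n + 3)(n + 1)^2 words fit in the total memory, and report n^2. This is a minimal sketch, assuming that 8 Kbytes means 8192 bytes per processor and that a single precision word occupies 4 bytes; the function name and its arguments are hypothetical. Under these assumptions it returns the single precision entries listed in Table VII.

```python
def max_elements_2d(d, n_processors=65536, bytes_per_processor=8192, bytes_per_word=4):
    """Largest number of elements n**2 whose banded stiffness matrix fits in core.

    Storage model from the text: N * b = d**2 * (n + 3) * (n + 1)**2 words.
    """
    available_words = n_processors * bytes_per_processor // bytes_per_word
    n = 0
    # Increase n while the mesh with n + 1 elements per side would still fit.
    while d**2 * (n + 4) * (n + 2) ** 2 <= available_words:
        n += 1
    return n * n


single_precision = [max_elements_2d(d) for d in (2, 3, 4, 5, 6)]
```

Passing `bytes_per_word=8` gives the corresponding double precision estimate under the same storage model; small differences from tabulated values are possible depending on how the storage bound is rounded.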
Table VII. Number of allowable elements vs. DOF/node for the two-dimensional case
N_p = 65 536            d = 2     d = 3    d = 4    d = 5    d = 6
Single precision N_E    102400    59536    40401    29929    23409
Double precision N_E    64009     37249    25281    18769    14884
Table VIII. Number of allowable elements vs. DOF/node for the three-dimensional case
N_p = 65 536            d = 2     d = 3    d = 4    d = 5    d = 6
Single precision N_E    29791     19683    13824    10648    8000
Double precision N_E    19683     12167    9261     6859     4913
It is noted that the solution of a system of equations is only one phase of several finite element computational sequences. In linear three-dimensional analysis, this phase dominates the computer execution time. However, in the non-linear analysis of flexible space structures most of the computational time is usually spent in modules that perform element level computations.31 These include the evaluation of generalized nodal internal forces and/or elemental stiffness matrices. Consider now a mesh $M_n$ where the number of elements $N_E$ is chosen so that the upper part of the banded stiffness matrix K fills the $N_p$ processor memories completely. The preceding complexity analysis demonstrates that the balance on the CM-2 between the number of processors and the memory size of each processor is such that $N_E$ is much smaller than $N_p$. Hence, if a direct algorithm is used to solve a finite element system of equations, the $N_p$ processors will be active during the solution phase, but $N_p - N_E$ processors will remain idle during the rest of the phases which involve element level computations. Consequently, an in-core direct solution strategy would not efficiently utilize the computational power of the CM-2 in a highly non-linear finite element analysis.
ACKNOWLEDGEMENTS
The authors wish to acknowledge the support by the Naval Research Laboratory (NRL) under Grant DOD N00014-87K-2018, with Dr Hank Dardy and Dr Louise Schuetz as technical monitors. The first author also acknowledges the support by the National Science Foundation under grant ASC-8717773. Both CM-2s at NRL and at CAPP were utilized to develop this work. The parallel I/O experiments were done using the Data Vault system at NRL. Special thanks to Bob Whaley (Thinking Machines Corporation and NRL), Eric Hoffman (NRL) and Roldan Pozo (CAPP).
REFERENCES
1. C. Farhat and E. Wilson, 'A new finite element concurrent computer program architecture', Int. j. numer. methods eng., 24, 1771-1792 (1987).
2. G. A. Lyzenga, A. Raefsky and B. H. Hager, 'Finite elements and the method of conjugate gradient on concurrent processors', CalTech/JPL Report C3P-119, California Institute of Technology, Pasadena, CA, 1984.
3. T. Belytschko and N. Gilbertsen, 'Concurrent and vectorized mixed time, explicit nonlinear structural dynamics algorithms', in A. K. Noor (ed.), Parallel Computations and Their Impact on Mechanics, American Society of Mechanical Engineers, New York, 1987, pp. 279-290.
4. C. Farhat and L. Crivelli, 'A general approach to nonlinear FE computations on shared memory multiprocessors', Comp. Methods Appl. Mech. Eng., 72, 153-172 (1989).
5. M. Benten, C. Farhat and H. Jordan, 'The force for efficient multitasking on the CRAY series of supermultiprocessors', Proceedings of the Fourth International Symposium on Science and Engineering on CRAY Supercomputers, Minneapolis, Minnesota, Oct. 12-14, 1988, pp. 389-406.
6. D. W. White and J. F. Abel, 'Bibliography on finite elements and supercomputing', Commun. appl. numer. methods, 4, 279-294 (1988).
7. A. K. Noor, 'Parallel processing in finite element structural analysis', in A. K. Noor (ed.), Parallel Computations and Their Impact on Mechanics, American Society of Mechanical Engineers, New York, 1987, pp. 253-277.
8. C. Farhat, 'Parallel computational strategies for large space and aerospace flexible structures: Algorithms, implementations and performance', Proceedings, Supercomputing in Engineering Structures, Oberlech, Austria, IBM Europe Institute, July 11-15, 1988.
9. O. McBryan, 'New architectures: Performance highlights and new algorithms', Parallel Comp., 7, 477-499 (1988).
10. J. L. Gustafson, G. R. Montry and R. E. Benner, 'Development of parallel methods for a 1024-processor hypercube', SIAM J. Sci. Statist. Comp., 9, 609-638 (1988).
11. R. D. Krieg, 'Unconditional stability in numerical integration methods', J. Appl. Mech. ASME, 40, 417-421 (1973).
12. W. Daniel Hillis, The Connection Machine, MIT Press, Cambridge, Mass., 1987.
13. Thinking Machines Corporation, 'Connection Machine model CM-2 technical summary', Technical Report Series, HA87-4, 1987.
14. C. Farhat, E. Wilson and G. Powell, 'Solution of finite element systems on concurrent processing computers', Eng. Comp., 2, 157-165 (1987).
15. Fox et al., Solving Problems on Concurrent Processors, Prentice-Hall, Englewood Cliffs, N.J., 1988.
16. C. Farhat, 'On the mapping of massively parallel processors onto finite element graphs', Comp. Struct., 32, 347-354 (1989).
17. C. Farhat, 'A simple and efficient automatic FEM domain decomposer', Comp. Struct., 28, 579-602 (1988).
18. J. G. Malone, 'Automated mesh decomposition and concurrent finite element analysis for hypercube multiprocessor computers', Comp. Methods Appl. Mech. Eng., 70, 27-58 (1988).
19. T. Belytschko, 'Overview of semidiscretization', in T. Belytschko and T. J. R. Hughes (eds), Computational Methods for Transient Analysis, North-Holland, 1983, pp. 1-63.
20. Kuck et al., 'The effects of program restructuring, algorithm change, and architecture choice on program performance', IEEE, C-27, 129-138 (1984).
21. M. Geradin (ed.), B. M. Fraeijs De Veubeke Memorial Volume of Selected Papers, Sijthoff & Noordhoff, 1980.
22. T. J. R. Hughes et al., 'Element-by-element implicit algorithms for heat conduction', J. Eng. Mech., 109, 576-585 (1983).
23. T. J. R. Hughes, R. M. Ferencz and J. O. Hallquist, 'Large-scale vectorized implicit calculations in solid mechanics on a Cray X-MP/48 utilizing EBE preconditioned conjugate gradients', Comp. Methods Appl. Mech. Eng., 61, 215-248 (1987).
24. G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 1983.
25. J. M. Ortega and R. G. Voigt, 'Solution of partial differential equations on vector and parallel computers', SIAM Rev., 27, 149-240 (1985).
26. C. Farhat and E. Wilson, 'A parallel active column equation solver', Comp. Struct., 28, 289-304 (1988).
27. S. Utku, M. Salama and R. Melosh, 'Concurrent factorization of positive definite banded Hermitian matrices', Int. j. numer. methods eng., 23, 2137-2152 (1986).
28. R. Rudell, 'Parallel processing efficiency on the hypercube', CS252 Project Report, The University of California at Berkeley, 1985.
29. G. Alaghband and H. F. Jordan, 'Multiprocessor sparse L/U decomposition with controlled fill-in', ICASE Report No. 85-48, NASA Langley Research Center, Hampton, Virginia 23665, 1985.
30. F. J. Peters, 'Parallel pivoting algorithms for sparse symmetric matrices', Parallel Comp., 1, 99-110 (1984).
31. N. F. Knight et al., 'CSM testbed development and large scale structural applications', Proceedings of the 4th International Symposium, Minneapolis, Minnesota, 1988, pp. 359-388.
'S E
S.s L; k. 4, ta~20~ EC r~~ 'o 00 0.
E= o -0 C En '4 - 0
0 En E>~ toE~ 00 - ±
S. v- .4 10.-0 0~.. . ~ -.. O -c. . 0 > o 'E 4 -0 -
r 0 .2.00 2. ~ ct
-U =j 00.0CO HC: A.~ Z0 r. -- -; 0 7~ (n~ 0 0
E v 5 -t r C2 in ;a xO -W . E5Co~~~~~~ o '.0 C0-...0)0 ~ 0 c .~ .r
W r 0 : 02~ .0 r 0.,~ .
0. r~ v .c > E r) S r_o0 -a E 0 . .0 r (n c -
0 - F2~ 0 __ U .~ e 2.. .
0. E40 o. 00E C
.E 0 00 WE r Z E.
ro~' r 2 >0 -a 00 aEorj- , - . 0. os r . 1 '6 5 2i o
'00 Z00
r,> :D ca A E > az ~ ~ ~ ~ U. 0-. 2 5oc
r0 .0. C* to4 -. E-V wc. 2n c~ '5 . Z
Go .0C 0 * .~E Z
0.22 o~0> 0 0
0 - C4 09r - o 4c m
t C40
Au v 0 0 . 05 >' 4i. C6 C
o 0 C 'r >o to a. EC 0 .. co U) 0)
.0 0-.0. rn oc E ~ 2~. 0.0 r00 Co2 u6 -0 F0
owt 1X0 0 z~ 2.~ .'! E~ .1 -
E-. - 0c .E= 'o
2~! H~0C 0 V. u co "
or0 , 2 0 = E
7E0 *Z02 0 2 2 .E:? n "- .. - .- 9: or 0 0
.0~~r E2 2u u v2 , 0C C 0
t- -t .0 5 -- -
05 r - -. . 0 .
:4 z 2; I :? 2 _0.r
00 02>. 20 0 .r -C 5u 0 02. '
W0- 0 CL t vO 0 :-0.o .0-C 0 ~ 0
... CC.0.. .0 0 CoOs.. B0r A< C)000 0 r)' 4 0 'C . ~ ~ -
.0 0.0 0E g.. -o .C0
= 0 -A -' Z0 4 X0 .V
C0.0 CC) -. o 4'c A 0-0
.~~ .0.-22o.0 * 0 20 10 () r~~ - .
r , 2 ~oo- o. -- E .1 .0.1 2C 0 C tf) to2 0 73 .)C rCC) A0.
Z"S). 0 0 E0C t m ' g '8C 0 :T t.
c0 00. A E -
0 >~ 0~ c, C
~0. 0
IS -. S!
o 4 .0 "0 c 0
Q C0 . a ~ ,. .
0 0 < 044 0
04) r) 3Z4
cn r~0 -
00 5 r,, ).0E
*b 4 *00'= .;C9
.00 4)>2
E5'
. au ~4 C'7 0
0a <
a. C6 r -0 C
-) -C z ov4'i C4 440 V;.)
.. b &r~ 2 as .--o co c )- 't '
ZV. Q ~ E) -Z0) i
4) 4)~0 - E= >
a w ~ -a g. r) -2
u) > ic cco5 o. ul
E. Enr .
E~ r±4 44 ±o: a.. v' 4
o a. ~ *- 00 ~ 4
E < in ~ 0~ 4~~4) -c r. .
--. = s' *0z '7 .4 -C ( r
00 u. -< .0 L c ~ S)
o' A~ aaoc a
4- ro .' o.. f
4)= zo z4~ r- lo. c- -t-- E4 -
r- L" o o m .4.r
c o " S r-o00 g~.. -: !0 - -
.4a -~ *.X= )...0 4.4
c5 t~0 64 4' (34) w
~~~~~~~~ r < E ) ~ ~ b... 44o) r. 0
A be 0 < 00 0
o.~44 -A cc c
b -. .z 1.' (05 z r- r0 r-E2 E v i w )W i
on. cz c40 .
.) ta c6 c;~
0- 0 >.
.- 4C) bo 00 4
.00 r- a e
Cc 0X w 1ov0 4
to C) + X0C ** 0
00 0
'Cc M>, .- IX0u - 00C0 - r- .O .
oc Zw aX - ~1
+ c- co 5 .2N...
05 -u to~
('40 co)1~03 0e~ - ~C ~ C ..
04)~~ - Cc Cc-0 ~*)~- .
,0 0 10 tm
m .2Ix 1,, 0 -
V. 0 I ~0 40 r 0 Ej0 l
04 0 C 00C1
*e it 10 0 0)
to~o. 4 0. 0
Ol 2 ' ce- dz4) 10 IX h- >C)c ) boC
-. -0 o mr11 . E~ o0) Ec
10 es E ~ 0,-,d 0. Cc + .) 4)4) 11 -, ) ~ )
10P >0 UC). X ).ca -- v) 4 .C
; ie..- . wV o - r1I0oC6"
> to 10 ed~4 ~~o4 0C
o1CC 10 C, C) r- Q.O - Q
ID -. 04 E : 0 co.24) (r -c- - [1.~~ 0. v 2.- t
a- -a
.±.2
0 C,0 C) . .0
c ~~- C)~)) 0~; 0w cC).2m U.) 10 * ~
M2 .0 i-C) v)))4 w.. )~... -- (01.0 1 e
IM 1 co~ 4 ~ . o
C) r))-
0'
ri -.. O1.0 .0 0 -. -=2E0 V 0 x
20 A 0-0 7F Ow00 C= cI-
.In cu o'"U)4
W, >20C 0o 1
(0 C, O CJ w to
0 . Lo to) 0 ~0 C
-- 0- E. q. .CU qu -c Ci -I Cc.~0 o2~ ~ og~
) bo 0
O~C)(Cv* J *C C)
=- 0- 0C 24)-Cr- C3
Cc 0 a E- o 0~~ E. .0 XV00'0
0 ) V.00J to0 C 0 )4)C
4
co W ,a -c C)-ho R -,d co4)(
Wd -0 'd c'. U
be c~0 0 0 ~ ~ E -0 U2 . -r~ ~C Mr. .2d
'd '" /Ed - ,d ' w ta..
Ed .2 -V c z
EU d0 Cd0 - ~ . E ~ d~
olW0 v .2 C3 0.>d' %E 'a. a .- r 0
_.0 Ed 0 C 5C 6d z ESdc
.2 - 'd 2.. '4 _ v .2 E
ci>,~~ ~ 0 E0 V* E Ed Ed r
vd~a itC) 0*C . Ed c ~
CO0 E EC~ c.o.CI Edv ' d d
Clb 0. 00 "- Ed >U1~ . d 0
tqjE c.E Ed~ w.0 'ak.0... m
ac~d W r - - 10dC E c - O
0~~E .'0 or-. .2 E: ft 00 ae~ 'V -W~0 0
0 0.o~ o;~ c;C0 0 A2 00
o, - Z ' 0 31 **
00 L, U'1E Z 15n3 Cwma d0 04 .. 0
-l t.~ 0a - . ~ - ~ -
* ~Cd 0.* 2 z- cc -
>0 0 0- Ii a. "0 -6 - Eds;
C) L, 2-E C- s
Ed E _ m k0 - = -.
410
a. > z C,410 be E4do
- "o r4 c c E d 0,0 9 a ~ E
ed m -- Ed d .~
V~~ ~ ~~ Edz7 - 0(0 0 c v o wc
I=4A I ow w l . Is
- ; S C, ll00a C:
C, Eda~ C~ ;. 0 )~ -
-
. U2 -0 " - - C- A! a a C. co
_e <_2CR .4
4)~ E a ; z = A%1
-, ar2> +: 2 0.I.Ed 0-~
to 40EdO o Z
oo oaaE .5 (0 -,X
I 0 e . 0.,
>da.U 0 , 4 E '0 W .. ~IC oo-d0
0 2 -0 ~ d L_ 9 -((
0.0 00 - 410 -C .0 ~ U; i
bd'-e d r- - E
0.0 .2 E
w 0 ) 0 0 .e) ct W VJ~
~~ ~ ~ ~ 0 C 0 0E
4k 0 E.
.X >2 . .2 804 0. 0.. 2
0 .2 r. .0- 4;> o lo m0.- -+ 0 0: CO 14
00 C. -1.ou.
;t) d ~ d> .u 4 0 (x W 1-1 .