High Performance Computing in Computational Mechanics
Kazuo Kashiyama
Department of Civil Engineering, Chuo University, Tokyo, Japan
Outline
・Brief History of Parallel Computing
・Parallel Computing Method for Large Scale Problems (Environmental Flow, Composite Materials)
・PC Cluster Parallel Computing
Why do we need parallel computers?
・Demand for solving complex problems in science (Grand Challenge Problems): turbulent flow, air pollution, ocean modeling, digital anatomy, cosmology
・Development of computer hardware: circuit elements have evolved from vacuum tubes to transistors, ICs, and VLSI
・Limitation of a single processor: the speed of light in a vacuum (3×10^8 m/s)
Performance of Supercomputers
(Figure: growth of supercomputer performance in Flops, from vacuum tubes to transistors to microprocessors, and from scalar to vector, multi-processor, and massively parallel machines; the first parallel computer appeared in 1972; roughly 10 Tflops were reached in 2000 and 40 Tflops in 2001)
Brief History of Parallel Computer
L.F. Richardson (U.K.)
・1911: presented a numerical method for non-linear partial differential equations
・1922: published "Weather Prediction by Numerical Process"
  3D analysis (5 layers in the vertical direction); 6 weeks of calculation with a manual calculating machine were needed for a 6-hour prediction
Dream of Richardson
・The Northern Hemisphere is discretized into 2,000 blocks
・32 people are assigned to each block (64,000 people are needed in total)
・A 6-hour prediction is carried out in 3 hours
Birth of the Parallel Computer
University of Illinois (Daniel Slotnick designed two parallel computers)
1972: the first parallel computer, ILLIAC IV (Burroughs), was developed (64 processing elements, 1 control unit: SIMD)
Development of Parallel Computers in Japan
1977: the PACS/PAX project was started by Tsutomu Hoshino (Kyoto University)
・PACS-9 (1978, 0.01 Mflops)
・PAX-32 (1980, 0.5 Mflops)
1980: the PACS/PAX project moved to Tsukuba University
・PAX-128 (1983, 4 Mflops)
・PAX-64J (1986, 3.2 Mflops)
・QCDPAX (1989, 14 Gflops)
・CP-PACS (1996, 300 Gflops; 1997, 600 Gflops with 2048 CPUs)
http://www.rccp.tsukuba.ac.jp/
Big Projects in Computer Science
U.S.A.
・CIC R&D (Computing, Information and Communications R&D Program)
・ASCI (Accelerated Strategic Computing Initiative) project: White: 10 Tflops (2000), Turquoise: 30 Tflops (2002)
Japan
・"Earth Simulator" project (Ministry of Science and Technology): peak performance 40 Tflops (2001), memory 10 TB, development cost ¥40 billion
・"Computer Science" project (Ministry of Education): support for the development of parallel computers at universities, e.g. CP-PACS (Tsukuba University), GRAPE (University of Tokyo)
Two Currents for Parallel Computing
・Computing using commercial parallel computers: very large scale computing, but expensive
  Example: Hitachi SR2201 (University of Tokyo)
・Computing using PC/WS clusters: medium-to-large scale computing, cheap and flexible
  Example: PC Cluster (University of Heidelberg)
Ise-Bay Typhoon (1959)
Central pressure: 929 hPa; number of deaths: 5,098
Damage by Ise-Bay Typhoon
Path of Ise-Bay Typhoon
Mesh Partitioning
(512 processors)
Finite Element Mesh
(elements: 206,977, nodes: 106,577)
Shallow Water Equations

$$\frac{\partial U_i}{\partial t} + U_j\frac{\partial U_i}{\partial x_j} + g\frac{\partial \zeta}{\partial x_i} - \frac{\partial}{\partial x_j}\left[A_h\left(\frac{\partial U_i}{\partial x_j} + \frac{\partial U_j}{\partial x_i}\right)\right] + \frac{\tau^{b}_{i}}{h+\zeta} - \frac{\tau^{s}_{i}}{h+\zeta} = 0$$

$$\frac{\partial \zeta}{\partial t} + \frac{\partial}{\partial x_i}\left[(h+\zeta)\,U_i\right] = 0$$

where
U_i : mean velocity
ζ : water elevation
h : water depth
g : gravitational acceleration
A_h : horizontal eddy viscosity coefficient
τ^b_i : bottom shear stress
τ^s_i : surface shear stress
Comparison between computed and observed results at Nagoya
Speed-up ratio
Speed-up ratio = (computational time for one PE) / (computational time for N PEs)
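As a minimal sketch, the speed-up ratio and parallel efficiency follow directly from measured wall-clock times; the timings below are the "total time" values reported later in this material for the 22,610-node dam mesh on the 100Base-TX switching-hub cluster.

```python
# Speed-up ratio S(N) = T(1 PE) / T(N PEs); parallel efficiency E(N) = S(N) / N.
times = {1: 3331.3, 2: 1812.0, 4: 961.1, 8: 462.4}

for n_pe, t in sorted(times.items()):
    speedup = times[1] / t
    efficiency = speedup / n_pe
    print(f"{n_pe:2d} PEs: speed-up = {speedup:5.2f}, efficiency = {efficiency:5.2f}")
```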
Finite Element Mesh (flow around a circular cylinder, Re = 1000)
・2D mesh: 7,089 elements, 7,213 nodes
・3D mesh: 851,760 elements, 872,773 nodes (120 slices, slice length 0.05D, span length 6D)
・96 non-uniform divisions in the circumferential direction, 68 non-uniform divisions in the radial direction, and 120 uniform divisions in the spanwise direction of the cylinder
・Minimum mesh size: 0.001D
(Figure: finite element mesh around a circular cylinder; 2D and 3D analyses)
(Figure: power spectra of the drag coefficient CD and lift coefficient CL vs. dimensionless frequency; peak frequencies F = 0.259, 0.519, 0.763, 1.02, 1.26, and 1.53 in one analysis and F = 0.229 and 0.443 in the other)
(Figure: time histories of the drag coefficient CD and lift coefficient CL vs. dimensionless time, 0-250, compared with experiment; flow-field snapshots at T = 61.0 and T = 85.0)
Parallel Finite Element Analysis of Free Surface Flows Using PC Cluster
Kazuo Kashiyama, Seizo Tanaka, Katsuyuki Sue and Masaaki Sakuraba, Chuo University, Tokyo, Japan
Topics
・Introduction
・Governing Equations and Stabilized FEM
・PC Cluster Parallel Computing
・Numerical Examples: Sloshing of Rectangular Tank and Actual Dam
・Conclusions
Introduction
(Figure: sloshing problem — water in a tank excited by an earthquake can overflow)
Purpose: development of a useful numerical method to evaluate the safety of tanks and dams against sloshing caused by earthquakes
Present approach:
・Navier-Stokes equations
・ALE stabilized FEM
・PC cluster parallel computing
Governing Equations

$$\rho\left(\frac{\partial u_i}{\partial t} + \bar{u}_j\frac{\partial u_i}{\partial x_j} - f_i\right) - \frac{\partial \sigma_{ij}}{\partial x_j} = 0$$

$$\frac{\partial u_i}{\partial x_i} = 0$$

$$\sigma_{ij} = -p\,\delta_{ij} + \mu\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right)$$

Boundary Conditions

$$u_i = g_i \ \ \text{on } \Gamma_g, \qquad \sigma_{ij} n_j = h_i \ \ \text{on } \Gamma_h$$
Stabilized FEM (SUPG/PSPG)

$$\int_\Omega w_i\,\rho\left(\frac{\partial u_i}{\partial t} + \bar{u}_j\frac{\partial u_i}{\partial x_j} - f_i\right)d\Omega + \int_\Omega \frac{\partial w_i}{\partial x_j}\,\sigma_{ij}\,d\Omega + \int_\Omega q\,\frac{\partial u_i}{\partial x_i}\,d\Omega$$

$$+ \sum_{e=1}^{n_{el}} \int_{\Omega_e} \tau_m\left(\bar{u}_k\frac{\partial w_i}{\partial x_k} + \frac{1}{\rho}\frac{\partial q}{\partial x_i}\right)\left[\rho\left(\frac{\partial u_i}{\partial t} + \bar{u}_j\frac{\partial u_i}{\partial x_j} - f_i\right) - \frac{\partial \sigma_{ij}}{\partial x_j}\right]d\Omega_e$$

$$+ \sum_{e=1}^{n_{el}} \int_{\Omega_e} \tau_c\,\frac{\partial w_i}{\partial x_i}\,\rho\,\frac{\partial u_i}{\partial x_i}\,d\Omega_e = \int_{\Gamma_h} w_i\,\sigma_{ij} n_j\, d\Gamma_h$$

where

$$\tau_m = \left[\left(\frac{2}{\Delta t}\right)^2 + \left(\frac{2\|u_i\|}{h_e}\right)^2 + \left(\frac{4\nu}{h_e^2}\right)^2\right]^{-\frac{1}{2}}, \qquad \tau_c = \frac{h_e}{2}\,\|u_i\|\,\xi(Re_e)$$
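A minimal sketch of evaluating the stabilization parameters element by element. The forms ξ(Re_e) = min(Re_e/3, 1) and Re_e = ||u|| h_e/(2ν) used below are standard choices assumed here, since the slides do not spell them out.

```python
import numpy as np

def stabilization_parameters(u_norm, h_e, dt, nu):
    """Per-element SUPG/PSPG parameters tau_m and tau_c.

    u_norm : ||u_i||, magnitude of the element velocity
    h_e    : element length scale
    dt     : time increment
    nu     : kinematic viscosity
    The forms xi(Re_e) = min(Re_e/3, 1) and Re_e = u h_e / (2 nu) are
    assumed standard choices, not taken from the slides.
    """
    tau_m = ((2.0 / dt) ** 2
             + (2.0 * u_norm / h_e) ** 2
             + (4.0 * nu / h_e ** 2) ** 2) ** (-0.5)
    re_e = u_norm * h_e / (2.0 * nu)
    xi = min(re_e / 3.0, 1.0)
    tau_c = 0.5 * h_e * u_norm * xi
    return tau_m, tau_c

# Example: a 1 cm element with 0.1 m/s flow, dt = 0.01 s, water-like viscosity
print(stabilization_parameters(u_norm=0.1, h_e=0.01, dt=0.01, nu=1.0e-6))
```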
Finite Element Equations

$$(M + M_\delta)\frac{\partial u_i}{\partial t} + \left(K(\bar{u}_j) + K_\delta(\bar{u}_j)\right)u_i - (C - C_\delta)\frac{1}{\rho}p + \nu S u_i = (N + N_\delta) f_i$$

$$C^T u_i + M_\varepsilon\frac{\partial u_i}{\partial t} + K_\varepsilon(\bar{u}_j)\,u_i - N_\varepsilon f_i + C_\varepsilon\frac{1}{\rho}p = 0$$

$$\begin{bmatrix} \dfrac{M + M_\delta}{\Delta t} & -(C - C_\delta) \\[2mm] C^T + \dfrac{M_\varepsilon}{\Delta t} & C_\varepsilon \end{bmatrix} \begin{Bmatrix} u_i^{\,n+1} \\[1mm] \dfrac{1}{\rho}p^{\,n+1} \end{Bmatrix} = \begin{Bmatrix} b_i^{\,n} \\[1mm] d_i^{\,n} \end{Bmatrix}$$

where the matrices with subscripts δ and ε collect the SUPG and PSPG stabilization contributions, respectively.
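As an illustration of how the coupled block system above can be advanced in time, here is a toy dense-matrix sketch. The exact split of terms between the coefficient matrix and the right-hand sides b^n, d^n is an assumption (the slides show only the block structure), and the random matrices stand in for assembled FE matrices.

```python
import numpy as np

def advance_one_step(M, Md, K, Kd, S, C, Cd, Me, Ke, Ce, N, Nd, Ne,
                     u_n, f, rho, nu, dt):
    """One time step of the coupled velocity-pressure system.

    Upper block: (M+Md)/dt * u^{n+1} - (C-Cd) * (p^{n+1}/rho) = b^n
    Lower block: (C^T + Me/dt) * u^{n+1} + Ce * (p^{n+1}/rho) = d^n
    The right-hand sides collect the remaining terms, here assumed to be
    evaluated at time level n (an assumption, not stated on the slides).
    """
    nu_dof, np_dof = u_n.size, C.shape[1]
    A = np.block([
        [(M + Md) / dt,  -(C - Cd)],
        [C.T + Me / dt,   Ce      ],
    ])
    b = (N + Nd) @ f + (M + Md) @ u_n / dt - (K + Kd) @ u_n - nu * (S @ u_n)
    d = Ne @ f + Me @ u_n / dt - Ke @ u_n
    sol = np.linalg.solve(A, np.concatenate([b, d]))
    return sol[:nu_dof], rho * sol[nu_dof:nu_dof + np_dof]

# Tiny random stand-in matrices, purely to check shapes and run the step.
rng = np.random.default_rng(0)
nu_d, np_d = 6, 3
I, R = np.eye(nu_d), rng.normal
u1, p1 = advance_one_step(
    M=I, Md=0.1 * I, K=R(size=(nu_d, nu_d)), Kd=0.1 * R(size=(nu_d, nu_d)),
    S=I, C=R(size=(nu_d, np_d)), Cd=0.1 * R(size=(nu_d, np_d)),
    Me=R(size=(np_d, nu_d)), Ke=R(size=(np_d, nu_d)), Ce=R(size=(np_d, np_d)),
    N=I, Nd=0.1 * I, Ne=R(size=(np_d, nu_d)),
    u_n=np.zeros(nu_d), f=np.ones(nu_d), rho=1000.0, nu=1.0e-6, dt=0.01)
print(u1.shape, p1.shape)
```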
Rezoning and Remeshing

Free-surface equation:
$$\frac{\partial H}{\partial t} = \bar{u}_s\frac{n_1}{n_3} + \bar{v}_s\frac{n_2}{n_3} + w_s$$

Mesh rezoning (Laplace equation for the mesh displacement):
$$\frac{\partial^2 \phi}{\partial x_i^2} = 0, \qquad \phi = \Delta H \ \ \text{on } \Gamma_m, \qquad \phi = 0 \ \ \text{on } \Gamma_f$$
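The rezoning step solves the Laplace problem above with the free-surface increment ΔH prescribed on Γ_m and zero on Γ_f. A minimal 1D Jacobi-iteration sketch follows; the single vertical column of nodes is an assumed simplification of the real 3D mesh problem.

```python
import numpy as np

def rezone_column(delta_h, n_nodes=11, n_iter=2000):
    """Solve d^2(phi)/dz^2 = 0 on a vertical column of mesh nodes by Jacobi
    iteration, with phi = delta_h at the free surface (Gamma_m) and phi = 0
    at the bottom (Gamma_f).  The interior values give the vertical mesh
    displacement of each node (they vary linearly in this 1D case).
    """
    phi = np.zeros(n_nodes)
    phi[-1] = delta_h                              # free-surface node
    for _ in range(n_iter):
        phi[1:-1] = 0.5 * (phi[:-2] + phi[2:])     # Jacobi update of interior nodes
    return phi

print(rezone_column(delta_h=0.02))
```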
Bi-CGSTAB Method

Solve $A x = b$.

Initialization:
$$r_0 = b - A x_0 = b - \sum_e A^{(e)} x_0 \quad ②, \qquad p_0 = r_0$$

Iteration:
$$q_k = A p_k = \sum_e A^{(e)} p_k \quad ②$$
$$\alpha_k = (r_0, r_k)\,/\,(r_0, q_k) \quad ①$$
$$t_k = r_k - \alpha_k q_k$$
$$s_k = A t_k = \sum_e A^{(e)} t_k \quad ②$$
$$\omega_k = (s_k, t_k)\,/\,(s_k, s_k) \quad ①$$
$$x_{k+1} = x_k + \alpha_k p_k + \omega_k t_k$$
$$r_{k+1} = t_k - \omega_k s_k$$
$$\beta_k = \frac{\alpha_k}{\omega_k}\,(r_0, r_{k+1})\,/\,(r_0, r_k) \quad ①$$
$$p_{k+1} = r_{k+1} + \beta_k\,(p_k - \omega_k q_k)$$
①:Global communication, ②:Neighboring communication
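A serial NumPy sketch of the iteration above. For brevity the element-by-element products Σ_e A^(e) v are replaced by one assembled matrix, and the stopping tolerance is an assumption; the comments mark where the parallel version needs ① global and ② neighboring communication.

```python
import numpy as np

def bicgstab(A, b, x0, tol=1e-8, max_iter=1000):
    """Bi-CGSTAB in the slide's notation (t_k = preliminary residual, s_k = A t_k)."""
    x = x0.copy()
    r = b - A @ x                                   # ② neighboring communication
    r0 = r.copy()
    p = r.copy()
    for _ in range(max_iter):
        q = A @ p                                   # ② neighboring communication
        alpha = (r0 @ r) / (r0 @ q)                 # ① global communication
        t = r - alpha * q
        s = A @ t                                   # ② neighboring communication
        omega = (s @ t) / (s @ s)                   # ① global communication
        x = x + alpha * p + omega * t
        r_new = t - omega * s
        if np.linalg.norm(r_new) < tol * np.linalg.norm(b):
            return x
        beta = (alpha / omega) * (r0 @ r_new) / (r0 @ r)   # ① global communication
        p = r_new + beta * (p - omega * q)
        r = r_new
    return x

# Quick self-check on a small system.
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.5]])
b = np.array([1.0, 2.0, 3.0])
print(bicgstab(A, b, np.zeros(3)))
```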
Network Configuration
(Figure: Linux gateway PC connected through a HUB to the network of CML and the network of Biowulf)
・Hardware
  PC: CPU Pentium II 400 MHz, cache 512 KB, RAM 512 MB
  NIC: DEC DC21x4x PCI (10/100Base-TX)
  Network hub: 100Base-TX switching hub (○), 100Base-TX non-switching hub (×), 10Base-T non-switching hub (×)
・Software
  OS: Linux 2.0.34
  Communication library: MPICH 1.1.1
Development of Parallel Computer
MPICH-A Portable Implementation of MPI(http://www-unix.mcs.anl.gov/mpi/mpich/index.html)
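MPICH implements the MPI standard used for both communication patterns in the parallel solver. As a minimal illustration, the sketch below uses the Python mpi4py bindings (an assumption for demonstration; the original code presumably calls MPI from Fortran or C) to show a global inner product and a neighboring interface exchange.

```python
# Run with e.g.: mpiexec -n 4 python comm_sketch.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# ① Global communication: the global inner product is the sum over all PEs
#    of the local inner products (as in the alpha, omega, beta steps above).
local_a = np.random.rand(1000)
local_b = np.random.rand(1000)
global_dot = comm.allreduce(np.dot(local_a, local_b), op=MPI.SUM)

# ② Neighboring communication: shift interface (sub-domain boundary) values
#    to the right-hand neighbor of a 1D chain of sub-domains; PROC_NULL at
#    the chain ends makes the corresponding send/receive a no-op.
right = rank + 1 if rank + 1 < size else MPI.PROC_NULL
left = rank - 1 if rank >= 1 else MPI.PROC_NULL
send_buf = np.full(10, float(rank))
recv_buf = np.empty(10)        # stays uninitialized on the leftmost rank
comm.Sendrecv(sendbuf=send_buf, dest=right, recvbuf=recv_buf, source=left)

print(f"rank {rank}: global dot = {global_dot:.3f}")
```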
Numerical Examples
・Sloshing analysis of a rectangular tank and an actual dam
・PC cluster parallel computing
Numerical Example (1)
(Figure: rectangular tank with dimensions 1.00 m, 0.50 m, and 0.10 m, axes X, Y, Z; the water elevation is monitored at point A)
Horizontal excitation: fx = A sin ωt with A = 0.0093 m, ω = 5.311 rad/s
Mesh: 4,305 nodes, 19,200 elements
(Figure: Time History of Water Elevation at Point A — water elevation [m] vs. time [s], 0-50 s)
(Figure: Comparison of Water Elevation — present method vs. experiment by Okamoto, water elevation [m] vs. time [s], 0-12 s)
Computed Results
(Snapshots of the free surface at Time = 6.0, 9.0, 12.0, and 15.0 s)
Numerical Example(2)
Dam for pumped-storage power generation
Finite Element Model
(Figure: plan view, X = 0-600 m, Y = 0-280 m, and elevation view, Z vs. X; observation points A, B, C, D; free surface and bottom)
(elements: 109,314, nodes: 22,610)
Earthquake Data (Input Ground Motion), computed by Harada and Oosumi
(Figure: X-velocity [m/s] vs. time [s], 0-100 s, amplitude within about ±0.05 m/s)
Computed Water Elevation at Point A
(Figure: water level [m] vs. time [s], 0-100 s, ranging between 28.8 and 29.4 m)
Computed Results
Comparison of Network Environment
MESH: total number of nodes 22,610, total number of elements 109,314

                Accton 100Base-TX        Accton 100Base-TX        Allied Telesis 10Base-T
                (switch hub)             (no switch hub)          (no switch hub)
PE   total time  comm.&wait     total time  comm.&wait     total time  comm.&wait
 1     3331.3      0.0            3331.3      0.0            3331.3      0.0
 2     1812.0     39.1 ( 2.2%)    1826.1     50.8 ( 2.8%)    1897.3    121.2 ( 6.4%)
 4      961.1     71.7 ( 7.5%)     979.1     89.4 ( 9.1%)    1197.5    287.4 (24.0%)
 8      462.4     25.3 ( 5.5%)     497.0     57.1 (11.5%)    1269.6    698.8 (55.0%)
Performance of Parallel Computing
Comparison with IBM RS6000/SP
MESH: total number of nodes 22,610, total number of elements 109,314

                Accton 100Base-TX        IBM RS6000/SP
                (switch hub)
PE   total time  comm.&wait     total time  comm.&wait
 1     3331.3      0.0            2935.8      0.0
 2     1812.0     39.1 (2.2%)     1586.7      6.6 (0.4%)
 4      961.1     71.7 (7.5%)      805.3     56.1 (7.0%)
 8      462.4     25.3 (5.5%)      417.8     34.7 (8.3%)
Parallel Finite Element Analysis of Asphalt Concrete Using Image-Based Modeling
Kazuo Kashiyama, Takaaki Kakehi (Chuo University), Takashi Izumiya (Yachiyo Engineering Corporation), Tomoyuki Uo (Kajima Corporation), Kenjiro Terada (Tohoku University)
Outline
・Introduction
・Governing Equation
・Formulation of Homogenization
・Image-Based Modeling Using X-ray CT
・Parallel Implementation
・Numerical Analysis
・Conclusions
(Figure: structure of pavement — surface course, base course, subgrade)
Purpose of This Study
・A parallel finite element method based on the homogenization theory is presented for the visco-elastic analysis of asphalt concrete.
・The accurate configuration of the microstructure is modeled from digital images obtained by X-ray CT.
Homogenization Method
(Figure: asphalt concrete microstructure of aggregate and asphalt mortar — macroscopic domain Ω with coordinates x1, x2, x3 and boundaries Γu, Γt; periodic microscopic unit cell Y with coordinates y1, y2, y3, consisting of solid and fluid parts)
Solid-fluid mixtures with periodic microstructure:
・solid: elastic body
・fluid: Newtonian fluid (Stokes flow)
Governing Equation
Equilibrium equation:
$$\frac{\partial \sigma^{\varepsilon}_{ij}}{\partial x_i} + \rho^{\varepsilon}\bar{b}_j = 0 \quad \text{in } \Omega^{\varepsilon}$$

Constitutive equation:
$$\sigma^{\varepsilon}_{ij}(x) = b^{\varepsilon}_{ijkh}(x)\,\varepsilon_{kh}(u^{\varepsilon}) + c^{\varepsilon}_{ijkh}(x)\,\varepsilon_{kh}\!\left(\frac{\partial u^{\varepsilon}}{\partial t}\right)$$

$$b^{\varepsilon}_{ijkh}(x) = \begin{cases} E_{ijkh}(x) & \text{in } \Omega^{\varepsilon}_s \\ \tfrac{1}{3}K\,\delta_{ij}\delta_{kh} & \text{in } \Omega^{\varepsilon}_f \end{cases} \qquad c^{\varepsilon}_{ijkh}(x) = \begin{cases} 0 & \text{in } \Omega^{\varepsilon}_s \\ 2\mu^{\varepsilon}\left(\delta_{ik}\delta_{jh} - \tfrac{1}{3}\delta_{ij}\delta_{kh}\right) & \text{in } \Omega^{\varepsilon}_f \end{cases}$$

Principle of virtual work:
$$b^{\varepsilon}(u^{\varepsilon}, \omega^{\varepsilon}) + c^{\varepsilon}\!\left(\frac{\partial u^{\varepsilon}}{\partial t}, \omega^{\varepsilon}\right) = \int_{\Gamma_t} \bar{t}\,\omega^{\varepsilon}\, d\Gamma + \int_{\Omega^{\varepsilon}} \rho^{\varepsilon}\bar{b}\,\omega^{\varepsilon}\, dx$$

where
$$b^{\varepsilon}(u^{\varepsilon}, \omega^{\varepsilon}) = \int_{\Omega^{\varepsilon}} b^{\varepsilon}_{ijkh}(x)\,\varepsilon_{ij}(u^{\varepsilon})\,\varepsilon_{kh}(\omega^{\varepsilon})\, dx, \qquad c^{\varepsilon}(u^{\varepsilon}, \omega^{\varepsilon}) = \int_{\Omega^{\varepsilon}} c^{\varepsilon}_{ijkh}(x)\,\varepsilon_{ij}(u^{\varepsilon})\,\varepsilon_{kh}(\omega^{\varepsilon})\, dx$$

Two-scale asymptotic expansion:
$$u^{\varepsilon}(x) = u^{0}(x, t) + \varepsilon\, u^{1}(x, y, t) + \varepsilon^{2} u^{2}(x, y, t) + \cdots + \varepsilon^{n} u^{n}(x, y, t)$$
TOSCANER-23200 (Kumamoto Univ.)
Digital Image Processing for Asphalt Concrete
Scan type: traverse/rotation; X-ray power: 300 kV/200 kV; number of detectors: 176 channels; size of specimen: φ400 mm × H600 mm; slice thickness: 0.5 mm, 1 mm, 2 mm; spatial resolution: 0.2 mm
Finite Element Model for Microstructure
(Figure: 2D and 3D digital images of the microscopic domain and the resulting finite element model)
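A minimal sketch of turning a CT voxel image into a two-phase microstructure model: each voxel becomes one hexahedral element, classified as solid (aggregate) or fluid (asphalt mortar) by a grayscale threshold. The threshold value, the brighter-is-denser rule, and the synthetic image are assumptions for illustration only.

```python
import numpy as np

def voxels_to_phases(ct_image, threshold):
    """Classify each voxel of a 3D CT grayscale image as solid (aggregate)
    or fluid (asphalt mortar).  One voxel corresponds to one hexahedral
    element of the microscopic finite element model.
    Returns a boolean array (True = solid) and the fluid volume fraction.
    """
    solid = ct_image >= threshold      # assumed rule: denser aggregate is brighter
    vf_fluid = 1.0 - solid.mean()      # fluid (mortar) volume fraction
    return solid, vf_fluid

# Synthetic 40x40x40 voxel image as a stand-in for real CT data.
rng = np.random.default_rng(0)
image = rng.normal(loc=100.0, scale=30.0, size=(40, 40, 40))
solid, vf = voxels_to_phases(image, threshold=100.0)
print(f"solid voxels: {solid.sum()},  fluid volume fraction: {vf:.3f}")
```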
Domain Decomposition
Parallel Computing Method based on Domain Decomposition Method
Microscopic Structure
1) Equalize the number of elements in each sub-domain
2) Minimize the number of nodes on the boundaries between sub-domains
Element by Element SCG Method for Parallel Computing

Solve $A x = b$.

$$r_0 = b - A x_0 = b - \sum_e A^{(e)} x_0$$
$$p_0 = r_0$$
$$q_k = A p_k = \sum_e A^{(e)} p_k \quad ①$$
$$\alpha_k = (r_k, r_k)\,/\,(p_k, q_k) \quad ②$$
$$x_{k+1} = x_k + \alpha_k p_k$$
$$r_{k+1} = r_k - \alpha_k q_k$$
$$\beta_k = (r_{k+1}, r_{k+1})\,/\,(r_k, r_k) \quad ②$$
$$p_{k+1} = r_{k+1} + \beta_k p_k$$

①: neighboring communication, ②: global communication
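A serial NumPy sketch of the iteration above with the matrix-vector product formed element by element as Σ_e A^(e) p_k. The diagonal scaling implied by "SCG" is omitted (this is plain conjugate gradient), and the toy 1D element matrices are an assumption purely for illustration; in the parallel version the element loop needs neighboring communication ① and the inner products need global communication ②.

```python
import numpy as np

def ebe_matvec(element_matrices, connectivity, p, n_dof):
    """q = A p formed element by element: q = sum_e A^(e) p  (①)."""
    q = np.zeros(n_dof)
    for A_e, nodes in zip(element_matrices, connectivity):
        q[nodes] += A_e @ p[nodes]
    return q

def ebe_cg(element_matrices, connectivity, b, tol=1e-10, max_iter=500):
    """Conjugate gradient with element-by-element matrix-vector products.
    alpha and beta involve global inner products (②)."""
    n_dof = b.size
    x = np.zeros(n_dof)
    r = b - ebe_matvec(element_matrices, connectivity, x, n_dof)
    p = r.copy()
    for _ in range(max_iter):
        q = ebe_matvec(element_matrices, connectivity, p, n_dof)
        alpha = (r @ r) / (p @ q)
        x += alpha * p
        r_new = r - alpha * q
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x

# Toy SPD "element" matrices on a 1D chain of 4 two-node elements.
k_e = np.array([[2.0, -1.0], [-1.0, 2.0]])
conn = [np.array([i, i + 1]) for i in range(4)]
print(ebe_cg([k_e] * 4, conn, np.ones(5)))
```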
Macroscopic model: φ10 cm, h 20 cm (9,537 nodes, 8,192 elements)
Microscopic model: 40 mm × 40 mm × 40 mm (68,921 nodes, 64,000 elements)
Material constants:
・Solid: E = 61.0 GPa, ν = 0.21
・Fluid: K = 10.0 GPa, μ = 1.0 GPa·s
・Vf = 49.1%
Numerical Analysis
Macro-microscopic model
(Figure: Time history of axial stress of the macroscopic model — axial stress [MPa] vs. time [s], 0-1500 s; digital image model and idealized model at Vf = 14.6% and Vf = 49.1%, compared with experiment)
Macroscopic von Mises stress distribution (Vf = 49.1%) at 60 s and 1500 s [MPa]
Microscopic von Mises stress distribution of the solid parts (Vf = 49.1%) at 60 s and 1500 s [MPa]
Microscopic von Mises stress distribution of the fluid parts (Vf = 49.1%) at 60 s and 1500 s [MPa]
Effect of the region of unit cell
Unit cell sizes: 10×10×10 mm (1,000 pixels), 20×20×20 mm (8,000 pixels), 30×30×30 mm (27,000 pixels), 40×40×40 mm (64,000 pixels)
(Figure: time history of axial stress of the macroscopic model, Vf = 49.1% — axial stress [MPa] vs. time [s], 0-1500 s, for the four unit-cell sizes and the experiment; the results converge as the unit cell is enlarged)
Efficiency of Parallelization
(Figures: parallel efficiency [%] and exe size [MB] vs. number of processors, 2-8, for 40×40×40, 64×64×64, and 80×80×80 element models; Vf = 49.1%)
Korakuen Campus(School of Science and Engineering)
Tama Campus(5 Schools of Liberal Arts)
Domain Decomposition Method
Greedy Algorithm
Farhat, C., A simple and efficient automatic FEM domain decomposer, Computers & Structures, Vol. 28, pp. 576-602.
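A minimal sketch of a greedy element-based domain decomposer in the spirit of the referenced algorithm: sub-domains are grown one at a time by absorbing elements adjacent to the current front until the target size (total elements / number of sub-domains) is reached, which tends to equalize sub-domain sizes and keep interfaces small. The seed choice and the shared-node adjacency rule below are assumptions for illustration.

```python
from collections import defaultdict

def greedy_decompose(elements, n_sub):
    """Partition a list of elements (each a tuple of node ids) into n_sub
    sub-domains of nearly equal size by greedy front growth.
    Elements sharing at least one node are treated as neighbors.
    """
    n_elem = len(elements)
    target = (n_elem + n_sub - 1) // n_sub

    # element adjacency through shared nodes
    node_to_elems = defaultdict(set)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].add(e)

    part = [-1] * n_elem
    unassigned = set(range(n_elem))
    for sub in range(n_sub):
        if not unassigned:
            break
        front = [min(unassigned)]          # seed: any unassigned element
        count = 0
        while front and count < target:
            e = front.pop()
            if part[e] != -1:
                continue
            part[e] = sub
            unassigned.discard(e)
            count += 1
            for n in elements[e]:          # grow across shared nodes
                for nb in node_to_elems[n]:
                    if part[nb] == -1:
                        front.append(nb)
    for e in unassigned:                   # leftovers go to the last sub-domain
        part[e] = n_sub - 1
    return part

# 4x2 grid of quadrilateral elements split into 2 sub-domains
quads = [(i + 5*j, i + 1 + 5*j, i + 6 + 5*j, i + 5 + 5*j) for j in range(2) for i in range(4)]
print(greedy_decompose(quads, 2))
```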