
High Performance Computing in Computational Mechanics

Kazuo Kashiyama

Department of Civil Engineering, Chuo University, Tokyo, Japan

Outline

・Brief History of Parallel Computing
・Parallel Computing Methods for Large Scale Problems (Environmental Flow, Composite Materials)
・PC Cluster Parallel Computing

Why do we need parallel computers?

Demand for solutions to complex problems in science (Grand Challenge Problems):
・Turbulent flow ・Air pollution ・Ocean modeling ・Digital anatomy ・Cosmology

Development of computer hardware (circuit elements): vacuum tube → transistor → IC → VLSI

Limitation of a single processor: the speed of light in a vacuum (3×10^8 m/s)

Performance of Supercomputers

[Chart: peak performance (flops) versus year, from vacuum-tube and transistor machines to microprocessor machines; scalar → vector → multi-processor → massively parallel; first parallel computer in 1972; about 10 Tflops in 2000 and 40 Tflops in 2001]

Brief History of Parallel Computer

L. F. Richardson (U.K.)
1911: presented a numerical method for non-linear partial differential equations
1922: presented the book "Weather Prediction by Numerical Process"; 3D analysis (5 layers in the vertical direction); 6 weeks of calculation with a manual calculating machine were needed for a 6-hour prediction

Dream of Richardson
・The Northern hemisphere is discretized into 2,000 blocks
・32 people are assigned to each block (64,000 people are needed)
・A 6-hour prediction is carried out within 3 hours

Birth of the Parallel Computer
University of Illinois (Daniel Slotnick designed two parallel computers)
1972: the first parallel computer, ILLIAC IV (Burroughs), was developed (64 processing elements, 1 control unit: SIMD)

Development of Parallel Computers in Japan

1977: PACS/PAX project was started by Tsutomu Hoshino (Kyoto University)
・PACS-9 (1978, 0.01 Mflops)
・PAX-32 (1980, 0.5 Mflops)

1980: PACS/PAX project moved to Tsukuba University
・PAX-128 (1983, 4 Mflops)
・PAX-64J (1986, 3.2 Mflops)
・QCDPAX (1989, 14 Gflops)
・CP-PACS (1996, 300 Gflops; 1997, 600 Gflops with 2048 CPUs)
http://www.rccp.tsukuba.ac.jp/

Big Projects in Computer Science

U.S.A.
・CIC R&D (Computing, Information and Communications R&D Program)
・ASCI (Accelerated Strategic Computing Initiative) project: White, 10 Tflops (2000); Turquoise, 30 Tflops (2002)

Japan
・"Earth Simulator" project (Ministry of Science and Technology): peak performance 40 Tflops (2001), memory 10 TB, development cost ¥40 billion
・"Computer Science" project (Ministry of Education): support for the development of parallel computers in universities, e.g. CP-PACS (Tsukuba University) and GRAPE (University of Tokyo)

Two Currents for Parallel Computing

Computing on commercial parallel computers: very large-scale computing, but expensive

Computing on PC/WS clusters: medium-to-large-scale computing, cheap and flexible

Hitachi SR2201(University of Tokyo)

PC Cluster(University of Heidelberg)

Ise-Bay Typhoon (1959)

Central pressure: 929 hPa; death toll: 5,098

Damage by Ise-Bay Typhoon

Path of Ise-Bay Typhoon

Mesh Partitioning

(512 processors)

Finite Element Mesh

(elements: 206,977, nodes: 106,577)

Shallow Water Equations

$$\frac{\partial U_i}{\partial t} + U_j\frac{\partial U_i}{\partial x_j} + g\frac{\partial \zeta}{\partial x_i} - \frac{\partial}{\partial x_j}\left[A_h\left(\frac{\partial U_i}{\partial x_j}+\frac{\partial U_j}{\partial x_i}\right)\right] + \frac{\tau^b_i}{h+\zeta} - \frac{\tau^s_i}{h+\zeta} = 0$$

$$\frac{\partial \zeta}{\partial t} + \frac{\partial}{\partial x_i}\bigl[(h+\zeta)U_i\bigr] = 0$$

where U_i: mean velocity, ζ: water elevation, h: water depth, g: gravitational acceleration, A_h: horizontal eddy viscosity coefficient, τ^b_i: bottom shear stress, τ^s_i: surface shear stress.

Comparison between computed and observed results at Nagoya

Speed-up ratio = (computational time for one PE) / (computational time for N PEs)
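In the standard notation (an assumption here, since the slides only give the verbal definition), the speed-up and the parallel efficiency used in the later performance plots are:

$$S_N = \frac{T_1}{T_N}, \qquad E_N = \frac{S_N}{N}\times 100\,[\%]$$

where $T_1$ and $T_N$ are the computational times on one and on $N$ processing elements, respectively.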

2D mesh: 7,089 elements, 7,213 nodes
3D mesh: 851,760 elements, 872,773 nodes (120 slices, slice length 0.05D, span length 6D)
Minimum mesh size: 0.001D

Finite Element Mesh

Re=1000

・Circumferential direction of the cylinder: 96 non-uniform divisions
・Radial direction of the cylinder: 68 non-uniform divisions
・Spanwise direction of the cylinder: 120 uniform divisions
Minimum mesh size = 0.001D

Finite element mesh around a circular cylinder

[Figures: comparison of 3D and 2D analysis (Re = 1000). Power spectra of the drag (CD) and lift (CL) coefficients versus dimensionless frequency, with dominant peaks at F = 0.259, 0.519, 0.763, 1.02, 1.26, 1.53 in one analysis and at F = 0.229, 0.443 in the other. Time histories of CD and CL versus dimensionless time, compared with experiment; flow snapshots at T = 61 and T = 84-85.]

Parallel Finite Element Analysis of Free Surface Flows Using PC Cluster

Kazuo Kashiyama, Seizo Tanaka, Katsuyuki Sue and Masaaki Sakuraba
Chuo University, Tokyo, Japan

Topics
・Introduction
・Governing Equations and Stabilized FEM
・PC Cluster Parallel Computing
・Numerical Examples: Sloshing of a Rectangular Tank and an Actual Dam
・Conclusions

Introduction

Overflow!

earthquake

Sloshing problem

Purpose: to develop a useful numerical method for evaluating the safety of tanks and dams against earthquake-induced sloshing

Present approach:
・Navier-Stokes equations
・ALE stabilized FEM
・PC cluster parallel computing

Governing Equations

$$\rho\left(\frac{\partial u_i}{\partial t} + \bar{u}_j\frac{\partial u_i}{\partial x_j} - f_i\right) - \frac{\partial \sigma_{ij}}{\partial x_j} = 0$$

$$\frac{\partial u_i}{\partial x_i} = 0$$

$$\sigma_{ij} = -p\,\delta_{ij} + \mu\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right)$$

Boundary Conditions

$$u_i = g_i \quad \text{on } \Gamma_g, \qquad \sigma_{ij} n_j = h_i \quad \text{on } \Gamma_h$$

Stabilized FEM (SUPG/PSPG)

$$\int_\Omega w_i\,\rho\left(\frac{\partial u_i}{\partial t} + \bar{u}_j\frac{\partial u_i}{\partial x_j} - f_i\right)d\Omega + \int_\Omega \frac{\partial w_i}{\partial x_j}\,\sigma_{ij}\,d\Omega + \int_\Omega q\,\frac{\partial u_i}{\partial x_i}\,d\Omega$$
$$+ \sum_{e=1}^{n_{el}} \int_{\Omega_e} \tau_m \left(\rho\,\bar{u}_k\frac{\partial w_i}{\partial x_k} + \frac{1}{\rho}\frac{\partial q}{\partial x_i}\right)\left[\rho\left(\frac{\partial u_i}{\partial t} + \bar{u}_j\frac{\partial u_i}{\partial x_j} - f_i\right) - \frac{\partial \sigma_{ij}}{\partial x_j}\right]d\Omega_e$$
$$+ \sum_{e=1}^{n_{el}} \int_{\Omega_e} \tau_c\,\frac{\partial w_i}{\partial x_i}\,\rho\,\frac{\partial u_i}{\partial x_i}\,d\Omega_e = \int_{\Gamma_h} w_i\,\sigma_{ij} n_j\,d\Gamma_h$$

where

$$\tau_m = \left[\left(\frac{2}{\Delta t}\right)^2 + \left(\frac{2\|u_i\|}{h_e}\right)^2 + \left(\frac{4\nu}{h_e^2}\right)^2\right]^{-1/2}, \qquad \tau_c = \frac{h_e}{2}\,\|u_i\|\,z(Re_e)$$
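As a concrete illustration of how these stabilization parameters might be evaluated element by element, here is a minimal C sketch following the formulas above; the element size h_e, velocity norm, time step, and kinematic viscosity are assumed inputs, and z(Re_e) uses the common piecewise form (Re/3 below 3, 1 above), which is an assumption rather than something stated on the slides.

```c
#include <math.h>

/* Stabilization parameters for the SUPG/PSPG formulation above.
 * Inputs (assumed): u_norm = ||u_i|| on the element, h_e = element size,
 * dt = time increment, nu = kinematic viscosity. */
static double tau_m(double u_norm, double h_e, double dt, double nu)
{
    double a = 2.0 / dt;
    double b = 2.0 * u_norm / h_e;
    double c = 4.0 * nu / (h_e * h_e);
    return 1.0 / sqrt(a * a + b * b + c * c);
}

static double tau_c(double u_norm, double h_e, double nu)
{
    /* Element Reynolds number and the cutoff function z(Re)
     * (assumed form: Re/3 below 3, 1 above). */
    double re = u_norm * h_e / (2.0 * nu);
    double z  = (re <= 3.0) ? re / 3.0 : 1.0;
    return 0.5 * h_e * u_norm * z;
}
```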

Finite Element Equations

$$(M + M_\delta)\frac{\partial u_i}{\partial t} + \bigl(K(\bar{u}_j) + K_\delta(\bar{u}_j)\bigr)u_i - (C - C_\delta)\frac{1}{\rho}p + \nu S u_i = (N + N_\delta)f_i$$

$$C^T u_i + M_\varepsilon\frac{\partial u_i}{\partial t} + K_\varepsilon(\bar{u}_j)u_i - N_\varepsilon f_i + C_\varepsilon\frac{1}{\rho}p = 0$$

$$\begin{bmatrix} \dfrac{M+M_\delta}{\Delta t} & -(C - C_\delta) \\ C^T + \dfrac{M_\varepsilon}{\Delta t} & C_\varepsilon \end{bmatrix} \begin{Bmatrix} u_i^{n+1} \\ \dfrac{1}{\rho}\,p^{\,n+1} \end{Bmatrix} = \begin{Bmatrix} b_i^n \\ d_i^n \end{Bmatrix}$$

Rezoning and Remeshing

$$\frac{\partial H}{\partial t} = \bar{u}_s\frac{n_1}{n_3} + \bar{v}_s\frac{n_2}{n_3} + w_s$$

$$\frac{\partial^2 \phi}{\partial x_i^2} = 0, \qquad \phi = \Delta H \ \ \text{on } \Gamma_m, \qquad \phi = 0 \ \ \text{on } \Gamma_f$$

Bi-CGSTAB Method

Solve $Ax = b$.

Initialization:
$$r_0 = b - Ax_0 = b - \sum_e A^{(e)} x_0 \quad ②, \qquad p_0 = r_0$$

Iteration:
$$q_k = A p_k = \sum_e A^{(e)} p_k \quad ②$$
$$\alpha_k = (r_0, r_k)\,①\,\big/\,(r_0, q_k)\,①$$
$$t_k = r_k - \alpha_k q_k$$
$$s_k = A t_k = \sum_e A^{(e)} t_k \quad ②$$
$$\omega_k = (s_k, t_k)\,①\,\big/\,(s_k, s_k)\,①$$
$$x_{k+1} = x_k + \alpha_k p_k + \omega_k t_k$$
$$r_{k+1} = t_k - \omega_k s_k$$
$$\beta_k = \frac{\alpha_k}{\omega_k}\,(r_0, r_{k+1})\,①\,\big/\,(r_0, r_k)\,①$$
$$p_{k+1} = r_{k+1} + \beta_k (p_k - \omega_k q_k)$$

①: Global communication, ②: Neighboring communication

The matrix-vector products are accumulated element by element within each subdomain, so they need only neighboring (interface) communication, while the inner products need a global reduction over all processors.
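To make the two communication patterns concrete, here is a minimal MPI sketch in C of the two kernels the iteration relies on: a distributed inner product (global communication) and an interface exchange after the element-by-element matrix-vector product (neighboring communication). The function names, the interface index array, and the buffer layout are illustrative assumptions, not the actual code behind these slides.

```c
#include <mpi.h>

/* ① Global communication: inner product of distributed vectors.
 * Each process sums over the entries it owns, then MPI_Allreduce
 * combines the partial sums on every process. */
double dot_global(const double *a, const double *b, int n_local)
{
    double local = 0.0, global = 0.0;
    for (int i = 0; i < n_local; ++i)
        local += a[i] * b[i];
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return global;
}

/* ② Neighboring communication: after q = sum_e A^(e) p is accumulated
 * locally, nodes shared with a neighboring subdomain exchange and add
 * their partial contributions. `idx` lists the local indices of the
 * nodes shared with `neighbor_rank` (illustrative data layout). */
void exchange_interface(double *q, const int *idx, int n_shared,
                        int neighbor_rank, double *sendbuf, double *recvbuf)
{
    MPI_Status st;
    for (int i = 0; i < n_shared; ++i)
        sendbuf[i] = q[idx[i]];
    MPI_Sendrecv(sendbuf, n_shared, MPI_DOUBLE, neighbor_rank, 0,
                 recvbuf, n_shared, MPI_DOUBLE, neighbor_rank, 0,
                 MPI_COMM_WORLD, &st);
    for (int i = 0; i < n_shared; ++i)
        q[idx[i]] += recvbuf[i];
}
```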

Network Configuration

[Figure: network configuration of the PC clusters ("Network of CML" and "Network of Biowulf"): Linux PCs, including a Linux gateway PC, connected through a hub]

Network environments tested:
  Network      Switch hub
  100Base-TX   yes (○)
  100Base-TX   no (×)
  10Base-T     no (×)

・Hardware
  CPU    Pentium II 400 MHz
  Cache  512 KB
  RAM    512 MB
  NIC    DEC DC21x4x PCI (10/100Base-TX)

・Software
  OS             Linux 2.0.34
  Comm. library  MPICH 1.1.1

Development of Parallel Computer

MPICH - A Portable Implementation of MPI (http://www-unix.mcs.anl.gov/mpi/mpich/index.html)
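For orientation, a minimal MPI program of the kind such an MPICH-based cluster runs; the process count and the host file name in the comments are placeholders, not values taken from the slides.

```c
/* Build and run (typical MPICH usage; paths and hosts are placeholders):
 *   mpicc -O2 hello.c -o hello
 *   mpirun -np 8 -machinefile hosts ./hello
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    double local, total;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each PE contributes a partial value; a global reduction combines
     * them, as in the inner products of the parallel solver. */
    local = (double)(rank + 1);
    MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    printf("PE %d of %d: global sum = %f\n", rank, size, total);
    MPI_Finalize();
    return 0;
}
```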


Numerical Examples

・Sloshing analysis of a rectangular tank and an actual dam
・PC cluster parallel computing

Numerical Example (1)

[Figure: rectangular tank model, dimensions 0.10 m, 0.50 m, 1.00 m (axes X, Y, Z); water with observation point A]

Excitation: $f_x = A \sin \omega t$ (A = 0.0093 m, ω = 5.311 rad/s)
Mesh: 4,305 nodes, 19,200 elements

Time History of Water Elevation at Point A
[Plot: water elevation [m] versus time [s], 0-50 s]

Comparison of Water Elevation
[Plot: water elevation [m] versus time [s], 0-12 s: experiment (Okamoto) versus present method]

Computed Results
[Figures: free-surface configurations at Time = 6.0, 9.0, 12.0, and 15.0 s]

Numerical Example(2)

Dam for pumped-storage power generation

             

Finite Element Model
[Figure: finite element model of the dam; plan view (X [m] 0-600, Y [m] 0-280) with observation points A, B, C, D, and side view (X [m] 0-600, Z [m]); bottom and free-surface meshes]
(elements: 109,314, nodes: 22,610)

Earthquake Data (Input Ground Motion), computed by Harada and Oosumi
[Plot: X-velocity [m/s] (−0.05 to 0.05) versus time [s] (0-100)]

Computed Results
Computed Water Elevation at Point A
[Plot: water level [m] (28.8-29.4) versus time [s] (0-100) at point A; plan view with observation points A, B, C, D; input X-velocity shown for reference]

Performance of Parallel Computing: Comparison of Network Environment
(Mesh: total number of nodes 22,610, total number of elements 109,314. "total" = total computational time; "c&w" = communication and wait time, with its share of the total in parentheses.)

                    Accton 100Base-TX        Accton 100Base-TX        Allied Telesis 10Base-T
                    switch hub               no switch hub            no switch hub
PE    total      c&w            total      c&w            total      c&w
1     3331.3     0.0            3331.3     0.0            3331.3     0.0
2     1812.0     39.1 (2.2%)    1826.1     50.8 (2.8%)    1897.3     121.2 (6.4%)
4      961.1     71.7 (7.5%)     979.1     89.4 (9.1%)    1197.5     287.4 (24.0%)
8      462.4     25.3 (5.5%)     497.0     57.1 (11.5%)   1269.6     698.8 (55.0%)

Comparison with IBM RS6000/SP
(Mesh: total number of nodes 22,610, total number of elements 109,314)

                    Accton 100Base-TX, switch hub    IBM RS6000/SP
PE    total      comm.&wait         total      comm.&wait
1     3331.3     0.0                2935.8     0.0
2     1812.0     39.1 (2.2%)        1586.7      6.6 (0.4%)
4      961.1     71.7 (7.5%)         805.3     56.1 (7.0%)
8      462.4     25.3 (5.5%)         417.8     34.7 (8.3%)

Parallel Finite Element Analysis of Asphalt Concrete Using Image-Based Modeling

Kazuo Kashiyama, Takaaki Kakehi (Chuo University), Takashi Izumiya (Yachiyo Engineering Corporation), Tomoyuki Uo (Kajima Corporation), Kenjiro Terada (Tohoku University)

Outline

・Introduction
・Governing Equation
・Formulation of Homogenization
・Image-Based Modeling Using X-ray CT
・Parallel Implementation
・Numerical Analysis
・Conclusions

[Figure: structure of pavement — surface course, base course, subgrade]

Purpose of This Study

・A parallel finite element method based on homogenization theory for the visco-elastic analysis of asphalt concrete is presented.

・The accurate configuration of the microstructure is modeled from digital images obtained by X-ray CT.

Homogenization Method

[Figure: macroscopic domain Ω with boundaries Γu and Γt, and a periodic microstructure (unit cell) composed of solid and fluid parts; aggregate and asphalt mortar]

Solid: elastic body
Fluid: Newtonian fluid (Stokes flow)
Solid-fluid mixtures with periodic microstructure

Governing Equation

@õèij@xi

+öèñbj = 0   inäè

õèij(x) = bèijkh(x)"kh(uè) + cèijkh(x)"kh

í@uè

@t

ìbèijkh(x) =

(Eijkh(x)      in äès13K

féijékh      in äèf

cèijkh(x) =

(0 in äès2ñè

Äéikéjh Ä 1

3éijékhÅ

in äèf

Equilibrium equation:

Constitutive equation:

Principle equation of virtual work

bè(uè;!è) + cèí@uè

@t; !èì

=

ZÄt

ñt!èdÄ+

Zäèöèñb!èdx(

bè(uè;!è) =Räèb

èijkh(x)"ij(u

è)"kh(!è)dx

cè(uè;!è) =Räèc

èijkh(x)"ij(u

è)"kh(!è)dx

Two-scale asymptotic expansion

uè(x) = u0(x; t) +èu1(x;y; t)+è2u2(x;y; t)

+ ……… +ènun(x;y; t)

TOSCANER-23200 (Kumamoto Univ.)

Digital Image Processing for Asphalt Concrete

Scan type: traverse/rotation
Power of X-ray: 300 kV / 200 kV
Number of detectors: 176 channels
Size of specimen: φ400 mm × H600 mm
Thickness of slice: 0.5 mm, 1 mm, 2 mm
Spatial resolution: 0.2 mm

Finite Element Model for Microstructure

Finite Element Model

Digital Image (2D) Digital Image(3D)

Microscopic Domain
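The step from the CT digital image to the finite element model of the microstructure amounts to classifying each voxel and turning it into one 8-node hexahedral element. A minimal C sketch of that idea follows; the CT-value threshold, array layout, and material IDs are illustrative assumptions, not details given on the slides.

```c
/* Classify each voxel of an nx*ny*nz CT image (x fastest) by a threshold
 * on the CT value and emit one 8-node hexahedral element per voxel.
 * elem_node receives 8 node IDs per element (nodes numbered on the
 * (nx+1)*(ny+1)*(nz+1) grid); elem_mat receives 1 = aggregate (solid),
 * 0 = asphalt mortar (fluid-like phase). Threshold is an assumption. */
long voxels_to_hex(const float *ct, int nx, int ny, int nz,
                   float threshold, long *elem_node, int *elem_mat)
{
    long ne = 0;
    long nxn = nx + 1, nyn = ny + 1;           /* nodes per grid row/plane */
    for (int k = 0; k < nz; ++k)
        for (int j = 0; j < ny; ++j)
            for (int i = 0; i < nx; ++i) {
                long v  = (long)k * ny * nx + (long)j * nx + i;
                long n0 = (long)k * nyn * nxn + (long)j * nxn + i;
                long *c = &elem_node[8 * ne];
                c[0] = n0;                 c[1] = n0 + 1;
                c[2] = n0 + nxn + 1;       c[3] = n0 + nxn;
                c[4] = n0 + nxn * nyn;     c[5] = c[4] + 1;
                c[6] = c[4] + nxn + 1;     c[7] = c[4] + nxn;
                elem_mat[ne] = (ct[v] >= threshold) ? 1 : 0;
                ++ne;
            }
    return ne;   /* number of elements = nx*ny*nz */
}
```

With a 40×40×40-voxel unit cell this mapping gives exactly the 64,000 elements and 68,921 (= 41³) nodes quoted later for the microscopic model.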

Domain Decomposition

Parallel Computing Method based on Domain Decomposition Method

Microscopic Structure

1) Equalize the number of elements in each sub-domain
2) Minimize the number of nodes on the boundaries between sub-domains

Element by Element SCG Method for Parallel Computing

Solve $Ax = b$.

$$r_0 = b - Ax_0 = b - \sum_e A^{(e)} x_0 \quad ①$$
$$p_0 = r_0$$
$$q_k = A p_k = \sum_e A^{(e)} p_k \quad ①$$
$$\alpha_k = (r_k, r_k)\,/\,(p_k, q_k) \quad ②$$
$$x_{k+1} = x_k + \alpha_k p_k$$
$$r_{k+1} = r_k - \alpha_k q_k$$
$$\beta_k = (r_{k+1}, r_{k+1})\,/\,(r_k, r_k) \quad ②$$
$$p_{k+1} = r_{k+1} + \beta_k p_k$$

①: neighboring communication, ②: global communication
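The defining feature of the element-by-element scheme is that the global matrix A is never assembled; q = A p is accumulated from the element matrices A^(e). A minimal serial C sketch of that product is below (the connectivity layout and the fixed 8-node elements are assumptions for illustration; in the parallel code each subdomain would do this locally and then exchange interface contributions as noted above).

```c
/* Element-by-element matrix-vector product q = sum_e A^(e) p.
 * ae:   element matrices, 8x8 each, stored consecutively (assumed layout)
 * conn: connectivity, 8 global node IDs per element
 * All element contributions are scatter-added into the global vector q,
 * so no assembled global matrix is ever stored. */
void ebe_matvec(const double *ae, const long *conn, long n_elem,
                const double *p, double *q, long n_node)
{
    for (long i = 0; i < n_node; ++i)
        q[i] = 0.0;

    for (long e = 0; e < n_elem; ++e) {
        const double *A = &ae[64 * e];     /* 8x8 element matrix */
        const long   *c = &conn[8 * e];
        double pe[8], qe[8];

        for (int a = 0; a < 8; ++a)        /* gather */
            pe[a] = p[c[a]];
        for (int a = 0; a < 8; ++a) {      /* local multiply */
            qe[a] = 0.0;
            for (int b = 0; b < 8; ++b)
                qe[a] += A[8 * a + b] * pe[b];
        }
        for (int a = 0; a < 8; ++a)        /* scatter-add */
            q[c[a]] += qe[a];
    }
}
```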

Numerical Analysis

Macro-microscopic model:
・Macroscopic model (φ10 cm, h 20 cm): 9,537 nodes, 8,192 elements
・Microscopic model (40 mm × 40 mm × 40 mm): 68,921 nodes, 64,000 elements

Material constants:
・Solid: E = 61.0 GPa, ν = 0.21
・Fluid: K = 10.0 GPa, μ = 1.0 GPa·s
・Vf = 49.1%

Time history of axial stress of the macroscopic model
[Plot: axial stress (MPa, 0 to −6) versus time (s, 0-1500): digital image model and idealized model at Vf = 14.6% and Vf = 49.1%, compared with experiment]

Macroscopic von Mises stress distribution (Vf = 49.1%)
[Contour plots (MPa) at 60 s and 1500 s]

Microscopic von Mises stress distribution of the solid parts (Vf = 49.1%)
[Contour plots (MPa) at 60 s and 1500 s]

Microscopic von Mises stress distribution of the fluid parts (Vf = 49.1%)
[Contour plots (MPa) at 60 s and 1500 s]

Effect of the region of unit cell: 10×10×10 mm (1,000 pixels), 20×20×20 mm (8,000 pixels), 30×30×30 mm (27,000 pixels), 40×40×40 mm (64,000 pixels)

Time history of axial stress of the macroscopic model (Vf = 49.1%)
[Plot: axial stress (MPa, 0 to −6) versus time (s, 0-1500) for the four unit-cell sizes, compared with experiment; the results converge as the unit-cell region is enlarged]

Efficiency of Parallelization (Vf = 49.1%)
[Plot: efficiency (%, 50-100) versus number of processors (2-8) for 40×40×40, 64×64×64, and 80×80×80 element unit cells]
[Plot: exe size [MB] (100-400) versus number of processors (2-8) for 40×40×40, 64×64×64, and 80×80×80 element unit cells]

Korakuen Campus (School of Science and Engineering)

Tama Campus (5 Schools of Liberal Arts)

Domain Decomposition Method

Greedy Algorithm

Farhat, C., "A simple and efficient automatic FEM domain decomposer", Computers & Structures, Vol. 28, pp. 579-602, 1988.
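A sketch of the greedy idea behind Farhat's decomposer, in C: starting from a seed element, each subdomain absorbs adjacent elements until it holds its share of the mesh, which tends to equalize element counts and keep interfaces small. The adjacency layout (CSR-style arrays) and the seed choice are illustrative assumptions, not the exact algorithm in the reference.

```c
#include <stdlib.h>

/* Greedy element-based domain decomposition (illustrative sketch).
 * adj_ptr/adj: CSR-style element adjacency (neighbors share a face).
 * part[e] receives the subdomain ID of element e (0..nparts-1). */
void greedy_partition(long n_elem, const long *adj_ptr, const long *adj,
                      int nparts, int *part)
{
    long *queue = malloc(n_elem * sizeof(long));
    long assigned = 0, scan = 0;

    for (long e = 0; e < n_elem; ++e)
        part[e] = -1;                         /* -1 = not yet assigned */

    for (int p = 0; p < nparts; ++p) {
        /* Target size: remaining elements split over remaining parts. */
        long target = (n_elem - assigned) / (nparts - p);
        long head = 0, tail = 0, count = 0;

        while (scan < n_elem && part[scan] != -1)   /* pick a seed */
            ++scan;
        if (scan == n_elem) break;
        queue[tail++] = scan; part[scan] = p; ++count;

        /* Grow the subdomain breadth-first until it reaches its target. */
        while (head < tail && count < target) {
            long e = queue[head++];
            for (long k = adj_ptr[e]; k < adj_ptr[e + 1] && count < target; ++k) {
                long nb = adj[k];
                if (part[nb] == -1) {
                    part[nb] = p;
                    queue[tail++] = nb;
                    ++count;
                }
            }
        }
        assigned += count;
    }
    /* Any leftover (disconnected) elements go to the last subdomain. */
    for (long e = 0; e < n_elem; ++e)
        if (part[e] == -1) part[e] = nparts - 1;

    free(queue);
}
```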
