Page 1: The use of High Performance Computing in Astrophysics: an experience report

Paolo Miocchi

in collaboration with

R. Capuzzo-Dolcetta, P. Di Matteo, A. Vicari

Dept. of Physics, Univ. of Rome “La Sapienza” (Rome, Italy)

Work supported by the INAF-CINECA agreement (http://inaf.cineca.it, grant inarm033).

Page 2: The needs of HPC in Globular Cluster dynamics

Theoretical study of a system made up of $N \sim 10^5$ – $10^7$ gravitationally bound stars (self-gravitating system).

Page 3: The needs of HPC in Globular Cluster dynamics

Theoretical study of a system made up of $N \sim 10^5$ – $10^7$ gravitationally bound stars (self-gravitating system).

$O(N^2)$ force computations to do.
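To make the $O(N^2)$ cost concrete, here is a minimal Python sketch of direct pairwise summation. It is not the code used in this work: the function name `direct_forces`, the unit choice G = 1 and the optional softening length `eps` are assumptions made for illustration.

```python
import math

def direct_forces(pos, mass, G=1.0, eps=0.0):
    """Accelerations by direct pairwise summation.

    Returns (acc, n_pairs); `eps` is an optional softening length (assumption).
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    n_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + eps ** 2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3  # pull of j on i
                acc[j][k] -= G * mass[i] * d[k] * inv_r3  # equal and opposite
            n_pairs += 1
    return acc, n_pairs
```

The pair loop runs N(N-1)/2 times, so with N = 10^6 a single force evaluation needs about 5 x 10^11 pair interactions.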

Page 4: The needs of HPC in Globular Cluster dynamics

Gravity is a long-range and attractive force

Very unstable dynamical states

Page 5: The needs of HPC in Globular Cluster dynamics

Gravity is a long-range and attractive force

Inhomogeneous mass distributions

Very wide range of time-scales $\sim (G\rho)^{-1/2}$

Numerically “expensive” time integration of particle motion

Individual and variable time-steps should be adopted
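As a rough illustration of how wide the range of time-scales can be, the sketch below evaluates $(G\rho)^{-1/2}$ for two densities. The two density values are assumptions chosen for illustration only (they are not taken from the simulations); G is expressed in pc^3 / (Msun Myr^2).

```python
import math

G = 4.498e-3  # gravitational constant in pc^3 / (Msun Myr^2)

def dynamical_time(rho):
    """Time-scale (G * rho)**-0.5 in Myr for a density rho in Msun / pc^3."""
    return 1.0 / math.sqrt(G * rho)

# Illustrative densities (assumptions): a dense core and a diffuse outer region.
rho_core, rho_halo = 1.0e5, 1.0
t_core = dynamical_time(rho_core)
t_halo = dynamical_time(rho_halo)
ratio = t_halo / t_core  # equals sqrt(rho_core / rho_halo)
```

A factor of 10^5 in density gives a factor of about 316 in time-scale, which is why a single global time-step is wasteful.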

Page 6: The needs of HPC in Globular Cluster dynamics

Gravity is a long-range and attractive force

Very unstable dynamical states

Inhomogeneous mass distributions

3D problems!

Arduous analytical approach!

Page 7: The needs of HPC in Globular Cluster dynamics

Dynamical evolution of self-gravitating systems with $N > 10^5$ stars

More than tens of Gflops needed!

Code PARALLELIZATION required

Page 8: The tree-code

The force F exerted on a particle of mass m by a box of n particles is approximated by a multipole expansion about the box centre of mass (cm):

$$\frac{\mathbf{F}}{m} = -\frac{GM}{r^3}\,\mathbf{r} + \frac{G}{r^5}\,\mathbf{Q}\cdot\mathbf{r} - \frac{5G}{2r^7}\,(\mathbf{r}\cdot\mathbf{Q}\cdot\mathbf{r})\,\mathbf{r}$$

M = tot. mass

Q = quadrupole

r = position of m relative to cm

computational cost independent of n

see Barnes & Hut 1986, Nature 324, 446
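The monopole plus quadrupole approximation can be checked numerically against the exact sum over the particles of a box. The sketch below is illustrative only: it assumes the traceless quadrupole tensor $Q_{ij} = \sum_k m_k (3 x_i x_j - |x|^2 \delta_{ij})$ measured from the centre of mass, units with G = 1, and hypothetical function names.

```python
import math

G = 1.0  # arbitrary units (assumption)

def direct_acc(pos, masses, sources):
    """Exact acceleration at `pos`, summed over every source particle."""
    acc = [0.0, 0.0, 0.0]
    for m, s in zip(masses, sources):
        d = [pos[i] - s[i] for i in range(3)]
        r3 = math.sqrt(sum(x * x for x in d)) ** 3
        for i in range(3):
            acc[i] -= G * m * d[i] / r3
    return acc

def box_moments(masses, sources):
    """Total mass M, centre of mass cm and traceless quadrupole Q of a box."""
    M = sum(masses)
    cm = [sum(m * s[i] for m, s in zip(masses, sources)) / M for i in range(3)]
    Q = [[0.0] * 3 for _ in range(3)]
    for m, s in zip(masses, sources):
        x = [s[i] - cm[i] for i in range(3)]
        x2 = sum(c * c for c in x)
        for i in range(3):
            for j in range(3):
                Q[i][j] += m * (3.0 * x[i] * x[j] - (x2 if i == j else 0.0))
    return M, cm, Q

def multipole_acc(pos, M, cm, Q, use_quadrupole=True):
    """Acceleration at `pos` from the box moments only (cost independent of n)."""
    r = [pos[i] - cm[i] for i in range(3)]
    rn = math.sqrt(sum(c * c for c in r))
    acc = [-G * M * r[i] / rn ** 3 for i in range(3)]
    if use_quadrupole:
        Qr = [sum(Q[i][j] * r[j] for j in range(3)) for i in range(3)]
        rQr = sum(r[i] * Qr[i] for i in range(3))
        for i in range(3):
            acc[i] += G * Qr[i] / rn ** 5 - 2.5 * G * rQr * r[i] / rn ** 7
    return acc
```

Once M, cm and Q are stored for a box, `multipole_acc` costs the same whether the box holds ten particles or ten thousand. For a flattened box seen from far away, adding the quadrupole term should reduce the error of the monopole-only estimate.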

Page 9: The tree-code

recursive subdivision in ‘boxes’

‘tree’ logical structure

each node corresponds to a box

(Figure: boxes numbered 1 to 4.)

Page 10: The tree-code

Multipolar coefficients are evaluated for each box. O(N log N) computations

recursive subdivision in ‘boxes’
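The recursive subdivision can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the class `Box`, the function names and the `max_depth` guard are assumptions, and only the lowest-order coefficients (mass and centre of mass) are stored per box.

```python
class Box:
    def __init__(self, center, half):
        self.center, self.half = center, half
        self.children = []   # sub-boxes (empty for a terminal box)
        self.particles = []  # indices of the particles inside
        self.mass = 0.0
        self.cm = [0.0, 0.0, 0.0]

def build(box, idx, pos, mass, depth=0, max_depth=32):
    """Recursively split `box` (holding the non-empty index list `idx`)
    until every terminal box contains a single particle."""
    box.particles = idx
    box.mass = sum(mass[i] for i in idx)
    box.cm = [sum(mass[i] * pos[i][k] for i in idx) / box.mass for k in range(3)]
    if len(idx) <= 1 or depth == max_depth:
        return box
    octants = {}
    for i in idx:
        key = tuple(pos[i][k] >= box.center[k] for k in range(3))
        octants.setdefault(key, []).append(i)
    h = box.half / 2.0
    for key, sub in octants.items():  # only non-empty octants are created
        c = [box.center[k] + (h if key[k] else -h) for k in range(3)]
        box.children.append(build(Box(c, h), sub, pos, mass, depth + 1, max_depth))
    return box

def count_terminal(box):
    """Number of terminal boxes below (and including) `box`."""
    if not box.children:
        return 1
    return sum(count_terminal(c) for c in box.children)
```

Each particle is touched once per level of the tree, and the depth grows roughly like log N for reasonable distributions, which is where the O(N log N) count comes from. Quadrupole coefficients could be accumulated per box in the same pass.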

Page 11: Problems in the tree-code parallelization

Gravity is a long-range interaction: inter-processor data transfer is unavoidable (heavy overhead on DMP).

Inhomogeneous mass distributions: the assignment of particles to PEs has to be done according to the work-load.

Hierarchical force evaluation: most of the force contributions are due to closer bodies, hence a spatial domain decomposition.

Page 12: The ‘Adaptive Tree Decomposition’ method

Domain decomposition is performed ‘on-the-fly’ during the tree-construction with a low computational cost.

The adaptivity of the tree structure is exploited to give a good load-balancing and data-locality in the forces evaluation.

The locally essential tree is built ‘dynamically’ during the tree-walking: remote boxes are linked only when really needed.

Page 13: The ‘Adaptive Tree Decomposition’ method

Two different parallelization strategies:

LOWER-TREE: few boxes containing many particles.

UPPER-TREE: many boxes with few particles inside.

(Figure labels: PE 0, 1, 2, 3.)

see Miocchi & Capuzzo-Dolcetta 2002, A&A 382, 758

Page 14: The ‘Adaptive Tree Decomposition’ approach

Some definitions

UPPER-tree = made up of boxes with fewer than kp particles inside;

LOWER-tree = made up of boxes with more than kp particles;

a pseudo-terminal (PTERM) box is a box in the upper-tree whose ‘parent box’ is in the lower-tree;

p = no. of processors, k = fixed coefficient.

Page 15: The ‘Adaptive Tree Decomposition’ method

Parallelization of the lower-tree construction

1. Preliminary “random” distribution of particles to PEs.

2. All PEs work, starting from the root box, constructing in synchrony the same lower-boxes (by a recursive procedure).

3. When a PTERM box is found, it is assigned to a certain PE (so as to preserve a good load-balancing in the subsequent force evaluation) and no further ‘branches’ are built up.

Load balancing: in this stage it is ensured by setting k sufficiently large, so as to always deal with a number of particles in a box much greater than the number of processors.

Domain decomposition: communications among PEs during tree-walking are minimized by the particular order in which PTERM boxes are met. The lower-tree is stored in the local memories of ALL PEs.
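The steps above can be sketched serially as follows. This is only an illustration under stated assumptions: it works in 2-D for brevity, treats a box with exactly kp particles as an upper-tree box, uses the particle count as a stand-in for the work-load, and hands consecutive PTERM boxes to the same PE until that PE's share is reached. The function names and the `min_half` guard are hypothetical; the actual assignment rule is the one described in Miocchi & Capuzzo-Dolcetta 2002.

```python
def find_pterm(idx, pos, center, half, threshold, out, min_half=1e-12):
    """Depth-first descent of the lower-tree.

    Appends to `out`, in the order met, the particle lists of the PTERM boxes
    (first boxes along each branch holding at most `threshold` = k*p particles).
    """
    if len(idx) <= threshold or half < min_half:
        out.append(idx)
        return
    h = half / 2.0
    for ox in (-1, 1):
        for oy in (-1, 1):
            sub = [i for i in idx
                   if (pos[i][0] >= center[0]) == (ox > 0)
                   and (pos[i][1] >= center[1]) == (oy > 0)]
            if sub:
                c = (center[0] + ox * h, center[1] + oy * h)
                find_pterm(sub, pos, c, h, threshold, out, min_half)

def assign_to_pes(pterm_boxes, p):
    """Owner PE of each PTERM box, keeping consecutive boxes on the same PE."""
    total = sum(len(b) for b in pterm_boxes)
    owner, pe, done = [], 0, 0
    for b in pterm_boxes:
        owner.append(pe)
        done += len(b)
        # move to the next PE once the cumulative work reaches this PE's target
        if pe < p - 1 and done >= total * (pe + 1) / p:
            pe += 1
    return owner
```

Because boxes are met in depth-first order, boxes that are consecutive in the list tend to be close in space, so each PE ends up with a nearly contiguous domain. With 4000 uniformly distributed particles, p = 4 and k = 50, the PTERM boxes typically appear at the third subdivision level and each PE's load stays within one box (at most kp particles) of N/p.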

Page 16: The ‘Adaptive Tree Decomposition’ method

PTERM order

Example of a uniform 2-D distribution with PTERM boxes at the 3rd subdivision level.

Every spatial domain is (nearly) contiguous, so the data transfer among PEs is minimized.

Page 17: The ‘Adaptive Tree Decomposition’ method

Example of domain decomposition

Plummer distribution of 16K particles; 4 processors

Page 18: The ‘Adaptive Tree Decomposition’ method

Parallelization of the upper-tree construction

PTERM boxes have already been distributed to PEs.

Each PE works independently and asynchronously, starting from every PTERM box in its domain and building the descendant portion of the upper-tree, up to the terminal boxes.

Page 19: The ‘Adaptive Tree Decomposition’ method

Parallelization of the tree walking

Each PE independently evaluates the forces on the particles belonging to its domain (i.e. those contained in the PTERM boxes previously assigned to it).

Each PE has in its memory the local tree, i.e. the whole lower-tree plus the portion of the upper-tree that descends from the PTERM boxes of the PE’s domain.

When a ‘remote’ box is met, it is linked to the local tree by copying it into the local memory.
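The "link on demand" idea can be illustrated with a toy cache. In this sketch a plain dictionary stands in for the memory of the other PEs, so no real message passing takes place; the class name `LocalTree` and its fields are hypothetical.

```python
class LocalTree:
    """Toy model of a PE's local tree with on-demand linking of remote boxes."""

    def __init__(self, local_boxes, remote_store):
        self.boxes = dict(local_boxes)    # box id -> data already in local memory
        self.remote_store = remote_store  # stand-in for the other PEs' memories
        self.fetches = 0                  # number of remote copies performed

    def get(self, box_id):
        """Return the box data, copying it into local memory on first use."""
        if box_id not in self.boxes:
            self.boxes[box_id] = self.remote_store[box_id]
            self.fetches += 1
        return self.boxes[box_id]
```

A remote box is transferred at most once per tree walk, and boxes that are never opened are never transferred, which is the point of building the locally essential tree dynamically.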

Page 20: Code performance on an IBM SP4

Performance on one ‘main’ time-step (T) with complete force evaluation and time integration of motion for a self-gravitating system with $N = 10^6$ particles

WARNING: each particle has its own variable time-step, depending on the local mass density and typical velocity.

Dynamical tree reconstruction implemented. According to the block time scheme, the particle step can be $T/2^n$ (Aarseth 1985).

The tree is re-built when the no. of interactions evaluated is > N/10 (Springel et al., 2001, New Astr., 6, 51)
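A block time scheme restricts every particle step to $T/2^n$. The sketch below shows one way to pick n for a desired step and to count the advances in one main step; how the desired step is derived from density and velocity is not specified here, so it is taken as an input. The function names and the cap `n_max = 8` (matching a smallest step of T/256) are assumptions.

```python
def block_level(dt_wanted, T, n_max=8):
    """Smallest n with T / 2**n <= dt_wanted, capped at n_max."""
    n = 0
    while T / 2 ** n > dt_wanted and n < n_max:
        n += 1
    return n

def advances_per_main_step(levels):
    """Total particle advances in one main step T: a particle on level n
    is advanced 2**n times."""
    return sum(2 ** n for n in levels)
```

For example, a particle that wants a step of 0.3 T is placed on level n = 2 (step T/4). With this counting, N = 10^6 particles all on level 0 would need 10^6 advances per main step, so 2,100,000 advances correspond to an average of 2.1 advances per particle.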

Page 21: Code performance on an IBM SP4

Performance on one ‘main’ time-step (T) with complete force evaluation and time integration of motion for a self-gravitating system with $N = 10^6$ particles

(Figure: particle time-step distribution, log(n) from 0 to 7 versus time-step T, T/2, T/4, T/8, T/16, T/32, T/64, T/128, T/256.)

2,100,000 time advances performed

Page 22: Code performance on an IBM SP4

Performance on one ‘main’ time-step with complete force evaluation and time integration of motion for a self-gravitating system with $N = 10^6$ particles ($\theta = 0.7$, k = 256, up to 16 PEs per node)

(Figure axis: CPU-time (sec).)

25,000 particles per second

Page 23: Code performance on an IBM SP4

The speedup behaviour is very good up to 16 PEs (= 10).

The load-unbalancing is low (10% with 64 PEs).

Data transfer and communications still penalize the overall performance when the N / PEs ratio is low (34% with 64 PEs).

An MPI-2 version could fully exploit the ATD parallelization strategy.

Page 24: Merging of Globular Clusters in galactic central regions

Motivation: the study of the dynamical evolution and the fate of young GCs within the bulge

To what extent can GCs survive the strong tidal interaction with the bulge?

Do they merge in the end?

What features will the final merging product have?

To what extent can the bulge accrete the mass lost by the GCs?

Page 25: Merging of Globular Clusters in galactic central regions

Motivation: the study of the dynamical evolution and the fate of young GCs within the bulge

30,000 CPU-hours on an IBM SP4 provided by the INAF-CINECA agreement for a scientific ‘key-project’ (under grant inarm033)

Page 26: Merging of Globular Clusters in galactic central regions

Features of the numerical approach

Accurate N-body (tree-code) simulations with a high number of ‘particles’ ($10^6$).

Dynamical friction and mass function included.

Self-consistent triaxial bulge model (Schwarzschild).

cluster | M ($10^6\,M_\odot$) | r_t (pc) | c | r_c (pc) | t_cr (Kyr) | velocity (km/s)
a | 20 | 95 | 0.8 | 14 | 170 | 33
b | 15 | 72 | 0.9 | 9 | 100 | 33
c | 20 | 98 | 1.2 | 5.5 | 42 | 37
d | 15 | 77 | 1.3 | 3.8 | 28 | 37

Simulation labels: A, B. Annotation: higher concentration.

Page 27: Merging of Globular Clusters in galactic central regions

Quasi-radial orbits

Clusters cross each other at every passage (twice per period)

(Figure axes: x (pc) versus t (Myr).)

Page 28: Merging of Globular Clusters in galactic central regions

Tidal tails structure and formation

“Tidal tails” around Pal 5 (after Odenkirchen et al. 2002)

Our simulation of a cluster in a circular orbit

Tidal tails reproduced by our simulation

Page 29: Merging of Globular Clusters in galactic central regions

Tidal tails structure and formation

“Ripples” around a cluster in our simulations

“Ripples” around NGC 3923

Page 30: Merging of Globular Clusters in galactic central regions

Tidal tails structure and formation

“Ripples” around a cluster

“Ripples” around NGC 3923

What are “ripples”?

How do they form?

3D visualization tools can help to give answers!

Page 31: Merging of Globular Clusters in galactic central regions

Density profiles of the most compact cluster (solid lines) fitted with a single-mass King model (dotted lines), at t = 0 and t = 17 Myr (dashed black line: bulge central density)

Least compact cluster at t = 15 Myr

(Figure label: tidal tails.)

Page 32: Merging of Globular Clusters in galactic central regions

Fraction of mass lost

p = fraction of mass lost if / < p/100

E = fraction of mass lost if $E_i > 0$

(Figure labels: c = 0.8, 0.9, 1.2, 1.3; central cluster density; bulge stellar density.)
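The energy criterion can be sketched as follows. The slides do not spell out how $E_i$ is defined, so this example assumes it is the specific energy of star i in the cluster's centre-of-mass velocity frame, with the potential computed from the cluster's own stars only (no bulge contribution). The function name and G = 1 units are likewise assumptions.

```python
import math

def unbound_mass_fraction(pos, vel, mass, G=1.0):
    """Fraction of the total mass carried by stars with E_i > 0."""
    n = len(mass)
    M = sum(mass)
    # velocity of the centre of mass
    vcm = [sum(mass[i] * vel[i][k] for i in range(n)) / M for k in range(3)]
    lost = 0.0
    for i in range(n):
        kin = 0.5 * sum((vel[i][k] - vcm[k]) ** 2 for k in range(3))
        pot = 0.0
        for j in range(n):
            if j != i:
                pot -= G * mass[j] / math.dist(pos[i], pos[j])
        if kin + pot > 0.0:  # specific energy E_i of star i
            lost += mass[i]
    return lost / M
```

The potential here is a direct O(N^2) sum, which is fine for a small check; for a full snapshot the tree would be the natural way to evaluate it.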