EnSPy: Python library for computations of ensembles of particles on GPU

Glib Ivashkevych, Institute of Theoretical Physics, NSC KIPT, Kharkov, Ukraine

October 13, 2010

DESCRIPTION

Talk at Frontiers in Computational Astrophysics (Lyon, France, 10-16 October, 2010).

Why GPU?

GPU – Graphics Processing Unit:

programmable

manycore

multithreaded

with very high memory bandwidth

GPU programming gives us:

high performance

transparent scalability

... and is useful for problems with high data parallelism:

large datasets

portions of data could be processed independently

Outline

1. NVIDIA GPU Architecture and CUDA
2. Python and CUDA
3. EnSPy functionality
4. EnSPy architecture
5. Example: D5 potential
6. Example: Hill problem
7. Example: Hill problem, N–body version
8. Performance results
9. Future development

NVIDIA GPU Architecture and CUDA

Simplified GT200 architecture

consists of multiprocessors (MP)

each MP has:

8 stream processors

1 unit for double-precision operations

shared memory

global memory

Multiprocessors and threads

MP can launch numerous threads

threads are “lightweight” – little creation and switching overhead

threads run the same code

thread synchronization within an MP

cooperation via shared memory

each thread has a unique identifier – thread ID

Efficiency is achieved by hiding memory latency behind computation, not by cache usage as on a CPU

C for CUDA

a set of extensions to C

runtime library

function and variable type qualifiers

built–in vector types: float4, double2 etc.

built–in variables

Kernels

map the parallel part of the program to the GPU

execution: N times in parallel by N CUDA threads

CUDA Driver API

low–level control over the execution

no need for the nvcc compiler if kernels are precompiled – only the driver is needed
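The kernel model described above (one function body executed N times by N threads, each with its own thread ID) can be mimicked in plain Python. The following is a minimal serial sketch, not CUDA code: `ThreadContext`, `launch` and `saxpy_kernel` are hypothetical names introduced here for illustration, and the global index follows the usual one-dimensional CUDA convention `blockIdx.x * blockDim.x + threadIdx.x`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadContext:
    """Hypothetical stand-in for CUDA's built-in blockIdx/blockDim/threadIdx."""
    block_idx: int
    block_dim: int
    thread_idx: int

    @property
    def global_id(self) -> int:
        # Unique identifier of the thread across the whole grid
        return self.block_idx * self.block_dim + self.thread_idx


def saxpy_kernel(ctx, n, alpha, x, y, out):
    """Body that every thread runs; each thread handles one array element."""
    i = ctx.global_id
    if i < n:  # guard: the last block may contain surplus threads
        out[i] = alpha * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Serial emulation of a launch over grid_dim * block_dim threads."""
    for b in range(grid_dim):
        for t in range(block_dim):
            kernel(ThreadContext(b, block_dim, t), *args)


n = 10
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n
block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n elements
launch(saxpy_kernel, grid_dim, block_dim, n, 2.0, x, y, out)
print(out)
```

On a real GPU the two loops in `launch` do not exist: the hardware schedules the threads, which is why independent data elements are a prerequisite.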

Execution model

Python and CUDA

Why Python?

Python: flexible multipurpose interpreted language

easy to learn

dynamically typed

rich built–in functionality

very well documented

has a large and active community

Python scientific packages:

SciPy – modeling and simulation

Fourier transforms

ODE

Optimization

scipy.weave.inline – C inlining with little or no overhead

· · ·

NumPy – arrays, linear algebra etc.

flexible array creation routines

sorting, random sampling and statistics

· · ·

Python is a convenient way of interfacing C/C++ libraries

Python and CUDA

We could interface with:

Python C API – low–level approach: overkill

SWIG, Boost::Python – high–level approach: overkill

PyCUDA – the simplest and most straightforward way, for CUDA only

scipy.weave.inline – a simple and straightforward way for both CUDA and plain C/C++

EnSPy functionality

Motivation

Combine the flexibility of Python with the efficiency of C++ → CUDA for N–body simulations

the interface of EnSPy is written in Python

the core of EnSPy is written in C++

joined together by scipy.weave.inline

the C++ core can be used without Python – just include the header and link with the precompiled shared library

easily extensible, both through the high–level Python interface and the low–level C++ core – new algorithms, initial distributions etc.

multi–GPU parallelization

it’s easy to experiment with EnSPy!

Types of ensembles:

“Simple” ensemble – without interaction, only an external potential

N–body ensemble – both an external potential and gravitational interaction between particles

Current algorithms:

4th-order Runge–Kutta for the “simple” ensemble

Hermite scheme with shared time steps for the N–body ensemble
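For the “simple” ensemble, each particle can be advanced independently, which is what makes the problem data-parallel. Below is a minimal pure-Python sketch of one classical 4th-order Runge–Kutta step applied to every member of a small ensemble. It is not EnSPy code: the function names are hypothetical, and the harmonic oscillator U = q²/2 is an assumed test potential chosen only because its energy is easy to check.

```python
import math


def rk4_step(state, t, dt, rhs):
    """Advance one particle's state vector by one classical RK4 step."""
    def shifted(scale, k):
        # state + scale * k, component by component
        return [s + scale * ki for s, ki in zip(state, k)]

    k1 = rhs(t, state)
    k2 = rhs(t + 0.5 * dt, shifted(0.5 * dt, k1))
    k3 = rhs(t + 0.5 * dt, shifted(0.5 * dt, k2))
    k4 = rhs(t + dt, shifted(dt, k3))
    return [s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def oscillator_rhs(t, state):
    """Hamilton's equations for the assumed potential U = q^2 / 2."""
    q, p = state
    return [p, -q]  # dq/dt = p, dp/dt = -dU/dq


def energy(state):
    q, p = state
    return 0.5 * (p * p + q * q)


# Non-interacting ensemble: particles start at rest at different positions
ensemble = [[0.1 * (i + 1), 0.0] for i in range(8)]
initial_energies = [energy(s) for s in ensemble]

dt, n_steps = 0.01, 1000
for step in range(n_steps):
    t = step * dt
    # Every particle is updated independently of the others
    ensemble = [rk4_step(s, t, dt, oscillator_rhs) for s in ensemble]

max_drift = max(abs(energy(s) - e0)
                for s, e0 in zip(ensemble, initial_energies))
print(max_drift)
```

The list comprehension over the ensemble is the part a GPU would spread across threads, one particle per thread.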

Predefined initial distributions:

Uniform, point and spherical for “simple” ensembles

Uniform sphere with $2T/|U| = 1$ for the N–body ensemble

the user can supply functions (in Python) for initial ensemble generation

User specified values and expressions:

parameters of initial distribution

potential, forces, parameters of integration scheme

arbitrary number of triggers – numbers $N_i(t)$ of particles which do not cross the given hypersurface $F_i(q, p) = 0$ before time $t$

arbitrary number of averages – quantities $\bar{F}_i(q, p, t)$ which should be averaged over the ensemble
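The trigger and average definitions above can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration, not EnSPy's implementation: particles stream freely (no forces), the hypersurface is the plane q = 1, a crossing is detected as a sign change of F over one step, and the averaged quantity is the kinetic energy.

```python
def trigger_f(q, p):
    """Hypersurface F(q, p) = 0; here the plane q = 1 (illustrative choice)."""
    return q - 1.0


def kinetic_energy(q, p):
    """Quantity to be averaged over the ensemble (illustrative choice)."""
    return 0.5 * p * p


def run(particles, dt, n_steps):
    """Free streaming q += p*dt, recording N(t) and the ensemble average."""
    alive = [True] * len(particles)  # True until the particle crosses F = 0
    survivors, averages = [], []
    for _ in range(n_steps):
        moved = []
        for i, (q, p) in enumerate(particles):
            q_new = q + p * dt
            # A sign change of F over the step marks a crossing
            if alive[i] and trigger_f(q, p) * trigger_f(q_new, p) <= 0.0:
                alive[i] = False
            moved.append((q_new, p))
        particles = moved
        survivors.append(sum(alive))
        averages.append(sum(kinetic_energy(q, p) for q, p in particles)
                        / len(particles))
    return survivors, averages


# Ten particles starting at q = 0 with momenta 0.1, 0.2, ..., 1.0
ensemble = [(0.0, 0.1 * (i + 1)) for i in range(10)]
survivors, averages = run(ensemble, dt=0.01, n_steps=300)
print(survivors[-1], averages[-1])
```

Once a particle has crossed the surface it stays counted as escaped, so N(t) can only decrease, which matches the "do not cross before time t" wording.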

Runtime generation and compilation of C and CUDA code:

User-specified expressions (as Python strings) are wrapped by the EnSPy template subpackage into C functions and a CUDA module

Compiled at runtime

High usability and computational efficiency:

flexible Python interface

all actual calculations are performed by a runtime-generated C extension and a precompiled shared library

Drawback:

extra time for generation and compilation of new code
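The idea of wrapping user expressions into C functions can be sketched with the standard library's `string.Template`. This is only an illustration of the technique under stated assumptions: the template text, the function signature with arguments `q0, q1, p0, p1, t`, and the helper names `wrap_expression` and `build_module` are all hypothetical and are not taken from the EnSPy template subpackage. The sketch produces C source text only; compiling it is left out.

```python
from string import Template

# Hypothetical template for one generated C function
C_FUNCTION_TEMPLATE = Template("""\
/* generated from the user expression: $expr */
double ${name}(double q0, double q1, double p0, double p1, double t)
{
    return $expr;
}
""")


def wrap_expression(name, expr):
    """Turn one user-supplied expression string into C function source."""
    return C_FUNCTION_TEMPLATE.substitute(name=name, expr=expr)


def build_module(expressions):
    """Concatenate all generated functions into one C translation unit."""
    header = "#include <math.h>\n\n"
    body = "\n".join(wrap_expression(n, e) for n, e in expressions.items())
    return header + body


source = build_module({
    # Trigger surface |x| - 1 = 0 written in terms of q0
    "trigger_0": "fabs(q0) - 1.",
    # A potential expression with a = 1 assumed for the example
    "potential": "2.*q1*q1 - q0*q0 + q0*q1*q1 + 0.25*q0*q0*q0*q0",
})
print(source)
```

The cost noted under "Drawback" comes from the step this sketch omits: every new set of expressions has to be compiled before the first run.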

EnSPy architecture

Execution flow and architecture

Input parameters

Ensemble population (predefined or user-specified distribution)

Code generation and compilation

Launching $N_{GPUs}$ threads
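One reading of the last step is that the host starts one thread per GPU and gives each a share of the ensemble. The sketch below shows that pattern with the standard `threading` module. It is an assumption-laden illustration: `split_ensemble` and `worker` are hypothetical names, the even contiguous split is just one possible partitioning, and the worker body is a placeholder that does no GPU work.

```python
import threading


def split_ensemble(n_particles, n_gpus):
    """Contiguous index ranges, as even as possible, one per device."""
    base, extra = divmod(n_particles, n_gpus)
    ranges, start = [], 0
    for dev in range(n_gpus):
        stop = start + base + (1 if dev < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def worker(device_id, index_range, results):
    """Placeholder for: select device, upload chunk, run kernels, download."""
    start, stop = index_range
    results[device_id] = stop - start  # here: just report the chunk size


n_particles, n_gpus = 10240, 3
ranges = split_ensemble(n_particles, n_gpus)
results = [None] * n_gpus

threads = [threading.Thread(target=worker, args=(dev, rng, results))
           for dev, rng in enumerate(ranges)]
for th in threads:
    th.start()
for th in threads:
    th.join()  # wait until every device has finished its chunk

print(ranges, results)
```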

GPU parallelization scheme for N–body simulations

Order of force calculation

Example: D5 potential

Overview

Problem: escape from a potential well.

Watched values (trigger):

N(t) – number of particles remaining in the well at time t

Potential:

$U_{D5} = 2ay^2 - x^2 + xy^2 + \frac{x^4}{4}$

“Critical” energy: $E_{cr} = E_S = 0$
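The critical energy quoted above can be checked directly from the potential. The sketch below evaluates the D5 potential and the force (minus its gradient) in plain Python and tests the stationary points on the line y = 0: the saddle at the origin and the two minima at x = ±√2. The function names are hypothetical, and a = 1 is an assumed value, since the talk does not state the parameter; the origin is a saddle for any a > 0.

```python
import math


def u_d5(x, y, a=1.0):
    """D5 potential U = 2*a*y^2 - x^2 + x*y^2 + x^4/4."""
    return 2.0 * a * y * y - x * x + x * y * y + 0.25 * x ** 4


def force_d5(x, y, a=1.0):
    """Force components (-dU/dx, -dU/dy)."""
    fx = -(-2.0 * x + y * y + x ** 3)
    fy = -(4.0 * a * y + 2.0 * x * y)
    return fx, fy


saddle = (0.0, 0.0)
minima = [(math.sqrt(2.0), 0.0), (-math.sqrt(2.0), 0.0)]

# Saddle energy E_S: the force vanishes and U = 0 at the origin
print(force_d5(*saddle), u_d5(*saddle))

# Bottom of each well: the force vanishes and U = -1
for point in minima:
    print(force_d5(*point), u_d5(*point))
```

With these values the wells are one energy unit deep, so ensembles with energies between -1 and 0 cannot pass over the saddle at all.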

Potential and structure of phase space:

[Figures: level lines of the D5 potential in the (x, y) plane; phase-space structure in the (x, px) plane]

Calculation setup:

“Simple” ensemble

uniform initial distribution of N = 10240 particles in the region $x > 0 \cap U(x, y) < E$

trigger: x = 0 → q0 = 0.

12 lines of simple Python code (examples/d5.py): specification of integration parameters
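The initial distribution in this setup can be sketched with rejection sampling. The code below is not the contents of examples/d5.py, which the transcript does not reproduce. It assumes one reading of "uniform": uniform in the (x, y) plane over the region x > 0, U(x, y) < E. The value a = 1, the function names and the fixed random seed are assumptions made for the example; momenta are not generated here.

```python
import math
import random


def u_d5(x, y, a):
    """D5 potential U = 2*a*y^2 - x^2 + x*y^2 + x^4/4."""
    return 2.0 * a * y * y - x * x + x * y * y + 0.25 * x ** 4


def sample_initial_positions(n, energy, a, rng):
    """Rejection sampling of n points uniform in {x > 0, U(x, y) < energy}.

    Bounding box (valid for a > 0 and energy > -1):
      x^4/4 - x^2 < E  gives  x^2 < 2 + 2*sqrt(1 + E)
      y^2 * (2a + x) < E + 1  gives  y^2 < (E + 1) / (2a)
    """
    x_max = math.sqrt(2.0 + 2.0 * math.sqrt(1.0 + energy))
    y_max = math.sqrt((energy + 1.0) / (2.0 * a))
    points = []
    while len(points) < n:
        x = rng.uniform(0.0, x_max)
        y = rng.uniform(-y_max, y_max)
        # Keep the point only if it lies inside the allowed region
        if x > 0.0 and u_d5(x, y, a) < energy:
            points.append((x, y))
    return points


rng = random.Random(0)  # fixed seed so the run is reproducible
points = sample_initial_positions(10240, energy=0.9, a=1.0, rng=rng)
print(len(points), points[0])
```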

Results:

Regular particles are trapped in the well → the initial “mixed state” splits

[Figure: fraction of remaining particles N(t)/N(0) versus t (0 to 30) for E = 0.1 and E = 0.9]

Example: Hill problem

Overview

Problem: toy model of escape from a star cluster: escape of a star from the potential of a point-like rotating star cluster of mass $M_c$ and a point-like galaxy core of mass $M_g \gg M_c$

Watched values (trigger):

N(t) – number of particles remaining in the cluster at time t

“Potential” in the cluster frame of reference (tidal approximation):

$U_{Hill} = -\frac{3\omega^2 x^2}{2} - \frac{G M_c}{r}$

“Critical” energy: $E_{cr} = E_S = -4.5\omega^2$
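The critical energy can be reproduced numerically. The sketch below assumes the standard tidal-approximation form of the Hill potential, U = -(3/2)ω²x² - GM_c/r, which is the form consistent with the quoted value -4.5ω². The units GM_c = 1 and ω = 1/√3 are assumptions taken to match the tidal radius r_t = 1 used in the calculation setup; the function names are hypothetical.

```python
import math


def u_hill(x, y, omega, gm_c):
    """Assumed Hill potential: -(3/2)*omega^2*x^2 - G*M_c / r."""
    r = math.hypot(x, y)
    return -1.5 * omega ** 2 * x * x - gm_c / r


def tidal_radius(omega, gm_c):
    """Saddle position on the x axis, from dU/dx = 0:
    -3*omega^2*x + G*M_c / x^2 = 0  =>  x^3 = G*M_c / (3*omega^2)."""
    return (gm_c / (3.0 * omega ** 2)) ** (1.0 / 3.0)


omega = 1.0 / math.sqrt(3.0)
gm_c = 1.0  # assumed units: G * M_c = 1

r_t = tidal_radius(omega, gm_c)
e_cr = u_hill(r_t, 0.0, omega, gm_c)  # potential at the saddle point
print(r_t, e_cr, -4.5 * omega ** 2)
```

At the saddle the two terms contribute -(3/2)ω²r_t² and -3ω²r_t², which add up to -4.5ω² when r_t = 1.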

Potential:

[Figure: Hill curves in the (x, y) plane]

Calculation setup:

“Simple” ensemble

uniform initial distribution of N = 10240 particles in the region $|x| < r_t \cap U(x, y) < E$

$\omega = 1/\sqrt{3} \rightarrow r_t = 1$

trigger: $|x| - r_t = 0$ → abs(q0) - 1. = 0.

12 lines of simple Python code (examples/hill_plain.py): specification of integration parameters

Results:

Trapping of regular particles (some tricky physics here):

[Figure: number of remaining particles N(t) (0 to 1 · 10^4) versus $n_t$ (0 to 1 · 10^5) for E = −1.3, E = −0.8 and E = −0.3]

Example: Hill problem, N–body version

Overview

Problem: simplified model of escape from a star cluster: escape of a star from the potential of a rotating star cluster with total mass $M_c$ and the point potential of a galaxy core with mass $M_g \gg M_c$ (2D)

Watched values: configuration of the cluster

Potential of the galaxy core in the cluster frame of reference (tidal approximation):

$U_{HillNB} = -\frac{3\omega^2 x^2}{2}$

“Toy” Hill model vs N–body Hill model:

Calculation setup:

N–body ensemble

2D (z = 0) initial distribution of N = 10240 particles inside a circle of radius R with zero initial velocities

14 lines of simple Python code (examples/hill_nbody.py): specification of integration parameters

$M_c = 1$, $R = 200$, $\omega = 1/\sqrt{3}$

Results: cluster configuration

[Figure: six panels showing the cluster configuration in the (x, y) plane, both axes from −300 to 300, at steps 201, 401, 601, 801, 1001 and 1201]

Performance results

Not as good as it could be – subject to improvement. Estimate: ∼1 TFlop/s on two recent Fermi graphics processors

[Figure: performance in GFlop/s (axis range 0 to 40) versus number of particles N (1 · 10^4 to 2 · 10^5) on a GTX260 in double precision (DP), for the N–body ensemble and the “simple” ensemble]

Future development

Must have features:

MPI: shifting from a “one host–multiple GPUs” to a “multiple hosts–multiple GPUs” environment

individual timesteps for Hermite

tree–codes

Performance improvements:

utilization of texture memory

better load balancing between GPUs