scientific computing with python and cuda · scientific computing with python and cuda stefan...

88
Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 1 / 55

Upload: others

Post on 22-Mar-2020

20 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Scientific Computing with Python and CUDA

Stefan Reiterer

High Performance Computing Seminar, January 17 2011

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 1 / 55

Page 2: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Inhalt

1 A short Introduction to Python

2 Scientific Computing tools in Python

3 Easy ways to make Python faster

4 Python + CUDA = PyCUDA

5 Python + MPI = mpi4py

6 ...and the Rest

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 2 / 55

Page 3: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Inhalt

1 A short Introduction to Python

2 Scientific Computing tools in Python

3 Easy ways to make Python faster

4 Python + CUDA = PyCUDA

5 Python + MPI = mpi4py

6 ...and the Rest

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 3 / 55

Page 4: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

What is Python?

Python is

a high level programming language

interpreted,

object oriented,

named after the "Monty Pythons".

It was invented by Guido van Rossum in the early 90s.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 4 / 55

Page 5: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

What is Python?

Python is

a high level programming language

interpreted,

object oriented,

named after the "Monty Pythons".

It was invented by Guido van Rossum in the early 90s.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 4 / 55

Page 6: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

What is Python?

Python is

a high level programming language

interpreted,

object oriented,

named after the "Monty Pythons".

It was invented by Guido van Rossum in the early 90s.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 4 / 55

Page 7: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

What is Python?

Python is

a high level programming language

interpreted,

object oriented,

named after the "Monty Pythons".

It was invented by Guido van Rossum in the early 90s.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 4 / 55

Page 8: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

What is Python?

Python is

a high level programming language

interpreted,

object oriented,

named after the "Monty Pythons".

It was invented by Guido van Rossum in the early 90s.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 4 / 55

Page 9: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

What is Python?

Python is

a high level programming language

interpreted,

object oriented,

named after the "Monty Pythons".

It was invented by Guido van Rossum in the early 90s.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 4 / 55

Page 10: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Why Python?

Object Oriented programming

→ reusable code

True strength of Python: Rapid prototyping→ accelerateresearch process, saves time and nerves.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 5 / 55

Page 11: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Why Python?

Object Oriented programming→ reusable code

True strength of Python: Rapid prototyping→ accelerateresearch process, saves time and nerves.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 5 / 55

Page 12: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Why Python?

Object Oriented programming→ reusable code

True strength of Python: Rapid prototyping

→ accelerateresearch process, saves time and nerves.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 5 / 55

Page 13: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Why Python?

Object Oriented programming→ reusable code

True strength of Python: Rapid prototyping→ accelerateresearch process, saves time and nerves.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 5 / 55

Page 14: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Classical prototyping process

-

Prototype

Easy SL

e.g. Matlab

Working Code

Fast, but complex PL

e.g. C, C++...

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 6 / 55

Page 15: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Prototyping with Python

Algorithms(Python)

WorkFlow

(Python)

Criticalcalculations

(Python)

������

������ ��

���

� --�

GUI(Python)

Tests(Python)

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 7 / 55

Page 16: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Prototyping with Python

Algorithms(Python)

(Compile to C)

WorkFlow

(Python)

Criticalcalculations

(Python)(link C/C++)

������

������ ��

���

� --�

GUI(Python)

Tests(Python)

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 8 / 55

Page 17: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Don’t fear Python!

Intuitive Syntax

No Allocation

Garbage Collection

Easy to Learn!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 9 / 55

Page 18: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Don’t fear Python!

Intuitive Syntax

No Allocation

Garbage Collection

Easy to Learn!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 9 / 55

Page 19: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

The Goodbye World Program

print ( "GoodbyeWorld! " )

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 10 / 55

Page 20: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Definition of functions

def function_name(arg1 ,arg2 , arg3 = default ) :result = arg1 + arg2*arg3 #do somethingreturn result

Python recognizes code blocks with intends!Always remember: Don’t mix tabs with spaces!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 11 / 55

Page 21: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Definition of functions

def function_name(arg1 ,arg2 , arg3 = default ) :result = arg1 + arg2*arg3 #do somethingreturn result

Python recognizes code blocks with intends!

Always remember: Don’t mix tabs with spaces!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 11 / 55

Page 22: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Definition of functions

def function_name(arg1 ,arg2 , arg3 = default ) :result = arg1 + arg2*arg3 #do somethingreturn result

Python recognizes code blocks with intends!Always remember: Don’t mix tabs with spaces!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 11 / 55

Page 23: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

If statments

i f cond1 is True : #normal i fdo this

e l i f cond2 is True : #else i fdo that

else : #elsedo something_else

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 12 / 55

Page 24: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

while loops

while condition is True :do something

While loops know else too:

while condition is True :do something

else :do something_else

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 13 / 55

Page 25: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

while loops

while condition is True :do something

While loops know else too:

while condition is True :do something

else :do something_else

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 13 / 55

Page 26: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

for loops

for x in range(n) :do something with x

Instead of range one can use any list!

For better performance use xrange.

for loops also have an else statement.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 14 / 55

Page 27: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

for loops

for x in range(n) :do something with x

Instead of range one can use any list!

For better performance use xrange.

for loops also have an else statement.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 14 / 55

Page 28: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

for loops

for x in range(n) :do something with x

Instead of range one can use any list!

For better performance use xrange.

for loops also have an else statement.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 14 / 55

Page 29: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

for loops

for x in range(n) :do something with x

Instead of range one can use any list!

For better performance use xrange.

for loops also have an else statement.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 14 / 55

Page 30: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

An example...

for n in range(2 , 10):for x in range(2 , n ) :

i f n % x == 0:print n, ’equals ’ , x , ’ * ’ , n/xbreak

else :# loop f e l l through without finding a factorprint n, ’ i s a prime number ’

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 15 / 55

Page 31: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Output

2 is a prime number3 is a prime number4 equals 2 * 25 is a prime number6 equals 2 * 37 is a prime number8 equals 2 * 49 equals 3 * 3

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 16 / 55

Page 32: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Classes

In Python everything is a class!

No elementary data types!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 17 / 55

Page 33: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Classes

In Python everything is a class!No elementary data types!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 17 / 55

Page 34: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Definition of a class

Classes are defined in the same way as functions

class my_class :statement1statement2. . .

__init__ serves as constructor

__del__ serves as destructor

Class methods take the object itself as first argument (asconvention one writes self)

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 18 / 55

Page 35: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Definition of a class

Classes are defined in the same way as functions

class my_class :statement1statement2. . .

__init__ serves as constructor

__del__ serves as destructor

Class methods take the object itself as first argument (asconvention one writes self)

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 18 / 55

Page 36: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Definition of a class

Classes are defined in the same way as functions

class my_class :statement1statement2. . .

__init__ serves as constructor

__del__ serves as destructor

Class methods take the object itself as first argument (asconvention one writes self)

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 18 / 55

Page 37: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Definition of a class

Classes are defined in the same way as functions

class my_class :statement1statement2. . .

__init__ serves as constructor

__del__ serves as destructor

Class methods take the object itself as first argument (asconvention one writes self)

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 18 / 55

Page 38: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Derivative of classes

Classes are derived simply with

class my_class( inheritence ) :statement1statement2. . .

They hold the same methods and variables as their parents

To overload methods simply add the same method agein.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 19 / 55

Page 39: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Derivative of classes

Classes are derived simply with

class my_class( inheritence ) :statement1statement2. . .

They hold the same methods and variables as their parents

To overload methods simply add the same method agein.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 19 / 55

Page 40: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Derivative of classes

Classes are derived simply with

class my_class( inheritence ) :statement1statement2. . .

They hold the same methods and variables as their parents

To overload methods simply add the same method agein.

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 19 / 55

Page 41: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Operator overloading

To overload operators like +,-,*,/ etc simply add the methods__add__, __sub__, __mul__, __div__ ...

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 20 / 55

Page 42: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

A short Introduction to Python

Another example:

class band_matrix :def __ in i t __ ( self ,ab) :

se l f . shape = (ab.shape[1] ,ab.shape[1])se l f . data = absel f . band_width = ab.shape[0]se l f . dtype = ab. dtype

def matvec( self ,u ) :i f se l f . shape[0] != u.shape[0] :

raise ValueError ( "Dimension Missmatch! " )return band_matvec( se l f . data ,u)

def __add__ ( self , other ) :return add_band_mat( self , other )

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 21 / 55

Page 43: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Scientific Computing tools in Python

Inhalt

1 A short Introduction to Python

2 Scientific Computing tools in Python

3 Easy ways to make Python faster

4 Python + CUDA = PyCUDA

5 Python + MPI = mpi4py

6 ...and the Rest

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 22 / 55

Page 44: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Scientific Computing tools in Python

NumPy and SciPy

NumPy is for Numerical Python

SciPy is for Scientific Python

Allow Numerical Linear Algebra like in Matlab

Speed: NumPy + SciPy ≈ Matlab

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 23 / 55

Page 45: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Scientific Computing tools in Python

NumPy and SciPy

NumPy is for Numerical Python

SciPy is for Scientific Python

Allow Numerical Linear Algebra like in Matlab

Speed: NumPy + SciPy ≈ Matlab

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 23 / 55

Page 46: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Scientific Computing tools in Python

NumPy and SciPy

NumPy is for Numerical Python

SciPy is for Scientific Python

Allow Numerical Linear Algebra like in Matlab

Speed: NumPy + SciPy ≈ Matlab

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 23 / 55

Page 47: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Scientific Computing tools in Python

Some examples...

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 24 / 55

Page 48: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

Inhalt

1 A short Introduction to Python

2 Scientific Computing tools in Python

3 Easy ways to make Python faster

4 Python + CUDA = PyCUDA

5 Python + MPI = mpi4py

6 ...and the Rest

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 25 / 55

Page 49: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

Band Matrix vector Multiplication

# −*− coding : utf−8 −*−def band_matvec(A,u ) :

result = zeros (u.shape[0] ,dtype=u. dtype)

for i in xrange(A.shape[1] ) :result [ i ] = A[0 , i ]*u[ i ]

for j in xrange(1 ,A.shape[0] ) :for i in xrange(A.shape[1]− j ) :

result [ i ] += A[ j , i ]*u[ i+j ]result [ i+j ] += A[ j , i ]*u[ i ]

return resultStefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 26 / 55

Page 50: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

Write the C-Code inline in Python with Blitz

# −*− coding : utf−8 −*−from numpy import array , zerosfrom scipy .weave import convertersfrom scipy import weave

def band_matvec_inline (A,u ) :

result = zeros (u.shape[0] ,dtype=u. dtype)

N = A.shape[1]B = A.shape[0]

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 27 / 55

Page 51: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

Write the C-Code inline in Python with Blitz

code = """for ( int i=0; i < N; i++) {

result ( i ) = A(0 , i )*u( i ) ;}for ( int j=1; j < B; j++) {

for ( int i=0; i < (N−j ) ; i++) {i f ( ( i+j < N) ) {

result ( i ) += A( j , i )*u( j+i ) ;result ( i+j ) += A( j , i )*u( i ) ;

}}

}"""

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 28 / 55

Page 52: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

Write the C-Code inline in Python with Blitz

weave. in l ine (code , [ ’u ’ , ’A ’ , ’ result ’ , ’N’ , ’B ’ ] ,type_converters=converters . b l i t z )

return result

≈ 290x speedup to python

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 29 / 55

Page 53: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

Write the C-Code inline in Python with Blitz

weave. in l ine (code , [ ’u ’ , ’A ’ , ’ result ’ , ’N’ , ’B ’ ] ,type_converters=converters . b l i t z )

return result

≈ 290x speedup to python

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 29 / 55

Page 54: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

Cython

Python produces a lot of overhead with data types

Solution: Declare which data types to use and compile it withCython!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 30 / 55

Page 55: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

Cython

Python produces a lot of overhead with data typesSolution: Declare which data types to use and compile it withCython!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 30 / 55

Page 56: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

Python again

# −*− coding : utf−8 −*−def band_matvec(A,u ) :

result = zeros (u.shape[0] ,dtype=u. dtype)

for i in xrange(A.shape[1] ) :result [ i ] = A[0 , i ]*u[ i ]

for j in xrange(1 ,A.shape[0] ) :for i in xrange(A.shape[1]− j ) :

result [ i ] += A[ j , i ]*u[ i+j ]result [ i+j ] += A[ j , i ]*u[ i ]

return resultStefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 31 / 55

Page 57: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

Get more with Cython

import numpy, scipycimport numpy as cnumpy

ctypedef cnumpy. float64_t realsdef matvec_help(cnumpy. ndarray[ reals ,ndim=2] A,

cnumpy. ndarray[ reals ,ndim=1] u) :cdef Py_ssize_t i , jcdef cnumpy. ndarray[ reals ,ndim=1] result = \

numpy. zeros (A.shape[1] ,dtype=A. dtype)for i in xrange(A.shape[1] ) :

result [ i ] = A[0 , i ]*u[ i ]for j in xrange(1 ,A.shape[0] ) :

for i in xrange(A.shape[1]− j ) :result [ i ] = result [ i ] + A[ j , i ]*u[ i+j ]result [ i+j ] = result [ i+j ]+A[ j , i ]*u[ i ]

return resultStefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 32 / 55

Page 58: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

...and even more

import numpy, scipycimport numpy as cnumpy, [email protected]( False )ctypedef cnumpy. float64_t realsdef matvec_help(cnumpy. ndarray[ reals ,ndim=2] A,

cnumpy. ndarray[ reals ,ndim=1] u) :cdef Py_ssize_t i , jcdef cnumpy. ndarray[ reals ,ndim=1] result = \

numpy. zeros (A.shape[1] ,dtype=A. dtype)for i in xrange(A.shape[1] ) :

result [ i ] = A[0 , i ]*u[ i ]for j in xrange(1 ,A.shape[0] ) :

for i in xrange(A.shape[1]− j ) :result [ i ] = result [ i ] + A[ j , i ]*u[ i+j ]result [ i+j ] = result [ i+j ]+A[ j , i ]*u[ i ]

return resultStefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 33 / 55

Page 59: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

Speedup to python

Cython ≈ 220× speedup

...with boundscheck: ≈ 440× speedup

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 34 / 55

Page 60: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

Speedup to python

Cython ≈ 220× speedup

...with boundscheck: ≈ 440× speedup

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 34 / 55

Page 61: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

Speedup to python

Cython ≈ 220× speedup

...with boundscheck: ≈ 440× speedup

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 34 / 55

Page 62: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

Overview

Language time speedup

Python 2.96 s 1×Matlab 90 ms 30×Pure C 50 ms 60×Cython 12 ms 220×

Inline C++ 10 ms 290×Cython w. boundscheck 6 ms 440×

With linking libraries like Lapack even more is possible!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 35 / 55

Page 63: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Easy ways to make Python faster

Overview

Language time speedup

Python 2.96 s 1×Matlab 90 ms 30×Pure C 50 ms 60×Cython 12 ms 220×

Inline C++ 10 ms 290×Cython w. boundscheck 6 ms 440×

With linking libraries like Lapack even more is possible!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 35 / 55

Page 64: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + CUDA = PyCUDA

Inhalt

1 A short Introduction to Python

2 Scientific Computing tools in Python

3 Easy ways to make Python faster

4 Python + CUDA = PyCUDA

5 Python + MPI = mpi4py

6 ...and the Rest

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 36 / 55

Page 65: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + CUDA = PyCUDA

What is PyCUDA

PyCUDA is a Wrapper in Python to CUDA

Inline CUDA

Garbage Collection

A NumPy like vector class.

Developed by Andreas Klöckner

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 37 / 55

Page 66: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + CUDA = PyCUDA

Step 1: Write your Code in CUDA

source = """__global__ void doublify ( f loat *a)

{int idx = threadIdx .x + threadIdx .y*4;a[ idx ] *= 2;

}"""

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 38 / 55

Page 67: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + CUDA = PyCUDA

Step 2: Get it into Python

import pycuda. driver as cudaimport pycuda. autoinitfrom pycuda. compiler import SourceModule

mod = SourceModule(source)func = mod. get_function ( "doublify " )

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 39 / 55

Page 68: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + CUDA = PyCUDA

Step3: Call it

import numpy#create vectorx = numpy.random. randn(4 ,4)x = x . astype(numpy. float32 )#copy i t on cardx_gpu = cuda.mem_alloc(x . nbytes)cuda.memcpy_htod(x_gpu , x)#ca l l functionfunc (x_gpu , block=(4 ,4 ,1))#get data backx_doubled = numpy. empty_like (x)cuda.memcpy_dtoh(x_doubled , x_gpu)

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 40 / 55

Page 69: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + CUDA = PyCUDA

Benefits

You can compile CUDA code on runtime

You can call it in Python

I was able to implement CG algorithm with the different variantswithout rewriting code!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 41 / 55

Page 70: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + CUDA = PyCUDA

Benefits

You can compile CUDA code on runtime

You can call it in Python

I was able to implement CG algorithm with the different variantswithout rewriting code!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 41 / 55

Page 71: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + CUDA = PyCUDA

Benefits

You can compile CUDA code on runtime

You can call it in Python

I was able to implement CG algorithm with the different variantswithout rewriting code!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 41 / 55

Page 72: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + CUDA = PyCUDA

Overview (Band-matrix vector multiplication)

Language time speedup

Python 2.87 s 1×Cython 9.87 ms 270×

Cython w. boundscheck 5.56 ms 500×PyCUDA (on GTX 280) 585 µ s 5000×

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 42 / 55

Page 73: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + CUDA = PyCUDA

Overview (Band-matrix vector multiplication)

Language time speedup

Python 2.87 s 1×Cython 9.87 ms 270×

Cython w. boundscheck 5.56 ms 500×PyCUDA (on GTX 280) 585 µ s 5000×

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 42 / 55

Page 74: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + MPI = mpi4py

Inhalt

1 A short Introduction to Python

2 Scientific Computing tools in Python

3 Easy ways to make Python faster

4 Python + CUDA = PyCUDA

5 Python + MPI = mpi4py

6 ...and the Rest

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 43 / 55

Page 75: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + MPI = mpi4py

Now something completly different... mpi4py

from mpi4py import MPIcomm = MPI .COMM_WORLDprint ( " hello world" )print ( "my rank is : %d"%comm. rank)

Simply call Python with MPI:mpirun -n <# processes> python filename.py

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 44 / 55

Page 76: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + MPI = mpi4py

Now something completly different... mpi4py

from mpi4py import MPIcomm = MPI .COMM_WORLDprint ( " hello world" )print ( "my rank is : %d"%comm. rank)

Simply call Python with MPI:mpirun -n <# processes> python filename.py

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 44 / 55

Page 77: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + MPI = mpi4py

PyCUDA+mpi4py

from mpi4py import MPIimport pycuda. driver as cudafrom pycuda import gpuarrayfrom numpy import float32 , arrayfrom numpy.random import randn as rand

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 45 / 55

Page 78: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + MPI = mpi4py

PyCUDA+mpi4py

comm = MPI .COMM_WORLDrank = comm. rankroot = 0nr_gpus = 4sendbuf = [ ]N = 2**20*nr_gpusK = 1000i f rank == 0:

x = rand(N) . astype( float32)*10**16print "x : " , xcuda. i n i t ( ) #i n i t pycudadriversendbuf = x . reshape(nr_gpus ,N/ nr_gpus)

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 46 / 55

Page 79: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + MPI = mpi4py

PyCUDA+mpi4py

i f rank > nr_gpus−1:raise ValueError ( "To few gpus! " )

current_dev = cuda. Device( rank) #device we are working onctx = current_dev .make_context ( ) #make a working contextctx .push( ) #let context make the lead#recieve data and port i t to gpu:x_gpu_part = gpuarray . to_gpu(comm. scatter (sendbuf , root ) )

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 47 / 55

Page 80: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + MPI = mpi4py

PyCUDA+mpi4py

#do something . . .for k in xrange(K) :

x_gpu_part = 0.9*x_gpu_part

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 48 / 55

Page 81: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + MPI = mpi4py

PyCUDA+mpi4py

#get data back:x_part = (x_gpu_part ) . get ( )ctx .pop( ) #deactivate againctx . detach ( ) #delete i t

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 49 / 55

Page 82: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + MPI = mpi4py

PyCUDA+mpi4py

recvbuf=comm. gather ( x_part , root ) #recieve datai f rank == 0:

x_altered= array ( recvbuf ) . reshape(N)print "altered x: " , x_altered

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 50 / 55

Page 83: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

Python + MPI = mpi4py

Overview

N K NumPy MPI+Numpy MPI+PyCUDA

217 500 125 ms 4.6 ms 3 s218 1000 636 ms 212 ms 3.7 s219 2000 2.68 s 781 ms 3.84 s220 4000 10.9 s 6.11 s 7.6 s221 8000 39.5 s 24.2 s 11.6 s222 16000 30 min 93.0 s 19.3 s223 32000 ? 382 s 24.1 s224 64000 ? 25 min 48.0 s

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 51 / 55

Page 84: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

...and the Rest

Inhalt

1 A short Introduction to Python

2 Scientific Computing tools in Python

3 Easy ways to make Python faster

4 Python + CUDA = PyCUDA

5 Python + MPI = mpi4py

6 ...and the Rest

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 52 / 55

Page 85: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

...and the Rest

Where can I get Python?

Everything is OpensourceLinks:

http://www.python.org/

http://www.scipy.org/

http://www.sagemath.org/

http://femhub.org/

http://cython.org/

If you need the latest mpi4py and and PyCUDA packages forSage/Femhub ask me!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 53 / 55

Page 86: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

...and the Rest

Where can I get Python?

Everything is OpensourceLinks:

http://www.python.org/

http://www.scipy.org/

http://www.sagemath.org/

http://femhub.org/

http://cython.org/

If you need the latest mpi4py and and PyCUDA packages forSage/Femhub ask me!

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 53 / 55

Page 87: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

...and the Rest

Other interesting things for Python

f2py→ Fortran Code inline

petsc4py→ PETSC interface for Python

PyOpenCL→ like PyCUDA but for OpenCl

matplotlib, Mayavi→ powerful plotting libraries

... and many many more

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 54 / 55

Page 88: Scientific Computing with Python and CUDA · Scientific Computing with Python and CUDA Stefan Reiterer High Performance Computing Seminar, January 17 2011 ... e.g. Matlab Working

...and the Rest

Questions?

Stefan Reiterer () Scientific Computing with Python and CUDA HPC Seminar 55 / 55