Python and GPU programming (Glib Ivashkevych)
Post on 14-Jan-2015
Python and GPU Computing
Glib Ivashkevych, HPC software developer, GERO Lab
Parallel revolution
The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software
Herb Sutter, March 2005
When serial code hits the wall: the power wall.
"Now, Intel is embarked on a course already adopted by some of its major rivals: obtaining more computing power by stamping multiple processors on a single chip rather than straining to increase the speed of a single processor."
Paul S. Otellini, Intel's CEO, May 2004
July 2006: Intel launches Core 2 Duo (Conroe)
Feb 2007: Nvidia releases the CUDA SDK
Nov 2008: Tsubame, the first GPU-accelerated supercomputer
Dec 2008: OpenCL 1.0 specification released
Today: >50 GPU-powered supercomputers in the Top500, 9 in the Top50
"It's very clear that we are close to the tipping point. If we're not at a tipping point, we're racing at it."
Jen-Hsun Huang, NVIDIA co-founder and CEO, March 2013
Heterogeneous computing becomes a standard in HPC
and programming has changed
Heterogeneous computing
[diagram: host (CPU + main memory) connected to device (GPU with multiprocessors, cores, and GPU memory)]
CPU (host): general purpose; sophisticated design and scheduling; perfect for task parallelism
GPU (device): highly parallel; huge memory bandwidth; lightweight scheduling; perfect for data parallelism
Anatomy of GPU: multiprocessors
[diagram: GPU built from multiprocessors (MP), each with its own shared memory]
A GPU is composed of tens of multiprocessors (streaming multiprocessors), each composed of tens of cores = hundreds of cores in total.
Compute Unified Device Architecture (CUDA) is a hierarchy of:
computation
memory
synchronization
Compute hierarchy
software: kernel
hardware abstractions, mapped onto hardware:
thread → core
thread block → multiprocessor
grid of blocks → GPU
Compute hierarchy
thread: threadIdx
thread block: blockIdx, blockDim
grid of blocks: gridDim
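The most common pattern built from these built-in indices is a thread's global position in the grid. A minimal pure-Python sketch (names mirror CUDA's built-ins; the function and values are illustrative, not from the slides):

```python
# Illustrative sketch: how a CUDA thread computes its global index
# from blockIdx, blockDim and threadIdx (1D case). Pure Python,
# no GPU required.

def global_index(block_idx, block_dim, thread_idx):
    """Global position of a thread within a 1D grid."""
    return block_idx * block_dim + thread_idx

# Grid of 4 blocks, 256 threads each: thread 10 of block 2
assert global_index(2, 256, 10) == 522

# Every thread in the grid gets a unique index:
indices = {global_index(b, 256, t) for b in range(4) for t in range(256)}
assert len(indices) == 4 * 256
```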
Python
fast development
huge # of packages: data analysis, linear algebra, special functions, etc.
metaprogramming
Convenient, but not that fast in number crunching
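To illustrate the point (an illustrative benchmark, not from the slides): elementwise addition in a pure-Python loop versus NumPy's vectorized implementation:

```python
# Illustrative benchmark: elementwise addition in a pure-Python
# loop vs NumPy's vectorized C implementation.
import timeit
import numpy as np

n = 100_000
a = list(range(n))
b = list(range(n))
xa = np.arange(n, dtype=np.float64)
xb = np.arange(n, dtype=np.float64)

loop_t = timeit.timeit(lambda: [x + y for x, y in zip(a, b)], number=20)
vec_t = timeit.timeit(lambda: xa + xb, number=20)

print(f"pure Python: {loop_t:.4f}s, NumPy: {vec_t:.4f}s")
# NumPy is typically one to two orders of magnitude faster here.
```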
PyCUDA: wrapper package around the CUDA API
Convenient abstractions: GPUArray, random number generation, reductions & scans, etc.
Automatic cleanup, initialization and error checking, kernel caching
Completeness
GPUArray: NumPy-like interface for GPU arrays
Convenient creation and manipulation routines
Elementwise operations
Cleanup
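A minimal GPUArray sketch (assumes a CUDA-capable GPU and an installed PyCUDA; illustrative, not from the slides):

```python
# Sketch of GPUArray usage; assumes a CUDA-capable GPU with PyCUDA
# installed. These are real pycuda.gpuarray calls, but the snippet
# is illustrative.
import numpy as np
import pycuda.autoinit          # initializes a CUDA context
import pycuda.gpuarray as gpuarray

a = np.random.randn(1024).astype(np.float32)
a_gpu = gpuarray.to_gpu(a)      # host -> device copy

b_gpu = 2 * a_gpu + 1           # elementwise, runs on the GPU
b = b_gpu.get()                 # device -> host copy

assert np.allclose(b, 2 * a + 1)
```

Note the NumPy-like arithmetic: `2 * a_gpu + 1` dispatches elementwise kernels without any explicit GPU code.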
SourceModule: abstraction to create, compile and run GPU code
GPU code to compile is passed as a string
Control over nvcc compiler options
Convenient interface to get kernels
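A minimal SourceModule sketch (assumes a CUDA GPU, PyCUDA and the nvcc toolchain; illustrative, not from the slides). The kernel doubles every element of an array:

```python
# Sketch of SourceModule; the CUDA C source is passed as a string,
# compiled with nvcc, and the kernel is fetched by name.
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void double_them(float *a)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    a[idx] *= 2.0f;
}
""")

double_them = mod.get_function("double_them")

a_gpu = gpuarray.to_gpu(np.ones(256, dtype=np.float32))
double_them(a_gpu, block=(256, 1, 1), grid=(1, 1))
assert np.allclose(a_gpu.get(), 2.0)
```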
Metaprogramming: GPU code can be created at runtime
PyCUDA uses the mako template engine internally
Any template engine is fine for generating GPU source code; remember codepy
Create more flexible and optimized code
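A sketch of what such kernel-source metaprogramming looks like. The slides mention mako; this uses the stdlib `string.Template` instead to stay dependency-free, and only generates the source string, so no GPU is needed:

```python
# Sketch of kernel-source metaprogramming: render CUDA C source at
# runtime with types and sizes baked in, so the compiler can
# optimize for them. Uses stdlib string.Template (the slides
# mention mako; any template engine works).
from string import Template

kernel_template = Template("""
__global__ void scale(${dtype} *a, ${dtype} factor)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < ${n})
        a[idx] *= factor;
}
""")

# Generate a float32 variant for a fixed problem size at runtime.
src = kernel_template.substitute(dtype="float", n=1024)
assert "float *a" in src and "idx < 1024" in src
print(src)
```

The rendered `src` would then be handed to SourceModule for compilation.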
Installation: numpy, mako, CUDA driver & toolkit are required
Boost.Python is optional
Dev packages needed if you build from source
Also: PyOpenCL, pyfft
NumbaPro: accelerator package for Python. Generates machine code from Python scalar functions (creates ufuncs).
from numbapro import vectorize
import numpy as np

# compile a scalar function into a NumPy ufunc
@vectorize(['float32(float32, float32)'], target='cpu')
def add2(a, b):
    return a + b

X = np.ones(1024, dtype='float32')
Y = 2 * np.ones(1024, dtype='float32')
print(add2(X, Y))
# [ 3.  3.  ...  3.]
GPU computing resources
Documentation
Intro to Parallel Programming, by David Luebke (Nvidia) and John Owens (UC Davis)
Heterogeneous Parallel Programming, by Wen-mei W. Hwu (UIUC)
Tesla K20/K40 test drive: http://www.nvidia.ru/object/k40-gpu-test-drive-ru.html