Python and GPU Programming (Glib Ivashkevych)


DESCRIPTION

Glib Ivashkevych - HPC software developer / GERO / Ukraine, Kharkiv. Graphics processors are becoming part of the standard toolkit in high-performance computing. At the same time, new software tools are appearing and existing ones are being improved. We will talk about the architecture of Nvidia graphics processors and how to work with them from Python. http://www.it-sobytie.ru/events/2040

TRANSCRIPT

Python and GPU Computing

Glib Ivashkevych, HPC software developer, GERO Lab

Parallel revolution
The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software

Herb Sutter, March 2005

When serial code hits the wall: the power wall.

"Now, Intel is embarked on a course already adopted by some of its major rivals: obtaining more computing power by stamping multiple processors on a single chip rather than straining to increase the speed of a single processor."

Paul S. Otellini, Intel's CEO, May 2004

July 2006: Intel launches Core 2 Duo (Conroe)
Feb 2007: Nvidia releases CUDA SDK
Nov 2008: Tsubame, first GPU-accelerated supercomputer
Dec 2008: OpenCL 1.0 specification released
Today: >50 GPU-powered supercomputers in the Top500, 9 in the Top50

It's very clear that we are close to the tipping point. If we're not at a tipping point, we're racing at it.

Jen-Hsun Huang, NVIDIA Co-founder and CEO, March 2013

Heterogeneous computing becomes a standard in HPC

and programming has changed

Heterogeneous computing

[diagram] Host: CPU + main memory. Device: GPU (multiprocessors built from cores) + GPU memory.

CPU vs GPU

CPU: general purpose; sophisticated design and scheduling; perfect for task parallelism

GPU: highly parallel; huge memory bandwidth; lightweight scheduling; perfect for data parallelism

Anatomy of a GPU: multiprocessors

[diagram] GPU → multiprocessors (MP), each with its own shared memory

A GPU is composed of tens of multiprocessors (streaming multiprocessors), each of which is composed of tens of cores = hundreds of cores in total.

Compute Unified Device Architecture (CUDA) is a hierarchy of:

computation

memory

synchronization

Compute hierarchy

software: kernel
hardware abstractions: thread, thread block, grid of blocks
hardware: core, multiprocessor, GPU

(a thread maps to a core, a thread block to a multiprocessor, and the grid of blocks to the whole GPU)

Compute hierarchy

thread: threadIdx
thread block: blockIdx, blockDim
grid of blocks: gridDim
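To make the indexing concrete, here is a minimal sketch (element count and block size are illustrative, not from the talk) of choosing a 1D launch configuration; inside a kernel, a thread's global index is then blockIdx.x * blockDim.x + threadIdx.x:

# illustrative sizes: map N elements onto the CUDA compute hierarchy
N = 10**6                          # number of data elements (assumed)
block = 256                        # threads per block (blockDim.x)
grid = (N + block - 1) // block    # blocks per grid (gridDim.x), rounded up
# inside the kernel: i = blockIdx.x * blockDim.x + threadIdx.x
print(grid, "blocks of", block, "threads cover", grid * block, "elements")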

Python

fast development

huge # of packages: for data analysis, linear algebra, special functions etc

metaprogramming

Convenient, but not that fast in number crunching

PyCUDA: wrapper package around the CUDA API

Convenient abstractions: GPUArray, random number generation, reductions & scans, etc.

Automatic cleanup, initialization and error checking, kernel caching

Completeness
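As an illustration of those abstractions, a short sketch (array size is illustrative; assumes a working CUDA installation) that generates random numbers on the device and reduces them without hand-written kernels:

import pycuda.autoinit                      # context creation and cleanup
import pycuda.gpuarray as gpuarray
from pycuda.curandom import rand as curand  # device-side random numbers

a_gpu = curand((10**6,))                    # uniform random floats on the GPU
total = gpuarray.sum(a_gpu).get()           # reduction runs on the device
print(total / 10**6)                        # mean of uniforms, roughly 0.5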

GPUArray: NumPy-like interface for GPU arrays

Convenient creation and manipulation routines

Elementwise operations

Cleanup
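A minimal GPUArray sketch (sizes are illustrative; assumes NumPy and a CUDA-capable device):

import numpy as np
import pycuda.autoinit              # initializes the device, cleans up on exit
import pycuda.gpuarray as gpuarray

a = np.random.randn(1024).astype(np.float32)
a_gpu = gpuarray.to_gpu(a)          # copy the host array into GPU memory
b_gpu = 2 * a_gpu + 1               # elementwise operations run on the GPU
print(b_gpu.get()[:5])              # .get() copies the result back as a NumPy array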

SourceModule: abstraction to create, compile and run GPU code

GPU code to compile is passed as a string

Control over nvcc compiler options

Convenient interface to get kernels
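A sketch of the SourceModule workflow (the kernel and launch configuration are illustrative):

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

# GPU code is passed as a string and compiled by nvcc at runtime
mod = SourceModule("""
__global__ void scale(float *a, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    a[i] *= factor;
}
""")
scale = mod.get_function("scale")   # fetch the compiled kernel

a_gpu = gpuarray.to_gpu(np.ones(1024, dtype=np.float32))
scale(a_gpu, np.float32(3.0), block=(256, 1, 1), grid=(4, 1))
print(a_gpu.get()[:5])              # [ 3.  3.  3.  3.  3.]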

Metaprogramming: GPU code can be created at runtime

PyCUDA uses the mako template engine internally

Any template engine is fine for generating GPU source code; also remember codepy

Create more flexible and optimized code
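A sketch of runtime code generation with mako (the template and its parameter are illustrative; any template engine would work the same way):

from mako.template import Template
from pycuda.compiler import SourceModule

kernel_tpl = Template("""
__global__ void saxpy(float *y, float *x, float a)
{
    int i = blockIdx.x * ${block_size} + threadIdx.x;
    y[i] += a * x[i];
}
""")

src = kernel_tpl.render(block_size=256)   # bake the tuning parameter into the source
mod = SourceModule(src)                   # compile the generated code
saxpy = mod.get_function("saxpy")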

Installation: numpy, mako, CUDA driver & toolkit are required

Boost.Python is optional

Dev packages: if you build from source

Also: PyOpenCL, pyfft

NumbaPro: accelerator package for Python
Generates machine code from Python scalar functions (creates ufuncs)

from numbapro import vectorize
import numpy as np

# compile the scalar function into a ufunc that runs over whole arrays
@vectorize(['float32(float32, float32)'], target='cpu')
def add2(a, b):
    return a + b

X = np.ones(1024, dtype='float32')
Y = 2 * np.ones(1024, dtype='float32')

print(add2(X, Y))

[3., 3., … 3.]
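For a GPU-backed ufunc only the target changes; in NumbaPro this was target='gpu' (today's Numba, which absorbed NumbaPro, uses target='cuda'). A hedged sketch, not verbatim from the talk:

from numbapro import vectorize
import numpy as np

@vectorize(['float32(float32, float32)'], target='gpu')   # 'cuda' in modern Numba
def add2_gpu(a, b):
    return a + b

X = np.ones(1024, dtype='float32')
Y = 2 * np.ones(1024, dtype='float32')
print(add2_gpu(X, Y))   # computed on the GPU: [3., 3., ..., 3.]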

GPU computing resources

Documentation

Intro to Parallel Programming, by David Luebke (Nvidia) and John Owens (UC Davis)

Heterogeneous Parallel Programming, by Wen-mei W. Hwu (UIUC)

Tesla K20/K40 test drive: http://www.nvidia.ru/object/k40-gpu-test-drive-ru.html
