Python and GPU Programming (Glib Ivashkevych)


Post on 14-Jan-2015


DESCRIPTION

Glib Ivashkevych, HPC software developer / Gero / Ukraine, Kharkiv. Graphics processors are becoming part of the standard toolkit in high-performance computing. At the same time, new software tools are appearing and existing ones are being improved. We will talk about the architecture of Nvidia graphics processors and how to work with them from Python. http://www.it-sobytie.ru/events/2040

TRANSCRIPT

Page 1: Python and GPU Programming (Glib Ivashkevych)

Python and GPU Computing

Glib Ivashkevych

HPC software developer, GERO Lab

Page 2: Python and GPU Programming (Glib Ivashkevych)

Parallel revolution

The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software

Herb Sutter, March 2005

When serial code hits the wall. Power wall.

Now, Intel is embarked on a course already adopted by some of its major rivals: obtaining more computing power by stamping multiple processors on a single chip rather than straining to increase the speed of a single processor.

Paul S. Otellini, Intel's CEO, May 2004

Page 3: Python and GPU Programming (Glib Ivashkevych)

July 2006: Intel launches Core 2 Duo (Conroe)

Feb 2007: Nvidia releases CUDA SDK

Nov 2008: Tsubame, first GPU-accelerated supercomputer

Dec 2008: OpenCL 1.0 specification released

Today: more than 50 GPU-powered supercomputers in the Top500, 9 in the Top50

Page 4: Python and GPU Programming (Glib Ivashkevych)

It's very clear, that we are close to the tipping point. If we're not at a tipping point, we're racing at it.

Jen-Hsun Huang, NVIDIA Co-founder and CEO, March 2013

Heterogeneous computing becomes a standard in HPC, and programming has changed

Page 5: Python and GPU Programming (Glib Ivashkevych)

Heterogeneous computing

[Diagram: Host (CPU with main memory) and Device (GPU with GPU memory); labels: cores, multiprocessors]

Page 6: Python and GPU Programming (Glib Ivashkevych)

CPU: general purpose; sophisticated design and scheduling; perfect for task parallelism

GPU: highly parallel; huge memory bandwidth; lightweight scheduling; perfect for data parallelism

Page 7: Python and GPU Programming (Glib Ivashkevych)

Anatomy of GPU: multiprocessors

[Diagram: GPU made of multiprocessors (MP), each with its own shared memory]

A GPU is composed of tens of multiprocessors (streaming multiprocessors), each of which is composed of tens of cores, giving hundreds of cores in total.

Page 8: Python and GPU Programming (Glib Ivashkevych)

Compute Unified Device Architecture (CUDA) is a hierarchy of:

computation

memory

synchronization

Page 9: Python and GPU Programming (Glib Ivashkevych)

Compute hierarchy

Software: kernel

Hardware abstractions and the hardware they map to:

thread: core

thread block: multiprocessor

grid of blocks: GPU

Page 10: Python and GPU Programming (Glib Ivashkevych)

Compute hierarchy

thread: threadIdx

thread block: blockIdx, blockDim

grid of blocks: gridDim
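The way these built-in variables combine into a per-thread global index can be sketched on the CPU in plain Python. This is only an illustration of the usual one-dimensional indexing expression, not code from the talk; `Dim`, `global_thread_ids` and `blocks_needed` are hypothetical names introduced here.

```python
from collections import namedtuple

# Hypothetical stand-in for CUDA's built-in dimension variables (x only).
Dim = namedtuple("Dim", ["x"])


def global_thread_ids(grid_dim, block_dim):
    """Enumerate the global index of every thread in a 1D launch.

    Mirrors the usual CUDA expression
        blockIdx.x * blockDim.x + threadIdx.x
    """
    ids = []
    for block_idx in range(grid_dim.x):        # plays the role of blockIdx.x
        for thread_idx in range(block_dim.x):  # plays the role of threadIdx.x
            ids.append(block_idx * block_dim.x + thread_idx)
    return ids


def blocks_needed(n_items, threads_per_block):
    """Smallest gridDim.x with gridDim.x * blockDim.x >= n_items."""
    return (n_items + threads_per_block - 1) // threads_per_block


block = Dim(x=4)
grid = Dim(x=blocks_needed(10, block.x))  # 3 blocks are needed for 10 items
ids = global_thread_ids(grid, block)
print(ids)  # 12 indices, 0..11; indices 10 and 11 fall outside the data
```

Because the grid is rounded up to whole blocks, a real kernel normally checks its index against the array length before touching memory.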

Page 11: Python and GPU Programming (Glib Ivashkevych)

Python

fast development

huge number of packages: for data analysis, linear algebra, special functions, etc.

metaprogramming

Convenient, but not that fast in number crunching

Page 12: Python and GPU Programming (Glib Ivashkevych)

PyCUDA

Wrapper package around CUDA API

Convenient abstractions: GPUArray, random number generation, reductions & scans, etc.

Automatic cleanup, initialization and error checking, kernel caching

Completeness

Page 13: Python and GPU Programming (Glib Ivashkevych)

GPUArray

NumPy-like interface for GPU arrays

Convenient creation and manipulation routines

Elementwise operations

Cleanup
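A minimal sketch of how these points might look in practice, assuming PyCUDA, NumPy and a CUDA-capable device are available. The function name `double_on_gpu` is hypothetical, and the imports are deferred into the function only so the file can be loaded on a machine without a GPU.

```python
def double_on_gpu(values):
    """Copy `values` to the GPU, double them there, and copy the result back."""
    # Deferred imports: these need NumPy, PyCUDA and a working CUDA driver.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.gpuarray as gpuarray

    host = np.asarray(values, dtype=np.float32)
    device = gpuarray.to_gpu(host)  # creation: host -> device copy
    doubled = 2 * device            # elementwise operation, executed on the GPU
    result = doubled.get()          # device -> host copy, returns a NumPy array
    # Cleanup: the device buffers are released when `device` and `doubled`
    # go out of scope at the end of this function.
    return result
```

On a machine with a GPU, `double_on_gpu([1.0, 2.0, 3.0])` would return the doubled values as a float32 NumPy array.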

Page 14: Python and GPU Programming (Glib Ivashkevych)

SourceModule

Abstraction to create, compile and run GPU code

GPU code to compile is passed as a string

Control over nvcc compiler options

Convenient interface to get kernels
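The workflow described here could look roughly like the following sketch, assuming PyCUDA, NumPy and a CUDA device. The kernel `scale`, the wrapper `scale_on_gpu` and the choice of the `-use_fast_math` option are illustrative assumptions, not taken from the talk.

```python
# CUDA C source, passed to PyCUDA as an ordinary Python string.
KERNEL_SOURCE = """
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // guard threads past the end of the array
        data[i] *= factor;
}
"""


def scale_on_gpu(values, factor, threads_per_block=256):
    """Multiply `values` by `factor` using a hand-written CUDA kernel."""
    # Deferred imports: these need NumPy, PyCUDA and a working CUDA driver.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    data = np.array(values, dtype=np.float32)  # private copy of the input
    n = data.size
    # Compile the string with nvcc; `options` is forwarded to the compiler.
    module = SourceModule(KERNEL_SOURCE, options=["-use_fast_math"])
    scale = module.get_function("scale")       # look the kernel up by name
    blocks = max(1, (n + threads_per_block - 1) // threads_per_block)
    # drv.InOut copies `data` to the device before the launch and back after.
    scale(drv.InOut(data), np.float32(factor), np.int32(n),
          block=(threads_per_block, 1, 1), grid=(blocks, 1))
    return data
```

Scalar arguments are wrapped in explicit NumPy types so their sizes match the kernel signature.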

Page 15: Python and GPU Programming (Glib Ivashkevych)

Metaprogramming

GPU code can be created at runtime

PyCUDA uses the mako template engine internally

Any template engine is fine for creating GPU source code. Remember about codepy

Create more flexible and optimized code
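As an illustration of generating GPU source at runtime, the sketch below uses `string.Template` from the Python standard library instead of mako, purely to stay dependency-free. The kernel, the template and the function `render_kernel` are hypothetical; the generated string is what would then be handed to a compiler wrapper such as SourceModule.

```python
from string import Template

# $name, $dtype, $unroll and $body are filled in from Python before compiling.
KERNEL_TEMPLATE = Template("""
__global__ void $name($dtype *data, int n)
{
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * $unroll;
$body
}
""")


def render_kernel(name, dtype, unroll):
    """Return CUDA source in which each thread doubles `unroll` elements."""
    # The loop is unrolled in Python, so the GPU code contains no loop at all.
    lines = [
        "    if (base + {k} < n) data[base + {k}] *= 2;".format(k=k)
        for k in range(unroll)
    ]
    return KERNEL_TEMPLATE.substitute(
        name=name, dtype=dtype, unroll=unroll, body="\n".join(lines))


source = render_kernel("double_all", "float", 2)
print(source)
```

Changing `dtype` or `unroll` yields a differently specialized kernel from the same template, which is the flexibility the slide refers to.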

Page 16: Python and GPU Programming (Glib Ivashkevych)

Installation

numpy, mako, CUDA driver & toolkit are required

Boost.Python is optional

Dev packages: if you build from source

Also: PyOpenCL, pyfft

Page 17: Python and GPU Programming (Glib Ivashkevych)

NumbaPro

Accelerator package for Python

Generates machine code from Python scalar functions (creates ufuncs)

from numbapro import vectorize
import numpy as np

# Compile a scalar function into a NumPy ufunc for float32 inputs.
@vectorize(['float32(float32, float32)'], target='cpu')
def add2(a, b):
    return a + b

X = np.ones(1024, dtype='float32')
Y = 2 * np.ones(1024, dtype='float32')

# The ufunc is applied elementwise to whole arrays.
print(add2(X, Y))

[3., 3., … 3.]

Page 18: Python and GPU Programming (Glib Ivashkevych)

GPU computing resources

Documentation

Intro to Parallel Programming, by David Luebke (Nvidia) and John Owens (UC Davis)

Heterogeneous Parallel Programming, by Wen-mei W. Hwu (UIUC)

Tesla K20/K40 test drive: http://www.nvidia.ru/object/k40-gpu-test-drive-ru.html