Python and GPU Programming (Glib Ivashkevych)


DESCRIPTION

Glib Ivashkevych - HPC software developer / GERO / Ukraine, Kharkiv. Graphics processors are becoming part of the standard toolkit in high-performance computing. At the same time, new software tools are appearing and existing ones are being improved. We will talk about the architecture of Nvidia graphics processors and how to work with them from Python. http://www.it-sobytie.ru/events/2040

TRANSCRIPT

Python and GPU Computing

Glib Ivashkevych, HPC software developer, GERO Lab

Parallel revolution
The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software

Herb Sutter, March 2005

When serial code hits the wall: the power wall.

"Now, Intel is embarked on a course already adopted by some of its major rivals: obtaining more computing power by stamping multiple processors on a single chip rather than straining to increase the speed of a single processor."

Paul S. Otellini, Intel's CEO, May 2004

July 2006: Intel launches Core 2 Duo (Conroe)
Feb 2007: Nvidia releases CUDA SDK
Nov 2008: Tsubame, first GPU-accelerated supercomputer
Dec 2008: OpenCL 1.0 specification released
Today: >50 GPU-powered supercomputers in the Top500, 9 in the Top50

It's very clear that we are close to the tipping point. If we're not at a tipping point, we're racing at it.

Jen-Hsun Huang, NVIDIA Co-founder and CEO, March 2013

Heterogeneous computing becomes a standard in HPC

and programming has changed

Heterogeneous computing

[diagram] Host: CPU + main memory. Device: GPU (multiprocessors built from cores) + GPU memory.

CPU vs GPU

CPU: general purpose; sophisticated design and scheduling; perfect for task parallelism

GPU: highly parallel; huge memory bandwidth; lightweight scheduling; perfect for data parallelism

Anatomy of a GPU: multiprocessors

[diagram] GPU → multiprocessors (MP), each with its own shared memory

A GPU is composed of tens of multiprocessors (streaming multiprocessors), each of which is composed of tens of cores = hundreds of cores in total.

Compute Unified Device Architecture (CUDA) is a hierarchy of:

computation

memory

synchronization

Compute hierarchy

software: kernel
hardware abstractions: thread, thread block, grid of blocks
hardware: core, multiprocessor, GPU

(a thread maps to a core, a thread block to a multiprocessor, and the grid of blocks to the whole GPU)

Compute hierarchy

thread: threadIdx
thread block: blockIdx, blockDim
grid of blocks: gridDim
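To make the indexing concrete, here is a minimal sketch (element count and block size are illustrative, not from the talk) of choosing a 1D launch configuration; inside a kernel, a thread's global index is then blockIdx.x * blockDim.x + threadIdx.x:

# illustrative sizes: map N elements onto the CUDA compute hierarchy
N = 10**6                          # number of data elements (assumed)
block = 256                        # threads per block (blockDim.x)
grid = (N + block - 1) // block    # blocks per grid (gridDim.x), rounded up
# inside the kernel: i = blockIdx.x * blockDim.x + threadIdx.x
print(grid, "blocks of", block, "threads cover", grid * block, "elements")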

Python

fast development

huge # of packages: for data analysis, linear algebra, special functions etc

metaprogramming

Convenient, but not that fast in number crunching

PyCUDA: wrapper package around the CUDA API

Convenient abstractions: GPUArray, random number generation, reductions & scans, etc.

Automatic cleanup, initialization and error checking, kernel caching

Completeness
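As an illustration of those abstractions, a short sketch (array size is illustrative; assumes a working CUDA installation) that generates random numbers on the device and reduces them without hand-written kernels:

import pycuda.autoinit                      # context creation and cleanup
import pycuda.gpuarray as gpuarray
from pycuda.curandom import rand as curand  # device-side random numbers

a_gpu = curand((10**6,))                    # uniform random floats on the GPU
total = gpuarray.sum(a_gpu).get()           # reduction runs on the device
print(total / 10**6)                        # mean of uniforms, roughly 0.5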

GPUArray: NumPy-like interface for GPU arrays

Convenient creation and manipulation routines

Elementwise operations

Cleanup
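A minimal GPUArray sketch (sizes are illustrative; assumes NumPy and a CUDA-capable device):

import numpy as np
import pycuda.autoinit              # initializes the device, cleans up on exit
import pycuda.gpuarray as gpuarray

a = np.random.randn(1024).astype(np.float32)
a_gpu = gpuarray.to_gpu(a)          # copy the host array into GPU memory
b_gpu = 2 * a_gpu + 1               # elementwise operations run on the GPU
print(b_gpu.get()[:5])              # .get() copies the result back as a NumPy array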

SourceModule: abstraction to create, compile and run GPU code

GPU code to compile is passed as a string

Control over nvcc compiler options

Convenient interface to get kernels
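A sketch of the SourceModule workflow (the kernel and launch configuration are illustrative):

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

# GPU code is passed as a string and compiled by nvcc at runtime
mod = SourceModule("""
__global__ void scale(float *a, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    a[i] *= factor;
}
""")
scale = mod.get_function("scale")   # fetch the compiled kernel

a_gpu = gpuarray.to_gpu(np.ones(1024, dtype=np.float32))
scale(a_gpu, np.float32(3.0), block=(256, 1, 1), grid=(4, 1))
print(a_gpu.get()[:5])              # [ 3.  3.  3.  3.  3.]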

Metaprogramming: GPU code can be created at runtime

PyCUDA uses the mako template engine internally

Any template engine is fine for generating GPU source code; also remember codepy

Create more flexible and optimized code
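A sketch of runtime code generation with mako (the template and its parameter are illustrative; any template engine would work the same way):

from mako.template import Template
from pycuda.compiler import SourceModule

kernel_tpl = Template("""
__global__ void saxpy(float *y, float *x, float a)
{
    int i = blockIdx.x * ${block_size} + threadIdx.x;
    y[i] += a * x[i];
}
""")

src = kernel_tpl.render(block_size=256)   # bake the tuning parameter into the source
mod = SourceModule(src)                   # compile the generated code
saxpy = mod.get_function("saxpy")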

Installation: numpy, mako, CUDA driver & toolkit are required

Boost.Python is optional

Dev packages: if you build from source

Also: PyOpenCL, pyfft

NumbaPro: accelerator package for Python
Generates machine code from Python scalar functions (creates ufuncs)

from numbapro import vectorize
import numpy as np

# compile the scalar function into a ufunc that runs over whole arrays
@vectorize(['float32(float32, float32)'], target='cpu')
def add2(a, b):
    return a + b

X = np.ones(1024, dtype='float32')
Y = 2 * np.ones(1024, dtype='float32')

print(add2(X, Y))

[3., 3., … 3.]
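For a GPU-backed ufunc only the target changes; in NumbaPro this was target='gpu' (today's Numba, which absorbed NumbaPro, uses target='cuda'). A hedged sketch, not verbatim from the talk:

from numbapro import vectorize
import numpy as np

@vectorize(['float32(float32, float32)'], target='gpu')   # 'cuda' in modern Numba
def add2_gpu(a, b):
    return a + b

X = np.ones(1024, dtype='float32')
Y = 2 * np.ones(1024, dtype='float32')
print(add2_gpu(X, Y))   # computed on the GPU: [3., 3., ..., 3.]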

GPU computing resources

Documentation

Intro to Parallel Programming, by David Luebke (Nvidia) and John Owens (UC Davis)

Heterogeneous Parallel Programming, by Wen-mei W. Hwu (UIUC)

Tesla K20/K40 test drive: http://www.nvidia.ru/object/k40-gpu-test-drive-ru.html
