Code GPU with CUDA - CUDA Introduction

Post on 11-Apr-2017


CODE GPU WITH CUDA

CUDA INTRODUCTION

Created by Marina Kolpakova (cuda.geek) for Itseez


OUTLINE

Terminology
Definition
Programming model
Execution model
Memory model
CUDA kernel

OUT OF SCOPE

CUDA API overview

TERMINOLOGY

Device: CUDA-capable NVIDIA GPU
Device code: code executed on the device
Host: x86/x64/ARM CPU
Host code: code executed on the host
Kernel: concrete device function

CUDA

CUDA is a Compute Unified Device Architecture. CUDA includes:

1. Capable GPU hardware and driver
2. Device ISA, GPU assembler, compiler
3. C++-based high-level language, CUDA Runtime

CUDA defines:

programming model
execution model
memory model

PROGRAMMING MODEL

A kernel is executed by many threads

PROGRAMMING MODEL

Threads are grouped into blocks

Each thread has a thread ID

PROGRAMMING MODEL

Thread blocks form an execution grid

Each block has a block ID
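As a sketch of how these IDs look in device code (the kernel name and the 2-D layout here are illustrative, not from the slides), each thread can combine its block ID and thread ID into a unique position in the grid:

```cuda
// Illustrative sketch: with a 2-D grid of 2-D blocks, each thread
// derives a unique (x, y) position from its block ID and thread ID.
__global__ void index2d(float* out, int width)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x; // column in the grid
    int y = blockIdx.y * blockDim.y + threadIdx.y; // row in the grid
    out[y * width + x] = (float)(y * width + x);
}
```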

EXECUTION (HW MAPPING) MODEL

A single thread is executed on a core

EXECUTION (HW MAPPING) MODEL

Each block is executed by one SM and does not migrate. The number of concurrent blocks that can reside on an SM depends on available resources.

EXECUTION (HW MAPPING) MODEL

Threads in a block can cooperate via shared memory and synchronization. There is no hardware support for cooperation between threads from different blocks.
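A minimal sketch of such in-block cooperation (the kernel name, `CTA_SIZE`, and the naive single-thread summation are illustrative assumptions, not from the slides): threads of a block stage data in shared memory, synchronize with `__syncthreads()`, then one thread combines the block's values.

```cuda
// Illustrative sketch of block-level cooperation via shared memory.
// Assumes the launch uses exactly CTA_SIZE threads per block.
#define CTA_SIZE 256

__global__ void block_sum(const float* in, float* block_results)
{
    __shared__ float buffer[CTA_SIZE];

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    buffer[threadIdx.x] = in[tid];

    __syncthreads(); // make all loads visible to the whole block

    if (threadIdx.x == 0)
    {
        float sum = 0.f;
        for (int i = 0; i < CTA_SIZE; ++i)
            sum += buffer[i];
        block_results[blockIdx.x] = sum;
    }
}
```

Note there is no equivalent barrier across blocks: combining the per-block results requires a second kernel launch or atomic operations.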

EXECUTION (HW MAPPING) MODEL

One kernel, or multiple kernels on sm_20+, can execute on the device concurrently

MEMORY MODEL

Each thread has its own registers

MEMORY MODEL

Each thread has its own local memory

MEMORY MODEL

A block has shared memory. A pointer to shared memory is valid while the block is resident:

__shared__ float buffer[CTA_SIZE];

MEMORY MODEL

The grid is able to access global and constant memory
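For example (a sketch with illustrative names, not from the slides), constant memory is declared at file scope with `__constant__`, filled from the host with `cudaMemcpyToSymbol`, and is then readable by every thread in the grid:

```cuda
// Illustrative sketch: 16 coefficients in constant memory,
// broadcast-read by all threads of the grid.
__constant__ float coeffs[16];

__global__ void scale(const float* in, float* out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = in[tid] * coeffs[tid % 16];
}

// Host-side upload into the constant-memory symbol.
void upload_coeffs(const float* host_coeffs)
{
    cudaMemcpyToSymbol(coeffs, host_coeffs, 16 * sizeof(float));
}
```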

BASIC CUDA KERNEL

Work for GPU threads is represented as a kernel
A kernel represents a task for a single thread (scalar notation)
Every thread in a particular grid executes the same kernel
Threads use their threadIdx and blockIdx to dispatch work
A kernel function is marked with the __global__ keyword

Common kernel structure:

1. Retrieve the position in the grid (widely named tid)
2. Load data from GPU memory
3. Perform compute work
4. Write the result back into GPU memory

__global__ void kernel(float* in, float* out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = in[tid];
}

KERNEL EXECUTION

void execute_kernel(const float* host_in, float* host_out, int size)
{
    float *device_in, *device_out;
    cudaMalloc((void**)&device_in,  size * sizeof(float));
    cudaMalloc((void**)&device_out, size * sizeof(float));

    // 1. Upload data into device memory
    cudaMemcpy(device_in, host_in, size * sizeof(float), cudaMemcpyHostToDevice);

    // 2. Configure kernel launch
    dim3 block(256);
    dim3 grid(size / 256);

    // 3. Execute kernel
    kernel<<<grid, block>>>(device_in, device_out);

    // 4. Wait till completion
    cudaDeviceSynchronize();

    // 5. Download results into host memory
    cudaMemcpy(host_out, device_out, size * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(device_in);
    cudaFree(device_out);
}
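Kernel launches return asynchronously and report failures lazily, so a common companion to the steps above (a sketch, not part of the original slides) is to check for errors after the launch:

```cuda
#include <cstdio>

// Sketch: cudaGetLastError() reports launch-configuration errors;
// the status returned by a later synchronization call reports errors
// that occurred while the kernel was executing.
void check_launch()
{
    cudaError_t err = cudaGetLastError(); // launch-time errors
    if (err != cudaSuccess)
        printf("launch failed: %s\n", cudaGetErrorString(err));

    err = cudaDeviceSynchronize(); // errors during kernel execution
    if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));
}
```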

FINAL WORDS

CUDA is a set of capable GPU hardware, driver, GPU ISA, GPU assembler, compiler, C++-based high-level language and runtime which enables programming of NVIDIA GPUs
A CUDA function (kernel) is called on a grid of blocks
A kernel runs on unified programmable cores
A kernel is able to access registers and local memory, share memory inside a block of threads, and access RAM through global, texture and constant memories
