Code GPU with CUDA - CUDA Introduction

Post on 11-Apr-2017


CODE GPU WITH CUDA

CUDA INTRODUCTION

Created by Marina Kolpakova (cuda.geek) for Itseez


OUTLINE

Terminology
Definition
Programming model
Execution model
Memory model
CUDA kernel

OUT OF SCOPE

CUDA API overview

TERMINOLOGY

Device: CUDA-capable NVIDIA GPU
Device code: code executed on the device
Host: x86/x64/ARM CPU
Host code: code executed on the host
Kernel: concrete device function

CUDA

CUDA is a Compute Unified Device Architecture. CUDA includes:

1. Capable GPU hardware and driver
2. Device ISA, GPU assembler, compiler
3. C++-based high-level language, CUDA Runtime

CUDA defines:

programming model
execution model
memory model

PROGRAMMING MODEL

A kernel is executed by many threads

PROGRAMMING MODEL

Threads are grouped into blocks

Each thread has a thread ID

PROGRAMMING MODEL

Thread blocks form an execution grid

Each block has a block ID
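As a sketch of how these IDs look in device code (the kernel name and the 2-D layout here are illustrative, not from the slides), each thread can combine its block ID and thread ID into a unique position in the grid:

```cuda
// Illustrative sketch: with a 2-D grid of 2-D blocks, each thread
// derives a unique (x, y) position from its block ID and thread ID.
__global__ void index2d(float* out, int width)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x; // column in the grid
    int y = blockIdx.y * blockDim.y + threadIdx.y; // row in the grid
    out[y * width + x] = (float)(y * width + x);
}
```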

EXECUTION (HW MAPPING) MODEL

A single thread is executed on a core

EXECUTION (HW MAPPING) MODEL

Each block is executed by one SM and does not migrate. The number of concurrent blocks that can reside on an SM depends on available resources.

EXECUTION (HW MAPPING) MODEL

Threads in a block can cooperate via shared memory and synchronization. There is no hardware support for cooperation between threads from different blocks.
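A minimal sketch of such in-block cooperation (the kernel name, `CTA_SIZE`, and the naive single-thread summation are illustrative assumptions, not from the slides): threads of a block stage data in shared memory, synchronize with `__syncthreads()`, then one thread combines the block's values.

```cuda
// Illustrative sketch of block-level cooperation via shared memory.
// Assumes the launch uses exactly CTA_SIZE threads per block.
#define CTA_SIZE 256

__global__ void block_sum(const float* in, float* block_results)
{
    __shared__ float buffer[CTA_SIZE];

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    buffer[threadIdx.x] = in[tid];

    __syncthreads(); // make all loads visible to the whole block

    if (threadIdx.x == 0)
    {
        float sum = 0.f;
        for (int i = 0; i < CTA_SIZE; ++i)
            sum += buffer[i];
        block_results[blockIdx.x] = sum;
    }
}
```

Note there is no equivalent barrier across blocks: combining the per-block results requires a second kernel launch or atomic operations.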

EXECUTION (HW MAPPING) MODEL

One kernel, or multiple kernels on sm_20+, can execute on the device concurrently

MEMORY MODEL

Each thread has its own registers

MEMORY MODEL

Each thread has its own local memory

MEMORY MODEL

A block has shared memory. A pointer to shared memory is valid while the block is resident:

__shared__ float buffer[CTA_SIZE];

MEMORY MODEL

The grid is able to access global and constant memory
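For example (a sketch with illustrative names, not from the slides), constant memory is declared at file scope with `__constant__`, filled from the host with `cudaMemcpyToSymbol`, and is then readable by every thread in the grid:

```cuda
// Illustrative sketch: 16 coefficients in constant memory,
// broadcast-read by all threads of the grid.
__constant__ float coeffs[16];

__global__ void scale(const float* in, float* out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = in[tid] * coeffs[tid % 16];
}

// Host-side upload into the constant-memory symbol.
void upload_coeffs(const float* host_coeffs)
{
    cudaMemcpyToSymbol(coeffs, host_coeffs, 16 * sizeof(float));
}
```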

BASIC CUDA KERNEL

Work for GPU threads is represented as a kernel
A kernel represents a task for a single thread (scalar notation)
Every thread in a particular grid executes the same kernel
Threads use their threadIdx and blockIdx to dispatch work
A kernel function is marked with the __global__ keyword

Common kernel structure:

1. Retrieve the position in the grid (widely named tid)
2. Load data from GPU memory
3. Perform compute work
4. Write the result back into GPU memory

__global__ void kernel(float* in, float* out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = in[tid];
}

KERNEL EXECUTION

void execute_kernel(const float* host_in, float* host_out, int size)
{
    float *device_in, *device_out;
    cudaMalloc((void**)&device_in,  size * sizeof(float));
    cudaMalloc((void**)&device_out, size * sizeof(float));

    // 1. Upload data into device memory
    cudaMemcpy(device_in, host_in, size * sizeof(float), cudaMemcpyHostToDevice);

    // 2. Configure kernel launch
    dim3 block(256);
    dim3 grid(size / 256);

    // 3. Execute kernel
    kernel<<<grid, block>>>(device_in, device_out);

    // 4. Wait till completion
    cudaDeviceSynchronize();

    // 5. Download results into host memory
    cudaMemcpy(host_out, device_out, size * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(device_in);
    cudaFree(device_out);
}
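Kernel launches return asynchronously and report failures lazily, so a common companion to the steps above (a sketch, not part of the original slides) is to check for errors after the launch:

```cuda
#include <cstdio>

// Sketch: cudaGetLastError() reports launch-configuration errors;
// the status returned by a later synchronization call reports errors
// that occurred while the kernel was executing.
void check_launch()
{
    cudaError_t err = cudaGetLastError(); // launch-time errors
    if (err != cudaSuccess)
        printf("launch failed: %s\n", cudaGetErrorString(err));

    err = cudaDeviceSynchronize(); // errors during kernel execution
    if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));
}
```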

FINAL WORDS

CUDA is a set of capable GPU hardware, driver, GPU ISA, GPU assembler, compiler, C++-based high-level language and runtime which enables programming of NVIDIA GPUs
A CUDA function (kernel) is called on a grid of blocks
A kernel runs on unified programmable cores
A kernel is able to access registers and local memory, share memory inside a block of threads, and access RAM through global, texture and constant memories
