CUDA and the Memory Model (Part II): Code Executed on GPU
Post on 21-Dec-2015
CUDA: Features available to kernels
•Standard mathematical functions: sinf, powf, atanf, ceilf, etc.
•Built-in vector types: float4, int4, uint4, etc., for dimensions 1…4
•Texture accesses in kernels:
  texture<float, 2> my_texture;  // declare texture reference
  float4 texel = texfetch(my_texture, u, v);
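A minimal kernel sketch tying these features together (the kernel name and data layout are assumptions, not from the slides): each thread applies single-precision device math functions to the components of one float4 element.

```cuda
// Illustrative only: one thread per float4 element.
__global__ void math_demo(float4 *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float4 v = data[i];
        v.x = sinf(v.x);        // single-precision sine
        v.y = powf(v.y, 2.0f);  // single-precision power
        v.z = atanf(v.z);       // single-precision arctangent
        v.w = ceilf(v.w);       // round up
        data[i] = v;
    }
}
```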
Shared Memory
•On-chip: two orders of magnitude lower latency than global memory
•An order of magnitude higher bandwidth than global memory
•16 KB per multiprocessor (NVIDIA GPUs contain up to ~30 multiprocessors)
•Allocated per thread block; accessible by any thread in the thread block
•Not accessible to other thread blocks
Several uses:
•Sharing data among threads in a thread block
•User-managed cache (reducing global memory accesses)
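A sketch of the user-managed-cache idea (the kernel name, block size, and stencil are illustrative assumptions): each block stages a tile of global memory into __shared__ storage, so the three neighboring reads per output element hit on-chip memory instead of going to global memory three times.

```cuda
// Assumes a 1D launch with blockDim.x == 256.
__global__ void blur3(const float *in, float *out, int n)
{
    __shared__ float tile[256 + 2];          // block's tile plus one halo cell on each side
    int gi = blockIdx.x * blockDim.x + threadIdx.x;
    int li = threadIdx.x + 1;

    if (gi < n) tile[li] = in[gi];           // one global load per thread
    if (threadIdx.x == 0)                    // edge threads also load the halo
        tile[0] = (gi > 0) ? in[gi - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)
        tile[li + 1] = (gi + 1 < n) ? in[gi + 1] : 0.0f;

    __syncthreads();                         // wait until the tile is fully populated

    if (gi < n)                              // three reads, all from shared memory
        out[gi] = (tile[li - 1] + tile[li] + tile[li + 1]) / 3.0f;
}
```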
Thread counts
•More threads per block are better for time slicing (hiding memory latency)
  - Minimum: 64; ideal: 192-256
•More threads per block means fewer registers per thread
  - Kernel invocation may fail if the kernel compiles to more registers than are available
•Threads within a block can be synchronized (__syncthreads())
  - Important for SIMD efficiency
•The grid may contain up to 64K blocks in each grid dimension
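A host-side launch sketch applying the sizing advice above (the kernel and problem size are assumptions for illustration): the block size is chosen in the recommended 192-256 range, and the block count is derived from the problem size by rounding up.

```cuda
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void)
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    int threadsPerBlock = 256;               // within the ideal 192-256 range
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up to cover all n
    scale<<<blocks, threadsPerBlock>>>(d_data, n);

    cudaDeviceSynchronize();                 // wait for the kernel to finish
    cudaFree(d_data);
    return 0;
}
```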