CUDA Synchronization · devietti/classes/cis601-spring2017/slide…


CUDA Synchronization


atomics
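A minimal sketch of what an atomic provides, assuming a hypothetical kernel `countHits` and host helper `runCountHits` (neither appears in the slides): every thread increments one global counter with `atomicAdd`, so no increment is lost to a read-modify-write race.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread adds 1 to a single global counter.
__global__ void countHits(int *counter) {
    atomicAdd(counter, 1);  // indivisible read-modify-write on global memory
}

// Hypothetical host helper: launches blocks x threads threads and
// returns the final count, or -1 if a CUDA call fails.
int runCountHits(int blocks, int threads) {
    int *d_counter = nullptr;
    int h_counter = -1;
    if (cudaMalloc(&d_counter, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_counter, 0, sizeof(int));
    countHits<<<blocks, threads>>>(d_counter);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&h_counter, d_counter, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return err == cudaSuccess ? h_counter : -1;
}
```

Replacing `atomicAdd(counter, 1)` with a plain `*counter += 1` would let concurrent threads overwrite each other's updates, so the result would generally be smaller than the thread count.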


memory fences

“Good fences make good neighbors.”

– Robert Frost, “Mending Wall”


without fences

same (lack of) guarantees for reads


ORLY?


__threadfence_block


PTX membar.cta


__threadfence
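One common use of `__threadfence()` can be sketched as a "last block finishes the job" pattern. The kernel `sumPartials`, the counter `blocksDone`, and the helper `runSumPartials` are hypothetical names chosen for illustration. Each block publishes a partial result, fences, then takes a ticket; the block holding the last ticket can rely on every earlier partial being visible.

```cuda
#include <cuda_runtime.h>

__device__ unsigned int blocksDone = 0;  // blocks that have published so far

// Hypothetical kernel: block b publishes the value b + 1; the last block
// to finish adds up all published values.
__global__ void sumPartials(volatile int *partial, int *total) {
    if (threadIdx.x == 0) {
        partial[blockIdx.x] = blockIdx.x + 1;  // 1. publish this block's value
        __threadfence();                       // 2. order the write before the ticket
        unsigned int ticket = atomicInc(&blocksDone, gridDim.x);  // 3. take a ticket
        if (ticket == gridDim.x - 1) {         // 4. last block: all values visible
            int sum = 0;
            for (unsigned int b = 0; b < gridDim.x; ++b) sum += partial[b];
            *total = sum;
            blocksDone = 0;                    // reset for a later launch
        }
    }
}

// Hypothetical host helper: returns the sum 1 + 2 + ... + blocks, or -1 on error.
int runSumPartials(int blocks) {
    int *d_partial = nullptr, *d_total = nullptr;
    int h_total = -1;
    if (cudaMalloc(&d_partial, blocks * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_total, sizeof(int)) != cudaSuccess) {
        cudaFree(d_partial);
        return -1;
    }
    sumPartials<<<blocks, 32>>>(d_partial, d_total);
    cudaError_t err = cudaMemcpy(&h_total, d_total, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_partial);
    cudaFree(d_total);
    return err == cudaSuccess ? h_total : -1;
}
```

Without the fence, the ticket increment could become visible before the partial value does, and the last block might read a slot that has not been written yet. The `volatile` qualifier on `partial` keeps the final reads from being served out of a stale cache line.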


PTX membar.gl


__threadfence_system


volatile


CUDA spinlock?
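A cautious sketch of one way to build a spinlock, assuming hypothetical names `lockWord`, `lockedIncrement`, and `runLockedIncrement` (none are from the slides). Only thread 0 of each block contends, because several lanes of the same warp spinning on one lock can hang on GPUs without independent thread scheduling.

```cuda
#include <cuda_runtime.h>

__device__ int lockWord = 0;  // 0 = free, 1 = held

// Hypothetical kernel: one thread per block takes the lock and performs
// a plain (non-atomic) increment inside the critical section.
__global__ void lockedIncrement(volatile int *counter) {
    if (threadIdx.x == 0) {                          // one contender per block
        while (atomicCAS(&lockWord, 0, 1) != 0) { }  // acquire: spin until 0 -> 1
        __threadfence();              // keep the critical section after the acquire
        *counter = *counter + 1;      // critical section
        __threadfence();              // make the update visible before the release
        atomicExch(&lockWord, 0);     // release
    }
}

// Hypothetical host helper: returns the final counter value, or -1 on error.
int runLockedIncrement(int blocks) {
    int *d_counter = nullptr;
    int h_counter = -1;
    if (cudaMalloc(&d_counter, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_counter, 0, sizeof(int));
    lockedIncrement<<<blocks, 32>>>(d_counter);
    cudaError_t err = cudaMemcpy(&h_counter, d_counter, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return err == cudaSuccess ? h_counter : -1;
}
```

The counter is accessed through a `volatile` pointer so that each block reads the value from memory rather than from a cached copy. For a counter this simple, a single `atomicAdd` would be the more usual choice; the lock is shown only to illustrate the acquire, fence, release sequence.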


“Two threads diverged in a CUDA warp,
And sorry I had become untwined
from my PC by a branch so sharp,
I had to ask Nvidia Corp:
the order of paths was undefined.”

– “Robert Frost”, The Thread Not Taken


__syncthreads()
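As an illustration of `__syncthreads()` as a block-wide barrier, here is a sketch of a shared-memory tree reduction. The names `BLOCK`, `blockSum`, and `runBlockSum` are hypothetical, and the block size is assumed to be a power of two.

```cuda
#include <cuda_runtime.h>

#define BLOCK 64  // threads per block; assumed to be a power of two

// Hypothetical kernel: each block sums BLOCK consecutive inputs.
__global__ void blockSum(const int *in, int *out) {
    __shared__ int tile[BLOCK];
    tile[threadIdx.x] = in[blockIdx.x * BLOCK + threadIdx.x];
    __syncthreads();  // every slot is filled before any thread reads a neighbor
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();  // outside the if: all threads must reach the barrier
    }
    if (threadIdx.x == 0) out[blockIdx.x] = tile[0];
}

// Hypothetical host helper: sums 1..BLOCK with one block; returns -1 on error.
int runBlockSum() {
    int h_in[BLOCK], h_out = -1;
    for (int i = 0; i < BLOCK; ++i) h_in[i] = i + 1;
    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    blockSum<<<1, BLOCK>>>(d_in, d_out);
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -1;
}
```

The barrier sits outside the `if` on purpose: calling `__syncthreads()` from only some threads of a block is undefined behavior and can hang the kernel.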


intra-warp synchronization


<spinlock PTX demo>


PTX

❖ virtual ISA for Nvidia GPUs

❖ RISC-like ISA, load-store, 3-operand

❖ destination register is on the left


PTX load/store caching

qualifier  meaning                        load     store
.ca        cache at all levels            default  -
.wb        write back caching             -        default
.cg        cache at L2 (global cache)     yes      yes
.cs        streaming (mark as LRU)        yes      yes
.lu        last use (read & invalidate)   yes      -


PTX tidbits


Homework 2


CUDA Kernel Timeout == good


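__managed__

#include <cstdio>

__device__ __managed__ int d_counter = 0;  // one variable, visible to host and device

// Stand-in body (assumption): the slide does not show myKernel.
__global__ void myKernel() { atomicAdd(&d_counter, 1); }

int main() {
  d_counter = 10;  // the host writes the managed variable directly
  myKernel<<<8, 16>>>();
  cudaError_t cudaStatus = cudaDeviceSynchronize();  // finish the kernel before the host reads
  // Explicit check in place of checkCudaErrors, which comes from the CUDA samples' helper header.
  if (cudaStatus != cudaSuccess) { fprintf(stderr, "%s\n", cudaGetErrorString(cudaStatus)); return 1; }
  printf("%d\n", d_counter);
  return 0;
}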


C++ virtual functions
