cuda synchronization - cis.upenn.edudevietti/classes/cis601-spring2017/cuda... · volatile keyword:...

CUDA Synchronization


Post on 24-Jun-2020


Page 1: CUDA Synchronization - cis.upenn.edudevietti/classes/cis601-spring2017/cuda... · volatile keyword: If a variable located in global or shared memory is declared as volatile, the compiler

CUDA Synchronization


atomics
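
As a minimal sketch of what atomics buy you, assume many threads incrementing one counter in global memory. The names `count_atomic` and `run_count_atomic` are illustrative and not taken from the slides; `atomicAdd` is the standard CUDA intrinsic.

```cuda
#include <cuda_runtime.h>

// Every thread adds 1 to the same counter. atomicAdd performs the
// read-modify-write as one indivisible operation, so no increment is lost.
__global__ void count_atomic(int *counter) {
    atomicAdd(counter, 1);
}

// Hypothetical host helper: launches blocks*threads threads and
// returns the final value of the counter.
int run_count_atomic(int blocks, int threads) {
    int *d_counter = nullptr;
    int h_counter = 0;
    cudaMalloc(&d_counter, sizeof(int));
    cudaMemcpy(d_counter, &h_counter, sizeof(int), cudaMemcpyHostToDevice);
    count_atomic<<<blocks, threads>>>(d_counter);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return h_counter;
}
```

Replacing `atomicAdd(counter, 1)` with a plain `*counter += 1` would make the result unpredictable, because concurrent threads could read the same old value.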


memory fences

“Good fences make good neighbors.”

– Robert Frost, “Mending Wall”


without fences

same (lack of) guarantees for reads


ORLY?


__threadfence_block


PTX membar.cta


__threadfence
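
One common use of `__threadfence()` can be sketched as follows, loosely modeled on the "last block finishes the job" pattern: each block publishes a value, fences, then takes a ticket; whichever block draws the last ticket reads everyone's values. The kernel, helper, and parameter names are assumptions made for illustration, and the reader-side fence is a conservative choice rather than something the slides prescribe.

```cuda
#include <cuda_runtime.h>

// One thread per block publishes a value, then takes a ticket. The block that
// draws the last ticket knows every other block has already published.
__global__ void sum_last_block(volatile int *partial, unsigned int *tickets, int *total) {
    if (threadIdx.x != 0) return;

    partial[blockIdx.x] = blockIdx.x + 1;   // publish this block's value
    __threadfence();                         // order that write before the ticket, device-wide

    unsigned int ticket = atomicAdd(tickets, 1u);
    if (ticket == gridDim.x - 1) {           // last block to arrive
        __threadfence();                     // keep the reads below after the ticket
        int sum = 0;
        for (unsigned int b = 0; b < gridDim.x; ++b)
            sum += partial[b];               // volatile: always read from memory
        *total = sum;
    }
}

// Hypothetical host helper: returns 1 + 2 + ... + blocks.
int run_sum_last_block(int blocks) {
    int *d_partial = nullptr, *d_total = nullptr;
    unsigned int *d_tickets = nullptr;
    int h_total = 0;
    cudaMalloc(&d_partial, blocks * sizeof(int));
    cudaMalloc(&d_tickets, sizeof(unsigned int));
    cudaMalloc(&d_total, sizeof(int));
    cudaMemset(d_partial, 0, blocks * sizeof(int));
    cudaMemset(d_tickets, 0, sizeof(unsigned int));
    cudaMemset(d_total, 0, sizeof(int));
    sum_last_block<<<blocks, 32>>>(d_partial, d_tickets, d_total);
    cudaMemcpy(&h_total, d_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_partial);
    cudaFree(d_tickets);
    cudaFree(d_total);
    return h_total;
}
```

Without the fence before the ticket, another block could observe the incremented ticket count before the published value, which is the reordering the "without fences" slide warns about.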


PTX membar.gl


__threadfence_system


volatile
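
A small sketch of where `volatile` matters, assuming a hand-off between two warps of the same block through shared memory. Declaring the flag `volatile` forces each loop iteration to issue a real memory read instead of reusing a value held in a register. The names `handoff` and `run_handoff` are hypothetical, and the sketch relies on the two warps being scheduled independently.

```cuda
#include <cuda_runtime.h>

// Thread 0 (warp 0) produces a value; thread 32 (warp 1) waits for it.
__global__ void handoff(int *out) {
    __shared__ volatile int flag;   // volatile: every access goes to memory
    __shared__ volatile int data;

    if (threadIdx.x == 0) flag = 0;
    __syncthreads();                // flag is initialized before anyone polls it

    if (threadIdx.x == 0) {
        data = 42;
        __threadfence_block();      // order the data write before the flag write
        flag = 1;
    } else if (threadIdx.x == 32) {
        while (flag == 0) { }       // re-reads flag on every iteration
        __threadfence_block();      // order the flag read before the data read
        *out = data;
    }
}

// Hypothetical host helper: launches two warps and returns the value received.
int run_handoff() {
    int *d_out = nullptr;
    int h_out = 0;
    cudaMalloc(&d_out, sizeof(int));
    cudaMemset(d_out, 0, sizeof(int));
    handoff<<<1, 64>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Note that `volatile` only constrains the compiler; the ordering between `data` and `flag` still comes from the fences.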


CUDA spinlock?
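
One way to sketch a spinlock that sidesteps intra-warp contention is to let only one thread per block compete for it. This is an illustration under that assumption, not the version from the lecture demo; `lock`, `unlock`, `locked_increment`, and `run_locked_increment` are made-up names. Letting several threads of the same warp spin on one lock can hang on GPUs where divergent paths of a warp run in an undefined order.

```cuda
#include <cuda_runtime.h>

// Spin until the mutex changes from 0 (free) to 1 (held) by our own CAS.
__device__ void lock(int *mutex) {
    while (atomicCAS(mutex, 0, 1) != 0) { }
    __threadfence();            // critical-section accesses stay after the acquire
}

__device__ void unlock(int *mutex) {
    __threadfence();            // critical-section writes are ordered before the release
    atomicExch(mutex, 0);
}

// Only thread 0 of each block takes the lock, so threads within one warp
// never wait on each other.
__global__ void locked_increment(int *mutex, volatile int *counter) {
    if (threadIdx.x == 0) {
        lock(mutex);
        *counter = *counter + 1;   // plain read-modify-write, protected by the lock
        unlock(mutex);
    }
}

// Hypothetical host helper: returns the counter, expected to equal `blocks`.
int run_locked_increment(int blocks, int threads) {
    int *d_mutex = nullptr, *d_counter = nullptr;
    int h_counter = 0;
    cudaMalloc(&d_mutex, sizeof(int));
    cudaMalloc(&d_counter, sizeof(int));
    cudaMemset(d_mutex, 0, sizeof(int));
    cudaMemset(d_counter, 0, sizeof(int));
    locked_increment<<<blocks, threads>>>(d_mutex, d_counter);
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_mutex);
    cudaFree(d_counter);
    return h_counter;
}
```

The sketch assumes all launched blocks are resident at once, which holds for small grids such as `run_locked_increment(8, 16)`.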


“Two threads diverged in a CUDA warp,
And sorry I had become untwined
from my PC by a branch so sharp,
I had to ask Nvidia Corp:
the order of paths was undefined.”

– “Robert Frost”, The Thread Not Taken


__syncthreads()
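
A minimal sketch of `__syncthreads()` as a block-wide barrier, using a shared-memory tree reduction. The block size of 256 and the names `block_sum` and `run_block_sum` are assumptions made for this example.

```cuda
#include <cuda_runtime.h>

// Sums 256 integers with one block of 256 threads.
__global__ void block_sum(const int *in, int *out) {
    __shared__ int buf[256];
    unsigned int t = threadIdx.x;

    buf[t] = in[t];
    __syncthreads();                     // all loads into buf are done

    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (t < s) buf[t] += buf[t + s];
        __syncthreads();                 // outside the if: every thread must reach it
    }
    if (t == 0) *out = buf[0];
}

// Hypothetical host helper: sums the values 0..255 on the device.
int run_block_sum() {
    int h_in[256];
    for (int i = 0; i < 256; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    int h_out = 0;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    block_sum<<<1, 256>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Placing the barrier inside the `if (t < s)` branch would be an error, since only some threads of the block would reach it.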


intra-warp synchronization
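
As one possible illustration of synchronizing within a warp, the sketch below sums the lane indices of a single warp with `__shfl_down_sync`. That intrinsic requires CUDA 9 or later, which is newer than these slides, so treat it as an assumption about the toolchain; `warp_sum` and `run_warp_sum` are illustrative names.

```cuda
#include <cuda_runtime.h>

// Launched with exactly one warp (32 threads). Each lane contributes its
// own index; after the loop, lane 0 holds the sum of all 32 contributions.
__global__ void warp_sum(int *out) {
    int v = threadIdx.x;
    for (int offset = 16; offset > 0; offset >>= 1) {
        // The full mask names all 32 lanes as participants; the shuffle
        // synchronizes them before exchanging registers.
        v += __shfl_down_sync(0xffffffffu, v, offset);
    }
    if (threadIdx.x == 0) *out = v;
}

// Hypothetical host helper: returns 0 + 1 + ... + 31.
int run_warp_sum() {
    int *d_out = nullptr;
    int h_out = 0;
    cudaMalloc(&d_out, sizeof(int));
    warp_sum<<<1, 32>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

No shared memory or `__syncthreads()` is needed here, because the exchange never leaves the warp.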


<spinlock PTX demo>


PTX

❖ virtual ISA for Nvidia GPUs

❖ RISC-like ISA, load-store, 3-operand

❖ destination register is on the left
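
The three-operand, destination-first form listed above can be seen in a single inline PTX instruction. This is a small sketch using CUDA's inline `asm` support; `add_ptx`, `add_kernel`, and `run_add_ptx` are hypothetical names.

```cuda
#include <cuda_runtime.h>

// add.s32 d, a, b  computes d = a + b; the destination (%0) comes first.
__device__ int add_ptx(int a, int b) {
    int r;
    asm("add.s32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b));
    return r;
}

__global__ void add_kernel(int a, int b, int *out) {
    *out = add_ptx(a, b);
}

// Hypothetical host helper: runs the PTX add on the device.
int run_add_ptx(int a, int b) {
    int *d_out = nullptr;
    int h_out = 0;
    cudaMalloc(&d_out, sizeof(int));
    add_kernel<<<1, 1>>>(a, b, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Compiling any kernel with `nvcc -ptx` shows the same instruction style in the generated output.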


PTX load/store caching

qualifier   meaning                         load      store
.ca         cache at all levels             default   -
.wb         write back caching              -         default
.cg         cache at L2 (global cache)      yes       yes
.cs         streaming (mark as LRU)         yes       yes
.lu         last use (read & invalidate)    yes       -


PTX tidbits


Homework 2


CUDA Kernel Timeout == good


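__managed__

__device__ __managed__ int d_counter = 0;   // visible to both host and device

int main() {
    d_counter = 10;                          // host writes the managed variable directly

    myKernel<<<8,16>>>();                    // myKernel is not shown on the slide

    // Wait for the kernel before the host touches d_counter again.
    cudaError_t cudaStatus = cudaDeviceSynchronize();
    checkCudaErrors(cudaStatus);             // error-check helper, assumed to be defined elsewhere

    printf("%d", d_counter);
    return 0;
}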
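
Since the slide does not show the kernel, here is one self-contained guess at what a complete version could look like. The variable `counter`, the kernel `bump`, and the helper `run_managed_counter` are invented for this sketch; the only assumption about the original is that each of the 8 x 16 threads increments the managed counter once.

```cuda
#include <cuda_runtime.h>

// A managed variable is accessible by name from both host and device code.
__device__ __managed__ int counter = 0;

// Hypothetical kernel: every thread atomically increments the counter.
__global__ void bump() {
    atomicAdd(&counter, 1);
}

// Hypothetical host helper: sets the counter to 10, adds one per thread,
// and returns the value the host sees afterwards.
int run_managed_counter() {
    counter = 10;                 // host write, no cudaMemcpy needed
    bump<<<8, 16>>>();
    cudaDeviceSynchronize();      // the host must wait before reading
    return counter;
}
```

With 8 blocks of 16 threads the helper returns 10 + 128; reading `counter` on the host before `cudaDeviceSynchronize()` would race with the kernel.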


C++ virtual functions
