Understanding and Using Atomic Memory Operations
Lars Nyland & Stephen Jones, NVIDIA GTC 2013
What Is an Atomic Memory Operation?
An uninterruptible read-modify-write memory operation
— Requested by threads
— Updates a value at a specific address
Serializes contending updates from multiple threads
Enables coordination among two or more threads
Limited to specific functions & data sizes
Precise Meaning of atomicAdd()

    int atomicAdd(int *p, int v) {
        int old;
        exclusive_single_thread {  // atomically perform LD; ADD; ST ops
            old = *p;              // Load from memory
            *p = old + v;          // Store after adding v
        }
        return old;
    }
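Because atomicAdd() returns the value that was in memory before the add, every caller receives a different old value, which makes it a natural way to hand out slots. A minimal sketch, assuming a CUDA toolchain: threads whose input is even append it to a shared output array, using the returned count as their slot. The names `append_evens` and `collect_evens` are illustrative, not from the talk.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Threads with an even input reserve one output slot each. atomicAdd()
// returns the counter's previous value, which is unique per caller.
__global__ void append_evens(const int *in, int n, int *out, int *count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] % 2 == 0) {
        int slot = atomicAdd(count, 1);
        out[slot] = in[i];
    }
}

// Host helper: returns the appended values (in no particular order).
std::vector<int> collect_evens(const std::vector<int> &h_in) {
    int n = (int)h_in.size();
    int *d_in, *d_out, *d_count, h_count = 0;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_count, &h_count, sizeof(int), cudaMemcpyHostToDevice);
    append_evens<<<(n + 255) / 256, 256>>>(d_in, n, d_out, d_count);
    cudaError_t err = cudaDeviceSynchronize();   // wait and surface launch errors
    assert(err == cudaSuccess);
    cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    std::vector<int> result(h_count);
    cudaMemcpy(result.data(), d_out, h_count * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_count);
    return result;
}
```

The output order depends on which thread's atomicAdd() is serviced first, so callers should treat the result as an unordered set.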
Simple Atomic Example
Addition is a two-step process

    x = x + 4.5;        // x holds 1.25 in memory
Simple Atomic Example
Then write back the new value to memory

    x = x + 4.5;
    r0 = 1.25 + 4.5;    // 1. load x and add in a register: r0 = 5.75
    x = r0;             // 2. store r0 back to memory: x = 5.75
Simple Atomic Example
But multi-threaded addition is a problem

    // five threads, one statement each; x holds 1.25 in memory
    x = x - 1.25;
    x = x + 8.0;
    x = x + 4.5;
    x = x - 3.1;
    x = x + 6.2;
Simple Atomic Example
We want the total sum, but threads operate independently: each one reads x = 1.25 and computes its own r0

    x = x - 1.25;   // r0 = 0.00
    x = x - 3.1;    // r0 = -1.85
    x = x + 6.2;    // r0 = 7.45
    x = x + 8.0;    // r0 = 9.25
    x = x + 4.5;    // r0 = 5.75
Simple Atomic Example
Any thread might write the final result

    x = ???         // whichever thread stores its r0 last wins
Simple Atomic Example
Result is undetermined because of the race between threads

    x = -1.85       // here the thread computing x - 3.1 stored last;
                    // the correct total is 15.60
Simple Atomic Example
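Atomic accumulation is consistent: the updates are serialized, so each one sees the result of the previous one. Starting from x = 1.25, one possible order is

    atomicAdd(&x, 4.5);     // x: 1.25 -> 5.75
    atomicAdd(&x, -1.25);   // x: 5.75 -> 4.50
    atomicAdd(&x, -3.1);    // x: 4.50 -> 1.40
    atomicAdd(&x, 8.0);     // x: 1.40 -> 9.40
    atomicAdd(&x, 6.2);     // x: 9.40 -> 15.60

Whatever order the hardware picks, x ends at 15.60 (up to floating-point rounding).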
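The contrast on these slides can be reproduced with a sketch like the following (assuming a CUDA toolchain; kernel and helper names are illustrative). `racy_sum` performs the plain load, add and store from the earlier slides, while `atomic_sum` uses the float overload of atomicAdd().

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Unsafe: the load, add and store of different threads can interleave,
// so updates can be lost and the result can change from run to run.
__global__ void racy_sum(const float *v, int n, float *x) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) *x = *x + v[i];
}

// Safe: each read-modify-write is performed as one indivisible operation.
__global__ void atomic_sum(const float *v, int n, float *x) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(x, v[i]);
}

// Host helper: starts from *x = start and applies every v[i] to it.
float run_sum(bool use_atomic, const std::vector<float> &h_v, float start) {
    int n = (int)h_v.size();
    float *d_v, *d_x, result;
    cudaMalloc(&d_v, n * sizeof(float));
    cudaMalloc(&d_x, sizeof(float));
    cudaMemcpy(d_v, h_v.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, &start, sizeof(float), cudaMemcpyHostToDevice);
    int blocks = (n + 255) / 256;
    if (use_atomic) atomic_sum<<<blocks, 256>>>(d_v, n, d_x);
    else            racy_sum<<<blocks, 256>>>(d_v, n, d_x);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    cudaMemcpy(&result, d_x, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_v);
    cudaFree(d_x);
    return result;
}
```

With the slide's values, `run_sum(true, {-1.25f, -3.1f, 6.2f, 8.0f, 4.5f}, 1.25f)` should come out at about 15.6; the racy variant can return a different value on every run.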
Why Use Atomics?
Common problem: races on read-modify-write of shared data
— Transactions & Data Access Control (figure: Database Locking & Exclusivity, with Delete, Merge and Append operations)
— Data aggregation & enumeration (figure: Reduction of n0, n1, …, nk into ∑ni)
— Concurrent data structures (figure: Multi-Producer Lists & Queues, with two producers each pushing Xnew onto a list Xi … Xi+5)
Compare-and-Swap

    int atomicCAS(int *p, int cmp, int v) {
        int old;
        exclusive_single_thread {
            old = *p;
            if (cmp == old)
                *p = v;
        }
        return old;
    }

(Diagram: atomicCAS sends *p, cmp and v to L2/DRAM, where an exclusive single thread loads old = *p, stores *p = v only if old == cmp, and returns old.)
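A common use of atomicCAS() is "first writer wins": each thread tries to replace an empty marker with its own index, and only the thread whose compare succeeds gets the empty marker back. A minimal sketch, assuming -1 marks an unclaimed slot; `claim_slot` and `run_claim` are illustrative names.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Every thread tries to replace -1 with its own global index. atomicCAS()
// returns the old value, so only the first thread serviced sees -1.
__global__ void claim_slot(int *owner, int *winners) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (atomicCAS(owner, -1, tid) == -1)
        atomicAdd(winners, 1);                 // count successful claims
}

// Host helper: runs the race and reports the owner and the number of winners.
void run_claim(int blocks, int threads, int *h_owner, int *h_winners) {
    int *d_owner, *d_winners, unclaimed = -1, zero = 0;
    cudaMalloc(&d_owner, sizeof(int));
    cudaMalloc(&d_winners, sizeof(int));
    cudaMemcpy(d_owner, &unclaimed, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_winners, &zero, sizeof(int), cudaMemcpyHostToDevice);
    claim_slot<<<blocks, threads>>>(d_owner, d_winners);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    cudaMemcpy(h_owner, d_owner, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_winners, d_winners, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_owner);
    cudaFree(d_winners);
}
```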
Arithmetic/Logical Atomic Operations
(Diagram: atomicOP sends *p and v to L2/DRAM, where an exclusive single thread loads old = *p, stores *p = old OP v, and returns old.)
Binary ops: Add, Min, Max, And, Or, Xor

    int atomicOP(int *p, int v) {
        int old;
        exclusive_single_thread {
            old = *p;
            *p = old OP v;
        }
        return old;
    }
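The arithmetic and logical forms are used the same way. As a sketch (names illustrative), one pass over an array can compute its minimum, maximum and bitwise OR with atomicMin(), atomicMax() and atomicOr(); each result word must start at the identity element of its operation.

```cuda
#include <cassert>
#include <climits>
#include <vector>
#include <cuda_runtime.h>

// stats[0] = min, stats[1] = max, stats[2] = bitwise OR of all inputs.
__global__ void reduce_ops(const int *v, int n, int *stats) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicMin(&stats[0], v[i]);
        atomicMax(&stats[1], v[i]);
        atomicOr(&stats[2], v[i]);
    }
}

// Host helper: writes min, max and OR of h_v into out[0..2].
void min_max_or(const std::vector<int> &h_v, int out[3]) {
    int n = (int)h_v.size();
    int init[3] = {INT_MAX, INT_MIN, 0};       // identity of each operation
    int *d_v, *d_stats;
    cudaMalloc(&d_v, n * sizeof(int));
    cudaMalloc(&d_stats, sizeof(init));
    cudaMemcpy(d_v, h_v.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_stats, init, sizeof(init), cudaMemcpyHostToDevice);
    reduce_ops<<<(n + 255) / 256, 256>>>(d_v, n, d_stats);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    cudaMemcpy(out, d_stats, sizeof(init), cudaMemcpyDeviceToHost);
    cudaFree(d_v);
    cudaFree(d_stats);
}
```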
Overwriting Atomic Operations
(Diagram: atomicExch sends *p and v to L2/DRAM, where an exclusive single thread loads old = *p, stores *p = v unconditionally, and returns old.)

    int atomicExch(int *p, int v) {
        int old;
        exclusive_single_thread {
            old = *p;
            *p = v;
        }
        return old;
    }
Programming Styles using Coordination
1. Locking
2. Lock-free
3. Wait-free
Locking Style of Programming
All threads try to get the lock
One does
— Does its work
— Releases the lock
Lock-Free Style of Programming
At least one thread always makes progress
Threads try to write their result
— On failure, repeat
Usually atomicCAS
— atomicExch and atomicAdd are also used
Wait-free Style of Programming
All threads make progress
Each updates memory atomically
No thread is blocked by other threads
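A histogram is a typical wait-free use of atomics: every thread issues exactly one atomicAdd() and never retries or waits on another thread. A sketch, assuming non-negative integer inputs and an illustrative `NUM_BINS` of 16; the common optimization of first accumulating a per-block histogram in shared memory is left out for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int NUM_BINS = 16;   // illustrative bin count

// Each thread performs exactly one atomic update: no lock, no retry loop.
__global__ void histogram(const int *v, int n, unsigned int *bins) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&bins[v[i] % NUM_BINS], 1u);
}

// Host helper: returns the NUM_BINS counts for h_v.
std::vector<unsigned int> run_histogram(const std::vector<int> &h_v) {
    int n = (int)h_v.size();
    int *d_v;
    unsigned int *d_bins;
    cudaMalloc(&d_v, n * sizeof(int));
    cudaMalloc(&d_bins, NUM_BINS * sizeof(unsigned int));
    cudaMemcpy(d_v, h_v.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, NUM_BINS * sizeof(unsigned int));
    histogram<<<(n + 255) / 256, 256>>>(d_v, n, d_bins);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    std::vector<unsigned int> h_bins(NUM_BINS);
    cudaMemcpy(h_bins.data(), d_bins, NUM_BINS * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_v);
    cudaFree(d_bins);
    return h_bins;
}
```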
Hardware Managed Memory Update
Atomic Arithmetical Operations
Reduction: combine n0, n1, …, nk into a single sum ∑ni

Hierarchical Reduction (eight items, three passes)
Pass 1: i01 = n0 + n1, i23 = n2 + n3, i45 = n4 + n5, i67 = n6 + n7
Pass 2: i0-3 = i01 + i23, i4-7 = i45 + i67
Pass 3: ∑ni = i0-3 + i4-7

Atomic Reduction (single pass)
Every item is added into ∑ni with atomicAdd()

(Chart: Estimated Time For Summation, estimated clocks versus number of items being reduced, with axis ticks from 1.00E+00 to 1.00E+10. Series: DRAM load, Same-address atomicAdd, Hierarchical Reduction (no atomics), CTA-wide Reduction + Atomic.)
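The "CTA-wide Reduction + Atomic" series corresponds to a two-level scheme: each thread block (CTA) reduces its elements in shared memory, then a single thread per block adds the partial sum to the global total with atomicAdd(). A sketch, assuming unsigned inputs and an illustrative block size of 256 threads (this loop needs a power of two); names are illustrative.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;   // threads per block; must be a power of two here

// Level 1: tree-reduce this block's elements in shared memory.
// Level 2: one atomicAdd per block folds the partial sum into the total.
__global__ void block_sum(const unsigned int *v, int n, unsigned long long *total) {
    __shared__ unsigned long long partial[BLOCK];
    int i = blockIdx.x * BLOCK + threadIdx.x;
    partial[threadIdx.x] = (i < n) ? v[i] : 0u;
    __syncthreads();
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        atomicAdd(total, partial[0]);   // one same-address atomic per block
}

// Host helper: returns the sum of h_v.
unsigned long long sum_on_gpu(const std::vector<unsigned int> &h_v) {
    int n = (int)h_v.size();
    unsigned int *d_v;
    unsigned long long *d_total, h_total = 0;
    cudaMalloc(&d_v, n * sizeof(unsigned int));
    cudaMalloc(&d_total, sizeof(h_total));
    cudaMemcpy(d_v, h_v.data(), n * sizeof(unsigned int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_total, &h_total, sizeof(h_total), cudaMemcpyHostToDevice);
    block_sum<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_v, n, d_total);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    cudaMemcpy(&h_total, d_total, sizeof(h_total), cudaMemcpyDeviceToHost);
    cudaFree(d_v);
    cudaFree(d_total);
    return h_total;
}
```

Compared with one atomicAdd() per element, this issues one per 256 elements, which is the contention reduction the chart's fourth series reflects.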
Atomic Access Patterns
— Same address: 1 per clock
— Same cache line (adjacent addresses, same issuing warp): 8 per SM per clock
— Scattered (issued per cache line): 1 per SM per clock
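The three patterns differ only in which addresses the lanes of a warp target. The kernels below are an illustrative sketch of each, assuming a 32-lane warp and 128-byte cache lines; the throughput figures above come from the talk and are not measured by this code. Timing each launch, for example with CUDA events, is one way to compare them on a given GPU.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Same address: every thread in the grid updates one counter.
__global__ void same_address(unsigned int *c) {
    atomicAdd(&c[0], 1u);
}

// Same cache line: the 32 lanes of a warp update 32 adjacent words.
__global__ void adjacent(unsigned int *c) {
    atomicAdd(&c[threadIdx.x % 32], 1u);
}

// Scattered: lanes are 32 words (128 bytes) apart, so, assuming 128-byte
// lines, each lane's update falls in a different cache line.
__global__ void scattered(unsigned int *c) {
    atomicAdd(&c[(threadIdx.x % 32) * 32], 1u);
}

// Host helper: runs one pattern (0, 1 or 2) and returns the total count.
unsigned int total_updates(int pattern, int blocks, int threads) {
    std::vector<unsigned int> h_c(32 * 32);
    size_t bytes = h_c.size() * sizeof(unsigned int);
    unsigned int *d_c;
    cudaMalloc(&d_c, bytes);
    cudaMemset(d_c, 0, bytes);
    if (pattern == 0)      same_address<<<blocks, threads>>>(d_c);
    else if (pattern == 1) adjacent<<<blocks, threads>>>(d_c);
    else                   scattered<<<blocks, threads>>>(d_c);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    cudaMemcpy(h_c.data(), d_c, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_c);
    unsigned int total = 0;
    for (unsigned int x : h_c) total += x;
    return total;
}
```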
Locks & Access Control
Locking guarantees exclusive access to data
(Figure: Database Locking & Exclusivity, with Delete, Merge and Append operations)
Locks & Access Control
Multi-threaded arithmetic
— Double precision addition
— Simple code is unsafe

    // Add "val" to "*data". Return old value.
    double atomicAdd(double *data, double val) {
        double old = *data;
        *data = old + val;
        return old;
    }
Locks & Access Control
Multi-threaded arithmetic
— Double precision addition
— Simple code is unsafe
— Add locks to protect the critical section

    // Add "val" to "*data". Return old value.
    double atomicAdd(double *data, double val) {
        while (try_lock() == false)
            ;                       // Retry lock
        double old = *data;
        *data = old + val;
        unlock();
        return old;
    }
Locks & Access Control

    int locked = 0;
    bool try_lock() {
        if (locked == 0) {      // Not atomic: two threads can both see 0
            locked = 1;         // here and both believe they hold the lock
            return true;
        }
        return false;
    }
Locks & Access Control

    int locked = 0;
    bool try_lock() {
        int prev = atomicExch(&locked, 1);
        if (prev == 0)
            return true;
        return false;
    }

int atomicExch(int *data, int newval)
Atomically set (*data = newval), and return the previous value
Locks & Access Control
Lock-based double precision atomicAdd()
But there's a problem...
Don't use this code!

    // Add "val" to "*data". Return old value.
    double atomicAdd(double *data, double val) {
        while (atomicExch(&locked, 1) != 0)
            ;                       // Retry lock
        double old = *data;
        *data = old + val;
        locked = 0;
        return old;
    }
Locks & Warp Divergence
A CUDA warp:
— A group of threads (32 on current GPUs) scheduled in lock-step
— All threads execute the same line of code
— Any thread not participating is idle

    __device__ void example(bool condition) {
        if (condition)
            run_this_first();   // some threads active, the rest idle
        else
            then_run_this();    // the other threads active
        converged_again();      // all threads active again
    }
Locks & Warp Divergence
What does this mean for locks?
— Only one thread in the warp will get the lock
— We're okay so long as that's the thread which continues
(Figure: every thread tries to lock, but only one succeeds; the locking thread continues to the unlock while the non-locked threads idle until the unlock.)
Locks & Warp Divergence
What does this mean for locks?
— BUT: if the wrong thread idles, we deadlock
— There is no way to predict which threads idle
(Figure: every thread tries to lock and only one succeeds, but the non-locked threads retry first; the locking thread idles, so the unlock never happens.)
Locks & Warp Divergence
Working around divergence deadlock
1. Don’t use locks between threads in a warp
2. Elect one thread to take the lock, then iterate
3. Use a lock-free algorithm...
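Option 2 can be sketched as follows. Lane 0 of each warp (assuming 32-lane warps and a block size that is a multiple of 32) takes a global lock with atomicExch(), then iterates over the values of every lane in its warp, so no two lanes of one warp ever contend for the lock. `lock_word`, `guarded_sum` and `run_guarded_sum` are illustrative names. The __threadfence() calls keep the protected reads and writes inside the critical section as seen by other blocks.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

__device__ int lock_word = 0;   // 0 = free, 1 = held

// Lane 0 of each warp is the elected thread: it alone takes the lock,
// then adds the values of all 32 lanes of its warp on their behalf.
__global__ void guarded_sum(double *total, const double *vals, int n) {
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadIdx.x % 32 != 0) return;          // only lane 0 continues
    while (atomicExch(&lock_word, 1) != 0)
        ;                                       // spin until the lock is free
    __threadfence();                            // order the reads below after acquiring
    volatile double *t = total;                 // avoid stale cached copies
    for (int l = 0; l < 32; ++l)                // iterate over this warp's lanes
        if (gid + l < n)
            *t = *t + vals[gid + l];
    __threadfence();                            // publish the update before releasing
    atomicExch(&lock_word, 0);                  // release the lock
}

// Host helper: sums h_vals through the lock-protected kernel.
double run_guarded_sum(const std::vector<double> &h_vals) {
    int n = (int)h_vals.size();
    double *d_vals, *d_total, h_total = 0.0;
    cudaMalloc(&d_vals, n * sizeof(double));
    cudaMalloc(&d_total, sizeof(double));
    cudaMemcpy(d_vals, h_vals.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_total, &h_total, sizeof(double), cudaMemcpyHostToDevice);
    guarded_sum<<<(n + 255) / 256, 256>>>(d_total, d_vals, n);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    cudaMemcpy(&h_total, d_total, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_vals);
    cudaFree(d_total);
    return h_total;
}
```

Every warp still serializes on one lock, so this is only reasonable when the critical section is short or rare.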
Lock-Free Algorithms: Better Than Locks
Use atomic compare-and-swap to combine read, modify, write
Under contention, exactly one thread is guaranteed to succeed
High throughput: less work in the critical section
Only applies if the transaction is a single operation

    uint64 atomicCAS(uint64 *data, uint64 oldval, uint64 newval);

If "*data" is equal to "oldval", replace it with "newval". Always returns the original value of "*data". (uint64 stands for unsigned long long int in CUDA's declaration.)
Lock-Free Data Updates
Locking: try taking the lock; if that fails, try again. Once it succeeds: read, modify, write, unlock.
Lock-Free: generate a new value based on the current data, then compare & swap current -> new. If the swap fails, generate again; otherwise done.
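The lock-free flow maps directly onto an atomicCAS() retry loop. As a sketch, here is a maximum for float values built from the 32-bit integer atomicCAS(), reinterpreting bits with the CUDA intrinsics __float_as_int() and __int_as_float(). `atomicMaxFloat`, `max_kernel` and `gpu_max` are illustrative names, and NaN inputs are not handled.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Lock-free float maximum: read, compute, compare-and-swap, retry.
// Returns the value that was stored before this call took effect.
__device__ float atomicMaxFloat(float *addr, float val) {
    int *addr_as_int = (int *)addr;
    int old = *addr_as_int, assumed;   // a stale read can only be smaller,
    do {                               // since the stored value only grows
        assumed = old;
        if (__int_as_float(assumed) >= val)
            break;                     // nothing to do
        // Swap only if memory still holds the value the decision was based on.
        old = atomicCAS(addr_as_int, assumed, __float_as_int(val));
    } while (old != assumed);          // someone else changed it: retry
    return __int_as_float(old);
}

__global__ void max_kernel(const float *v, int n, float *result) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicMaxFloat(result, v[i]);
}

// Host helper: returns max(init, all elements of h_v).
float gpu_max(const std::vector<float> &h_v, float init) {
    int n = (int)h_v.size();
    float *d_v, *d_result, h_result = init;
    cudaMalloc(&d_v, n * sizeof(float));
    cudaMalloc(&d_result, sizeof(float));
    cudaMemcpy(d_v, h_v.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_result, &h_result, sizeof(float), cudaMemcpyHostToDevice);
    max_kernel<<<(n + 255) / 256, 256>>>(d_v, n, d_result);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    cudaMemcpy(&h_result, d_result, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_v);
    cudaFree(d_result);
    return h_result;
}
```

The early `break` skips the CAS entirely when the stored value is already at least `val`, which keeps contention low once a large value has been published.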
Lock-Free Parallel Data Structures
(Figure: Parallel Linked Lists, a list Xi, Xi+1, …, Xi+5)
(Figure: a new node, Xi+?, is to be inserted into the list Xi … Xi+3)
(Figure: inserting Xi+3 between Xi+2 and Xi+4)
1. Read old link
2. Connect old link
3. Link in new data
Together, steps 1 to 3 form a read, modify, write operation

    // Insert node "mine" after node "prev".
    // (atomicCAS on pointers is shorthand; CUDA needs a cast to unsigned long long int.)
    __device__ void insert(ListNode *mine, ListNode *prev) {
        ListNode *old, *link = prev->next;
        do {
            old = link;                                  // 1. read old link
            mine->next = old;                            // 2. connect old link
            link = atomicCAS(&prev->next, link, mine);   // 3. link in new data
        } while (link != old);                           // retry if prev->next changed
    }

The three steps follow the lock-free flow: generate a new value based on the current data, compare & swap current -> new, and repeat if the swap fails.
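A self-contained version of insert() might look like the following sketch, which assumes a pool of preallocated nodes and a sentinel head node, and pushes every thread's node directly after that head (the worst case for contention). Because atomicCAS() has no pointer overload, the pointers are reinterpreted as 64-bit integers. `ListNode`, `push_all` and `list_length` are illustrative names; the __threadfence() between connecting and linking in is an addition not shown on the slide, so other threads see `mine->next` before they can reach `mine`.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

struct ListNode {
    int value;
    ListNode *next;
};

typedef unsigned long long ull;

// The slide's insert(): read the old link, connect it, CAS the new node in.
__device__ void insert(ListNode *mine, ListNode *prev) {
    ListNode *old, *link = *(ListNode *volatile *)&prev->next;
    do {
        old = link;                      // 1. read old link
        mine->next = old;                // 2. connect old link
        __threadfence();                 // make mine->next visible first
        link = (ListNode *)atomicCAS((ull *)&prev->next, (ull)old, (ull)mine);
    } while (link != old);               // 3. linked in only if prev->next was unchanged
}

// Every thread inserts its own node directly after the sentinel head.
__global__ void push_all(ListNode *pool, ListNode *head, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) insert(&pool[i], head);
}

// Host helper: pool[0..n) are data nodes, pool[n] is the head. Returns the
// number of nodes reachable from the head and stores their value sum in *sum.
int list_length(int n, long long *sum) {
    std::vector<ListNode> h(n + 1);
    for (int i = 0; i <= n; ++i) h[i] = ListNode{i, nullptr};
    ListNode *d;
    cudaMalloc(&d, (n + 1) * sizeof(ListNode));
    cudaMemcpy(d, h.data(), (n + 1) * sizeof(ListNode), cudaMemcpyHostToDevice);
    push_all<<<(n + 255) / 256, 256>>>(d, d + n, n);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    cudaMemcpy(h.data(), d, (n + 1) * sizeof(ListNode), cudaMemcpyDeviceToHost);
    int len = 0;
    *sum = 0;
    for (ListNode *p = h[n].next; p != nullptr && len <= n; ++len) {
        size_t idx = ((ull)p - (ull)d) / sizeof(ListNode);   // device address -> index
        *sum += h[idx].value;
        p = h[idx].next;
    }
    cudaFree(d);
    return len;
}
```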
Worked Example: Skiplists & Sorting (LN)
Skiplists – hierarchical linked lists, ordered
— O(log n) lookup, insertion, deletion
— Self-balancing with high probability
— Concurrent operations well-defined, relies on atomic-CAS
Sorting strategy
— Use p threads to concurrently insert n items into a single skiplist
Skiplist insertion – bottom level
Set next on the new node using an ordinary store (ST)
Swing prev from the existing node to the new node with CAS
— As long as it still points to the same node…
The skiplist stays legal at all times
Nobody can see the upper pointers yet
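The bottom-level step (find, point with an ordinary store, swing with CAS) can be sketched as a sorted, insert-only linked list; inserting every element concurrently then leaves the list sorted. This is not the talk's skiplist: there are no upper levels, so each search is linear, and nodes are linked by 32-bit indices (-1 for the end, node 0 as a sentinel) so that the plain int atomicCAS() applies. All names are illustrative. Because nodes are never removed, a successful CAS proves `prev` still pointed to `curr`, which is the condition the slide relies on.

```cuda
#include <cassert>
#include <climits>
#include <vector>
#include <cuda_runtime.h>

struct Node {
    int key;
    int next;   // index of the next node, or -1 at the end of the list
};

// Volatile read so that every retry sees the current link.
__device__ int load_next(Node *nodes, int i) {
    return *(volatile int *)&nodes[i].next;
}

// Bottom-level insertion: find the gap, point the new node at its
// successor with an ordinary store, then swing prev's link with CAS.
__device__ void sorted_insert(Node *nodes, int mine) {
    int key = nodes[mine].key;
    while (true) {
        int prev = 0;                                   // sentinel head
        int curr = load_next(nodes, prev);
        while (curr != -1 && nodes[curr].key < key) {   // find
            prev = curr;
            curr = load_next(nodes, curr);
        }
        nodes[mine].next = curr;                        // point (ST)
        __threadfence();                                // visible before publishing
        if (atomicCAS(&nodes[prev].next, curr, mine) == curr)
            return;                                     // swing (CAS) succeeded
        // prev's link changed under us: search again from the head.
    }
}

__global__ void insert_all(Node *nodes, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x + 1;  // node 0 is the sentinel
    if (i <= n) sorted_insert(nodes, i);
}

// Host helper: inserts all keys concurrently, returns them in list order.
std::vector<int> gpu_list_sort(const std::vector<int> &keys) {
    int n = (int)keys.size();
    std::vector<Node> h(n + 1);
    h[0] = Node{INT_MIN, -1};                           // sentinel
    for (int i = 0; i < n; ++i) h[i + 1] = Node{keys[i], -1};
    Node *d;
    cudaMalloc(&d, (n + 1) * sizeof(Node));
    cudaMemcpy(d, h.data(), (n + 1) * sizeof(Node), cudaMemcpyHostToDevice);
    insert_all<<<(n + 255) / 256, 256>>>(d, n);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    cudaMemcpy(h.data(), d, (n + 1) * sizeof(Node), cudaMemcpyDeviceToHost);
    cudaFree(d);
    std::vector<int> out;
    for (int i = h[0].next; i != -1 && (int)out.size() <= n; i = h[i].next)
        out.push_back(h[i].key);
    return out;
}
```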
Skiplist insertion – upper levels
Move up one level; repeat (find, point, swing)
Lots could have changed
— But as long as the pointers are the same when you try to point to the new node (with CAS), then all is well
Skiplist Sorting Observations
Collisions are high at first
— but the skiplist doubles in length every iteration
Collisions diminish rapidly as N >> p
Performance is dominated by loads, not atomics
— O(n log n) loads
— O(n) atomics
Insertion sort = O(n²) ops
(Chart: Sorting Time, time to sort in seconds (axis 0 to 4.5) versus N, the number of elements to sort, for GTX580, GTX680 and K20c.)
Conclusions
Atomics allow the creation of much more sophisticated algorithms that have higher performance
The GPU has parallel hardware to execute atomics
atomicCAS can be used to mimic any coordination primitive
Atomics force serialization
— don’t ask for serialization when you don’t need it
— or, perform concurrent reductions when possible
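One way to "perform concurrent reductions when possible" is to sum within each warp using register shuffles, so only one lane per warp issues the same-address atomic. A sketch, assuming CUDA 9 or later (for __shfl_down_sync()), 32-lane warps, and a block size that is a multiple of 32; names are illustrative.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Reduce within each warp through register shuffles, then issue a single
// atomicAdd per warp instead of one per thread.
__global__ void warp_sum(const int *v, int n, int *total) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int x = (i < n) ? v[i] : 0;                 // all 32 lanes take part
    for (int offset = 16; offset > 0; offset /= 2)
        x += __shfl_down_sync(0xffffffff, x, offset);
    if (threadIdx.x % 32 == 0)                  // lane 0 holds the warp's total
        atomicAdd(total, x);
}

// Host helper: returns the sum of h_v.
int gpu_warp_sum(const std::vector<int> &h_v) {
    int n = (int)h_v.size();
    int *d_v, *d_total, h_total = 0;
    cudaMalloc(&d_v, n * sizeof(int));
    cudaMalloc(&d_total, sizeof(int));
    cudaMemcpy(d_v, h_v.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_total, &h_total, sizeof(int), cudaMemcpyHostToDevice);
    warp_sum<<<(n + 255) / 256, 256>>>(d_v, n, d_total);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    cudaMemcpy(&h_total, d_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_v);
    cudaFree(d_total);
    return h_total;
}
```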
Thank you!
Extra Slides
Safe Ways to Lock: none are pretty

Serialise per-warp

    __global__ void useLock() {
        int tid = threadIdx.x % warpSize;
        // Perform warp operation by one thread only
        if (tid == 0)
            lock();
        for (int i = 0; i < warpSize; i++) {
            if (tid == i)
                do_stuff();
        }
        if (tid == 0)
            unlock();
    }

Lock per-thread

    __global__ void useLock() {
        int done = 0;
        while (!done) {
            // Returns "true" for only one active thread in warp
            if (elect_one_thread()) {
                lock();
                do_stuff();
                unlock();
                done = 1;
            }
        }
    }

Both of these require knowledge of warp execution
Lock-Free Data Updates
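(Flow: generate a new value based on the current data, compare & swap current -> new, and repeat if the swap fails; otherwise done.)

    // Add "val" to "*data". Return old value.
    // atomicCAS() has no double overload, so the loop works on the 64-bit
    // bit pattern. (Compute capability 6.0+ provides a native double
    // atomicAdd; rename this function when targeting those GPUs.)
    __device__ double atomicAdd(double *data, double val) {
        unsigned long long *addr = (unsigned long long *)data;
        unsigned long long old, curr = *addr;
        do {
            // Generate new value from current data
            old = curr;
            double newval = __longlong_as_double(old) + val;
            // Attempt to swap old <-> new.
            curr = atomicCAS(addr, old, __double_as_longlong(newval));
            // Repeat if value has changed in the meantime.
        } while (curr != old);
        return __longlong_as_double(old);
    }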