THE CUDA C++ STANDARD LIBRARY

Bryce Adelstein Lelbach (@blelbach)
CUDA C++ Core Libraries Lead
ISO C++ Library Evolution Incubator Chair, ISO C++ Tooling Study Group Chair
ISO C++ == Core Language + Standard Library
C++ without a Standard Library is severely diminished.
CUDA C++ == Core Language + ???
CUDA C++ == Core Language + libcu++
Version 1 in CUDA 10.2!
libcu++ is the opt-in, heterogeneous, incremental CUDA C++ Standard Library.
Opt-in

| Include | Namespace | Language & portability | Conformance |
|---|---|---|---|
| #include <…> | std:: | ISO C++, __host__ only | Strictly conforming to ISO C++ |
| #include <cuda/std/…> | cuda::std:: | CUDA C++, __host__ __device__ | Strictly conforming to ISO C++ |
| #include <cuda/…> | cuda:: | CUDA C++, __host__ __device__ | Conforming extensions to ISO C++ |
Does not interfere with or replace your host standard library.
Opt-in
#include <atomic>
std::atomic<int> x;

#include <cuda/std/atomic>
cuda::std::atomic<int> x;

#include <cuda/atomic>
cuda::atomic<int, cuda::thread_scope_block> x;
Heterogeneous
Copyable/Movable objects can migrate between host & device.
Host & device can call all (member) functions.
Host & device can concurrently use synchronization primitives*.
*: Synchronization primitives must be in managed memory and declared with cuda::thread_scope_system.
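For example, a minimal sketch of host/device synchronization on one object (the allocation with cudaMallocManaged, the kernel name, and the launch are illustrative assumptions; it also assumes a system with concurrent managed access):

```cuda
#include <cuda/std/atomic>
#include <new> // Placement new.

__global__ void signal(cuda::std::atomic<int>* flag) {
  flag->store(1, cuda::std::memory_order_release); // Device side signals...
}

int main() {
  cuda::std::atomic<int>* flag;
  cudaMallocManaged(&flag, sizeof(*flag)); // Managed memory: visible to host & device.
  new (flag) cuda::std::atomic<int>(0);    // Construct the atomic in place.

  signal<<<1, 1>>>(flag);

  while (flag->load(cuda::std::memory_order_acquire) != 1)
    ; // ...host side spins until it observes the store.

  cudaFree(flag);
  return 0;
}
```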
Incremental

Not a complete standard library today; each release will add more.
[Figure: standard library facilities plotted by porting effort. One end: "Same Optimal Design for Host/Device (just add __device__)", e.g. std::min/std::max; other end: "Different Optimal Design for Host/Device" and "Large Effort to Implement for Device", e.g. std::iostream. In between: std::hash, std::array, std::function, std::tuple, and std::atomic, with annotations marking "Highest Impact" facilities and those "Already Available (in Thrust or CUB)".]
We are prioritizing facilities that:
Are neither trivial nor difficult to port.
Are not already available in CUDA.
Have a high impact.
Based on LLVM’s libc++
Forked from LLVM’s libc++.
License: Apache 2.0 with LLVM Exception.
NVIDIA is already contributing back to the community:
Freestanding atomic<T>: reviews.llvm.org/D56913
C++20 synchronization library: reviews.llvm.org/D68480
libcu++ Release Schedule
Version 1 (CUDA 10.2, now): <atomic> (Pascal+), <type_traits>.
Version 2 (1H 2020): atomic<T>::wait/notify, <barrier>, <latch>, <counting_semaphore> (all Volta+), <chrono>, <ratio>, <functional> minus function.
Future priorities: atomic_ref<T>, <complex>, <tuple>, <array>, <utility>, <cmath>, string processing, …
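As a taste of Version 1, a small sketch (the increment helper is illustrative, not from the slides):

```cuda
#include <cuda/std/atomic>
#include <cuda/std/type_traits>

// Works on host and device alike; <cuda/std/type_traits> supplies the trait.
template <typename T>
__host__ __device__ void increment(cuda::std::atomic<T>& counter) {
  static_assert(cuda::std::is_integral<T>::value, "integral counters only");
  counter.fetch_add(1, cuda::std::memory_order_relaxed);
}
```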
namespace cuda {
  enum thread_scope {
    thread_scope_system, // All threads.
    thread_scope_device,
    thread_scope_block
  };

  template <typename T, thread_scope S = thread_scope_system>
  struct atomic;

  namespace std {
    template <typename T>
    using atomic = cuda::atomic<T>;
  } // namespace std
} // namespace cuda
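A usage sketch of these scopes (illustrative declarations; assumes #include <cuda/atomic>):

```cuda
cuda::atomic<int, cuda::thread_scope_block>  b; // Shared within one thread block.
cuda::atomic<int, cuda::thread_scope_device> d; // Shared within one device.
cuda::atomic<int>                            s; // Defaults to thread_scope_system.
// cuda::std::atomic<int> names the same system-scoped type.
```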
__host__ __device__
void signal_flag(volatile int& flag) {
  // ^^^ volatile was "notionally right" for flag in legacy CUDA C++.
  __threadfence_system(); // <- Should be fused on the operation.
  // vvv We "cast away" the `volatile` qualifier.
  atomicExch((int*)&flag, 1); // <- Ideally want an atomic store.
}
__host__ __device__
void signal_flag(volatile int& flag) {
  // ^^^ volatile was "notionally right" for flag in legacy CUDA C++.
  __threadfence_system(); // <- Should be fused on the operation.
  // vvv We "cast away" the `volatile` qualifier.
  flag = 1; // <- "Works" for a store but is UB (volatile != atomic).
}
__host__ __device__
void signal_flag_better(atomic<bool>& flag) {
  flag = true;
}
__host__ __device__
void signal_flag_even_better(atomic<bool>& flag) {
  flag.store(true, memory_order_release);
}
__host__ __device__
void signal_flag_excellent(atomic<bool>& flag) {
  flag.store(true, memory_order_release);
  flag.notify_all(); // <- Will make sense later (Version 2).
}
__host__ __device__
int poll_flag_then_read(volatile int& flag, int& data) {
  // ^^^ volatile was "notionally right" for flag in legacy CUDA C++.
  // vvv We "cast away" the volatile qualifier.
  while (1 != atomicAdd((int*)&flag, 0)) // <- Should be atomic load.
    ; // <- Spinloop without backoff is bad under contention.
  __threadfence_system(); // <- 9 out of 10 of you forget this one!
  return data; // <- Even if volatile, you still need the fence.
}
__host__ __device__
int poll_flag_then_read(volatile int& flag, int& data) {
  // ^^^ volatile was "notionally right" for flag in legacy CUDA C++.
  while (1 != flag) // <- "Works" but is UB (volatile != atomic).
    ; // <- Spinloop without backoff is bad under contention.
  __threadfence_system(); // <- 9 out of 10 of you forget this one!
  return data; // <- Even if volatile, you still need the fence.
}
__host__ __device__
int poll_flag_then_read_better(atomic<bool>& flag, int& data) {
  while (!flag)
    ; // <- Spinloop without backoff is bad under contention.
  return data;
}
__host__ __device__
int poll_flag_then_read_even_better(atomic<bool>& flag, int& data) {
  while (!flag.load(memory_order_acquire))
    ; // <- Spinloop without backoff is bad under contention.
  return data;
}
__host__ __device__
int poll_flag_then_read_excellent(atomic<bool>& flag, int& data) {
  flag.wait(false, memory_order_acquire); // Version 2.
  // ^^^ Backoff to mitigate heavy contention.
  return data;
}
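Putting the two "excellent" halves together, a hypothetical sketch (assumes flag and data live in managed memory and that atomic means cuda::std::atomic, as in the surrounding slides):

```cuda
__global__ void produce(atomic<bool>* flag, int* data) {
  *data = 42;                   // Write the payload...
  signal_flag_excellent(*flag); // ...then release-store + notify.
}

int consume(atomic<bool>& flag, int& data) {
  return poll_flag_then_read_excellent(flag, data); // Acquire-wait, then read.
}
```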
// Mixing scopes can be a messy error; we prevent it at compile time.
__host__ __device__ void foo() {
  atomic<bool> s_flag;
  signal_flag(s_flag); // Ok; expects and got system atomic type.

  atomic<bool, thread_scope_device> d_flag;
  signal_flag(d_flag); // Compile error; expects system atomic type.
}
// Writing __host__ __device__ functions today is nearly impossible.
__host__ __device__ void bar(volatile int& a) {
#ifdef __CUDA_ARCH__
  atomicAdd((int*)&a, 1);
#else
  // What do I write here for all the CPUs & compilers I support?
#endif
}
__host__ __device__ void bar_better(atomic<int>& a) {
  a += 1;
}
__host__ __device__ void bar_even_better(atomic<int>& a) {
  a.fetch_add(1, memory_order_relaxed);
}
Stop Using Legacy Atomics (atomic[A-Z]*):
Sequential consistency & acquire/release are not first-class.
Device-only.
Memory scope is a property of operations, not objects.
Atomicity is a property of operations, not objects.
Stop Using volatile for synchronization:
volatile != atomic.
volatile is a vague pact; atomic<T> has clear semantics.
Volta+ NVIDIA GPUs deliver and libcu++ exposes:
C++ Parallel Forward Progress Guarantees.
The C++ Memory Model.
Why does this matter?
|  | No limitations on thread delays | Threads delayed infinitely often | Thread delays limited |
|---|---|---|---|
| Every thread makes progress | Wait-free | Obstruction-free | Starvation-free |
| Some thread makes progress | Lock-free | Clash-free | Deadlock-free |

The first two columns are the non-blocking guarantees; the last column is blocking.

Source: http://www.cs.tau.ac.il/~shanir/progress.pdf
Weakly Parallel Forward Progress: pre-Volta NVIDIA GPUs and other GPUs.
Parallel Forward Progress: only Volta+.
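To make the distinction concrete, here is an illustrative blocking spinlock (a sketch, not from the talk): every spinning thread depends on the scheduler eventually running the lock owner, which is exactly the parallel forward progress guarantee that only Volta+ provides; on earlier GPUs, divergent threads in one warp could starve each other here.

```cuda
#include <cuda/std/atomic>

struct spinlock {
  cuda::std::atomic<int> locked_{0};

  __host__ __device__ void lock() {
    while (locked_.exchange(1, cuda::std::memory_order_acquire))
      ; // Blocked threads spin until the owner is scheduled and releases.
  }

  __host__ __device__ void unlock() {
    locked_.store(0, cuda::std::memory_order_release);
  }
};
```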
Why does this matter?
Volta+ NVIDIA GPUs and libcu++ enable a wide range of concurrent algorithms & data structures previously unavailable on GPUs.
More concurrent algorithms & data structures means more code can run on GPUs!
template <typename Key, typename Value,
          typename Hash = hash<Key>,
          typename Equal = equal_to<Key>>
struct concurrent_insert_only_map {
  enum state_type { state_empty, state_reserved, state_filled };

  // ...

  __host__ __device__ Value* try_insert(Key const& key, Value const& value);

private:
  uint64_t capacity_;
  Key* keys_;
  Value* values_;
  atomic<state_type>* states_;
  Hash hash_;
  Equal equal_;
};
struct concurrent_insert_only_map {
  __host__ __device__ Value* try_insert(Key const& key, Value const& value) {
    auto index(hash_(key) % capacity_);
    // ...
  }
};
struct concurrent_insert_only_map {
  __host__ __device__ Value* try_insert(Key const& key, Value const& value) {
    auto index(hash_(key) % capacity_);
    for (uint64_t i = 0; i < capacity_; ++i) { // Linearly probe up to `capacity_` times.
      // ...
    }
    return nullptr; // If we are here, the container is full.
  }
};
struct concurrent_insert_only_map {
  __host__ __device__ Value* try_insert(Key const& key, Value const& value) {
    auto index(hash_(key) % capacity_);
    for (uint64_t i = 0; i < capacity_; ++i) { // Linearly probe up to `capacity_` times.
      state_type old = states_[index].load(memory_order_acquire);
      while (old == state_empty) { // As long as the slot is empty, try to lock it.
        // ...
      }
      // ...
    }
    return nullptr; // If we are here, the container is full.
  }
};
struct concurrent_insert_only_map {
  __host__ __device__ Value* try_insert(Key const& key, Value const& value) {
    auto index(hash_(key) % capacity_);
    for (uint64_t i = 0; i < capacity_; ++i) { // Linearly probe up to `capacity_` times.
      state_type old = states_[index].load(memory_order_acquire);
      while (old == state_empty) { // As long as the slot is empty, try to lock it.
        if (states_[index].compare_exchange_weak(old, state_reserved, memory_order_acq_rel)) {
          // We locked it by setting the state to `state_reserved`; now insert the key & value.
          // ...
        }
      }
      // ...
    }
    return nullptr; // If we are here, the container is full.
  }
};
struct concurrent_insert_only_map {
  __host__ __device__ Value* try_insert(Key const& key, Value const& value) {
    auto index(hash_(key) % capacity_);
    for (uint64_t i = 0; i < capacity_; ++i) { // Linearly probe up to `capacity_` times.
      state_type old = states_[index].load(memory_order_acquire);
      while (old == state_empty) { // As long as the slot is empty, try to lock it.
        if (states_[index].compare_exchange_weak(old, state_reserved, memory_order_acq_rel)) {
          // We locked it by setting the state to `state_reserved`; now insert the key & value.
          new (keys_ + index) Key(key);
          new (values_ + index) Value(value);
          states_[index].store(state_filled, memory_order_release); // Unlock the slot.
          return values_ + index;
        }
      }
      // ...
    }
    return nullptr; // If we are here, the container is full.
  }
};
struct concurrent_insert_only_map {
  __host__ __device__ Value* try_insert(Key const& key, Value const& value) {
    auto index(hash_(key) % capacity_);
    for (uint64_t i = 0; i < capacity_; ++i) { // Linearly probe up to `capacity_` times.
      state_type old = states_[index].load(memory_order_acquire);
      while (old == state_empty) { // As long as the slot is empty, try to lock it.
        if (states_[index].compare_exchange_weak(old, state_reserved, memory_order_acq_rel)) {
          // We locked it by setting the state to `state_reserved`; now insert the key & value.
          new (keys_ + index) Key(key);
          new (values_ + index) Value(value);
          states_[index].store(state_filled, memory_order_release); // Unlock the slot.
          states_[index].notify_all(); // Wake up anyone who was waiting for us to fill the slot.
          return values_ + index;
        }
      }
      // ...
    }
    return nullptr; // If we are here, the container is full.
  }
};
struct concurrent_insert_only_map {
  __host__ __device__ Value* try_insert(Key const& key, Value const& value) {
    auto index(hash_(key) % capacity_);
    for (uint64_t i = 0; i < capacity_; ++i) { // Linearly probe up to `capacity_` times.
      state_type old = states_[index].load(memory_order_acquire);
      while (old == state_empty) { // As long as the slot is empty, try to lock it.
        if (states_[index].compare_exchange_weak(old, state_reserved, memory_order_acq_rel)) {
          // We locked it by setting the state to `state_reserved`; now insert the key & value.
          new (keys_ + index) Key(key);
          new (values_ + index) Value(value);
          states_[index].store(state_filled, memory_order_release); // Unlock the slot.
          states_[index].notify_all(); // Wake up anyone who was waiting for us to fill the slot.
          return values_ + index;
        }
      } // If we didn’t fill the slot, wait for it to be filled and check if it matches.
      while (state_filled != states_[index].load(memory_order_acquire))
        ;
      // ...
    }
    return nullptr; // If we are here, the container is full.
  }
};
struct concurrent_insert_only_map {
  __host__ __device__ Value* try_insert(Key const& key, Value const& value) {
    auto index(hash_(key) % capacity_);
    for (uint64_t i = 0; i < capacity_; ++i) { // Linearly probe up to `capacity_` times.
      state_type old = states_[index].load(memory_order_acquire);
      while (old == state_empty) { // As long as the slot is empty, try to lock it.
        if (states_[index].compare_exchange_weak(old, state_reserved, memory_order_acq_rel)) {
          // We locked it by setting the state to `state_reserved`; now insert the key & value.
          new (keys_ + index) Key(key);
          new (values_ + index) Value(value);
          states_[index].store(state_filled, memory_order_release); // Unlock the slot.
          states_[index].notify_all(); // Wake up anyone who was waiting for us to fill the slot.
          return values_ + index;
        }
      } // If we didn’t fill the slot, wait for it to be filled and check if it matches.
      states_[index].wait(state_reserved, memory_order_acquire);
      // ...
    }
    return nullptr; // If we are here, the container is full.
  }
};
struct concurrent_insert_only_map {
  __host__ __device__ Value* try_insert(Key const& key, Value const& value) {
    auto index(hash_(key) % capacity_);
    for (uint64_t i = 0; i < capacity_; ++i) { // Linearly probe up to `capacity_` times.
      state_type old = states_[index].load(memory_order_acquire);
      while (old == state_empty) { // As long as the slot is empty, try to lock it.
        if (states_[index].compare_exchange_weak(old, state_reserved, memory_order_acq_rel)) {
          // We locked it by setting the state to `state_reserved`; now insert the key & value.
          new (keys_ + index) Key(key);
          new (values_ + index) Value(value);
          states_[index].store(state_filled, memory_order_release); // Unlock the slot.
          states_[index].notify_all(); // Wake up anyone who was waiting for us to fill the slot.
          return values_ + index;
        }
      } // If we didn’t fill the slot, wait for it to be filled and check if it matches.
      states_[index].wait(state_reserved, memory_order_acquire);
      if (equal_(keys_[index], key)) return values_ + index; // Someone else inserted.
      // ...
    }
    return nullptr; // If we are here, the container is full.
  }
};
struct concurrent_insert_only_map {
  __host__ __device__ Value* try_insert(Key const& key, Value const& value) {
    auto index(hash_(key) % capacity_);
    for (uint64_t i = 0; i < capacity_; ++i) { // Linearly probe up to `capacity_` times.
      state_type old = states_[index].load(memory_order_acquire);
      while (old == state_empty) { // As long as the slot is empty, try to lock it.
        if (states_[index].compare_exchange_weak(old, state_reserved, memory_order_acq_rel)) {
          // We locked it by setting the state to `state_reserved`; now insert the key & value.
          new (keys_ + index) Key(key);
          new (values_ + index) Value(value);
          states_[index].store(state_filled, memory_order_release); // Unlock the slot.
          states_[index].notify_all(); // Wake up anyone who was waiting for us to fill the slot.
          return values_ + index;
        }
      } // If we didn’t fill the slot, wait for it to be filled and check if it matches.
      states_[index].wait(state_reserved, memory_order_acquire);
      if (equal_(keys_[index], key)) return values_ + index; // Someone else inserted.
      index = (index + 1) % capacity_; // Collision: keys didn’t match. Try the next slot.
    }
    return nullptr; // If we are here, the container is full.
  }
};
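A hypothetical usage sketch (the kernel, the int/int instantiation, and the launch parameters are assumptions, not from the slides):

```cuda
// Each device thread inserts its key; threads that race on the same key all
// get back a pointer to the single inserted value.
__global__ void insert_all(concurrent_insert_only_map<int, int>* map,
                           int const* keys, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    int* v = map->try_insert(keys[i], /*value=*/1);
    if (v == nullptr) {
      // The container was full: every slot was probed without success.
    }
  }
}
```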
libcu++: The CUDA C++ Standard Library

Opt-in, heterogeneous, incremental C++ standard library for CUDA.
Open source; port of LLVM’s libc++; contributing upstream.
Version 1 (CUDA 10.2): <atomic> (Pascal+), <type_traits>.
Version 2 (1H 2020): atomic<T>::wait/notify, <barrier>, <latch>, <counting_semaphore> (all Volta+), <chrono>, <ratio>, <functional> minus function.
Future priorities: atomic_ref<T>, <complex>, <tuple>, <array>, <utility>, <cmath>, string processing, …
Copyright (C) 2019 Bryce Adelstein Lelbach
@blelbach