c++ memory model · data race memory location [intro.memory(1.7)/3] an object of scalar type or a...
TRANSCRIPT
![Page 1: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/1.jpg)
C++ Memory ModelValentin Ziegler
Fabio Fracassi
Meeting C++ Berlin, December 6th, 2014
![Page 2: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/2.jpg)
The machine does not execute the code you wrote…
2
![Page 3: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/3.jpg)
How your code is executed
3
Code Compiler CPU Cachevoid foo(int n, int m)
{
int x=1;
for (
int i=0;
i<n;
++i
){
if (m<0) {
x -= x*m;
}
}
return x;
}
mov esi,1
test ecx,ecx
jle 011F1301
lea ebx,[ebx]
mov eax,1
sub eax,edx
imul esi,eax
dec ecx
jne 011F12F0
...
• Register allocation• Loop unswitching
• Branch prediction• Out of order exec.
• Prefetching• Buffering
000001002003
Memory
010011012013
00C00D00E00F
00800900A00B
004005006007
Optimization Memory
000001222003
008999AAA00B
00C00D00E00F
004005006007
![Page 4: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/4.jpg)
How your code is executed
Single thread execution model (C++03):
• Program will behave as-if it was yours:Result is the same as if operations were executed in the order specified by the program
• We can not observe optimizations performed by the system
4
![Page 5: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/5.jpg)
Two threads of execution?
• Optimizations become observable
• Optimizations may break “naive” concurrent algorithms
5
f1 = true;
if (!f2) {
// critical section
}
…
Thread #1
f2 = true;
if (!f1) {
// critical section
}
…
Thread #2
bool f1 = false; bool f2 = false;
![Page 6: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/6.jpg)
Memory Model
• Describes the interactions of threads through memory and their shared use of data.
• Tells us if our program has well defined behavior.
• Constrains code generation for compiler
6
![Page 7: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/7.jpg)
The C++ Memory model
7
C++ Memory Model BasicsData Races, Sequential Consistency, Synchronization
Meddling with Memory OrderRelaxed Atomic Operations, and subtle Consequences
![Page 8: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/8.jpg)
Data Race
memory location [intro.memory(1.7)/3]an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields
conflicting action [intro.multithread(1.10)/4]two (or more) actions that access the same memory location and at least one of them is a write
data race [intro.multithread(1.10)/21]two conflicting actions in different threads and neither happens before the other.
8
![Page 9: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/9.jpg)
Data Race
memory location [intro.memory(1.7)/3]an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields
conflicting action [intro.multithread(1.10)/4]two (or more) actions that access the same memory location and at least one of them is a write
data race [intro.multithread(1.10)/21]two conflicting actions in different threads and neither happens before the other.
9
int i;
char c;
int a:5,
b:7;
X* p;
![Page 10: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/10.jpg)
Data Race
memory location [intro.memory(1.7)/3]an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields
conflicting action [intro.multithread(1.10)/4]two (or more) actions that access the same memory location and at least one of them is a write
data race [intro.multithread(1.10)/21]two conflicting actions in different threads and neither happens before the other.
10
int i;
char c;
int a:5,
b:7;
X* p;
a = 23;
X* x = p;
int n = i
Thread #1
c = ‘@’;
X* x = p;
i = 42;
Thread #2
![Page 11: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/11.jpg)
Data Race
memory location [intro.memory(1.7)/3]an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields
conflicting action [intro.multithread(1.10)/4]two (or more) actions that access the same memory location and at least one of them is a write
data race [intro.multithread(1.10)/21]two conflicting actions in different threads and neither happens before the other.
11
data race == undefined behavior !
![Page 12: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/12.jpg)
Sequential Consistency
sequential consistency [Leslie Lamport, 1979]
the result of any execution is the same as-if
1. the operations of all threads are executed in some sequential order
2. the operations of each thread appear in this sequence in the order specified by their program
12
![Page 13: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/13.jpg)
Sequential Consistency
sequential consistency [Leslie Lamport, 1979]the result of any execution is the same as-if the operations of all threads are executed in some sequential order, and the operations of each thread appear in this sequence in the order specified by their program
13
13
A
BC
D
A
B
C
D
A
B
C
D
A
B
C
D
A
B
C
D
A
B
C
D
A
B
C
D
A
B C
D
![Page 14: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/14.jpg)
The C++ memory model
Here is the deal:
• We take care our program does not contain data races
• The system guarantees sequentially consistent execution
14
sequential consistency for data-race-free programsSC-DRF
![Page 15: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/15.jpg)
synchronize (the easy way)…
15
![Page 16: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/16.jpg)
std::mutex mtx;
{
mtx.lock();
// access shared data here
mtx.unlock();
}
Locks
Mutually exclusive execution of critical code blocks
Mutex provides inter-thread synchronization:
unlock() synchronizes with calls to
lock() on the same mutex object.
16
![Page 17: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/17.jpg)
std::mutex mtx;
{
mtx.lock();
// access shared data here
mtx.unlock();
}
Locks
Mutually exclusive execution of critical code blocks
Mutex provides inter-thread synchronization:
unlock() synchronizes with calls to
lock() on the same mutex object.
17
![Page 18: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/18.jpg)
Locks
Mutually exclusive execution of critical code blocks
Mutex provides inter-thread synchronization:
unlock() synchronizes with calls to
lock() on the same mutex object.
18
std::mutex mtx;
{
std::lock_guard<std::mutex> lg(mtx);
// access shared data here
// lg destructor releases mtx
}
![Page 19: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/19.jpg)
{
mtx.lock();
PrepareData();
bDataReady=true;
mtx.unlock();
}
Thread #1
{
mtx.lock();
if (bDataReady) {
ConsumeData();
}
mtx.unlock();
}
Thread #2
Synchronize using Locks
std::mutex mtx; bool bDataReady=false;
“Simplistic view” on locking:Critical code cannot run in both threads “simultaneously”
19
![Page 20: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/20.jpg)
• The C++ standard identifies certain operations to be synchronizing operations.
• If A()synchronizes with B(),then X() happens before Y().
X();
A();
Thread #1
B();
Y();
Thread #2
What about synchronization?
20
![Page 21: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/21.jpg)
std::mutex mtx;
{
mtx.lock();
// access shared data here
mtx.unlock();
}
Locks
Mutual exclusive execution of critical code blocks
Mutex provides inter-thread synchronization:
unlock() synchronizes with calls to
lock() on the same mutex object.
21
![Page 22: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/22.jpg)
{
mtx.lock();
PrepareData();
bDataReady=true;
mtx.unlock();
}
Thread #1
{
mtx.lock();
if (bDataReady) {
ConsumeData();
}
mtx.unlock();
}
mtx.unlock()synchronizes with mtx.lock()PrepareData() happens before ConsumeData()
Thread #2
Synchronize using Locks
std::mutex mtx; bool bDataReady=false;
22
![Page 23: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/23.jpg)
PrepareData(); // once
{
mtx.lock();
bDataReady=true;
mtx.unlock();
}
Thread #1
bool b;
{
mtx.lock();
b=bDataReady;
mtx.unlock();
}
if (b) ConsumeData();
Proper synchronization,if PrepareData()is never executed again.
Thread #2
Synchronize using Locks
std::mutex mtx; bool bDataReady=false;
23
![Page 24: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/24.jpg)
?
PrepareData(); // once
{
mtx.lock();
bDataReady=true;
mtx.unlock();
}
Thread #1
if (!mtx.try_lock()){
// l33t optimization:
// thread 1 should
// be done with
// PrepareData :-)
ConsumeData();
}
Thread #2
Clever ?
std::mutex mtx; bool bDataReady=false;
No synchronization.
Data Race!
24
![Page 25: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/25.jpg)
std::atomic<>
• “Data race free” variable, e.g., std::atomic<int>
• (by default) provides inter-thread synchronization:
a store synchronizes with operations that load the stored value.
• (by default) sequential consistency
• Needs hardware support(not all platforms provide lock-free atomics)
25
![Page 26: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/26.jpg)
std::atomic<>
• “Data race free” variable, e.g., std::atomic<int>
• (by default) provides inter-thread synchronization:
a store synchronizes with operations that load the stored value.
• (by default) sequential consistency
• Needs hardware support(not all platforms provide lock-free atomics)
26
In C++, this is spelled std::atomic, not volatile !
![Page 27: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/27.jpg)
PrepareData(); // once
bDataReady.store(true);
Thread #1
if(bDataReady.load()){
ConsumeData()
}
Thread #2
Synchronize using atomics
std::mutex mtx; std::atomic<bool> bDataReady(false);
Proper synchronization,if PrepareData()is never executed again.
27
![Page 28: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/28.jpg)
Excursion: lock-free programming
28
template<typename T> class lock_free_list {
struct node{
T data; node* next;
};
std::atomic<node*> head;
public:
void push(T const& data) {
node* const newNode = new node(data);
newNode->next = head.load();
while(!head.compare_exchange_weak(newNode->next, newNode))
;
}
};
DesiredExpectedTarget
![Page 29: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/29.jpg)
Are we there yet?
Avoid data races and you will be fine• Synchronize correctly
• Implement lock-free data structures as described in your favorite computer science text book
The C++ memory model guarantees sequential consistency
• as does the memory model of Java and C#
29
![Page 30: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/30.jpg)
The C++ Memory model
30
C++ Memory Model BasicsData Races, Sequential Consistency, Synchronization
Meddling with Memory OrderRelaxed Atomic Operations, and subtle Consequences
![Page 31: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/31.jpg)
std::atomic<>
• “Data race free” variable, e.g., std::atomic<int>
• (by default) provides inter-thread synchronization:
a store synchronizes with operations that load the stored value.
• (by default) sequential consistency
• Needs hardware support(not all platforms provide lock-free atomics)
31
![Page 32: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/32.jpg)
Memory Order
Why only have one memory model when we can have a mix of 3 (and a half)?
32
sequentially consistent (SC)
acquire-release
relaxed
consume-release
seq_cst
acquire
release
acq_rel
relaxed
consume
release
acq_rel
memory_order_*memory model
*
![Page 33: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/33.jpg)
sequentially consistent (SC)
acquire-release
relaxed
consume-release
seq_cst
acquire
release
acq_rel
relaxed
consume
release
acq_rel
*
Relaxed memory ordering
33
![Page 34: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/34.jpg)
What is relaxed memory order
• Each memory location has a total modification order (however, this order cannot be observed directly)
• Memory operations performed by the same threadon the same memory location are not reordered with respect to the modification order.
34
std::atomic<int> x;
x.store(42, memory_order_relaxed);
x.load(memory_order_relaxed);
x.compare_exchange_weak(
n, 42, memory_order_relaxed
);
![Page 35: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/35.jpg)
Relaxed load and store
0x.store(1);
x.store(2);
x.load();
x.load();
x.store(4);
x.store(3);
x.load();
Mo
dif
icat
ion
Ord
er
Thread #1 Thread #2x
1
35
![Page 36: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/36.jpg)
Relaxed load and store
0x.store(1);
x.store(2);
x.load();
x.load();
x.store(4);
x.store(3);
x.load();
Mo
dif
icat
ion
Ord
er
Thread #1 Thread #2x
1
36
![Page 37: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/37.jpg)
Relaxed load and store
0x.store(1);
x.store(2);
x.load();
x.load();
x.store(4);
x.store(3);
x.load();
Mo
dif
icat
ion
Ord
er
Thread #1 Thread #2x
1
2
37
![Page 38: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/38.jpg)
Relaxed load and store
0x.store(1);
x.store(2);
x.load();
x.load();
x.store(4);
x.store(3);
x.load();
Mo
dif
icat
ion
Ord
er
Thread #1 Thread #2x
1
2
3
4
38
![Page 39: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/39.jpg)
Relaxed load and store
0x.store(1);
x.store(2);
x.load();
x.load();
x.store(4);
x.store(3);
x.load();
Mo
dif
icat
ion
Ord
er
Thread #1 Thread #2x
1
2
3
4
39
![Page 40: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/40.jpg)
Relaxed load and store
0x.store(1);
x.store(2);
x.load();
x.load();
x.store(4);
x.store(3);
x.load();
Mo
dif
icat
ion
Ord
er
Thread #1 Thread #2x
1
2
3
4
40
![Page 41: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/41.jpg)
Relaxed load and store
0x.store(1);
x.store(2);
x.load();
x.load();
x.store(4);
x.store(3);
x.load();
Mo
dif
icat
ion
Ord
er
Thread #1 Thread #2x
1
2
3
4
41
![Page 42: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/42.jpg)
Safe ?
atomic<bool> f=false;
atomic<bool> g=false;
Thread #1:
f.store(true, memory_order_relaxed);
g.store(true, memory_order_relaxed);
Thread #2:
while(!g.load(memory_order_relaxed));
assert(f.load(memory_order_relaxed));
42
Relaxed load and store
![Page 43: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/43.jpg)
Safe ?
atomic<bool> f=false;
atomic<bool> g=false;
Thread #1:
f.store(true, memory_order_relaxed);
g.store(true, memory_order_relaxed);
Thread #2:
while(!g.load(memory_order_relaxed));
assert(f.load(memory_order_relaxed));
43
ff=true;
g=true;
while(
!g.load()
);
f.load();
Thread #1 Thread #2f
t
f
g
t
Relaxed load and store
![Page 44: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/44.jpg)
Relaxed read-modify-write
0int oldVal=x.load();
Sets oldVal to 0
x.compare_exchange_weak(
oldVal, // expected
oldVal+1, // desired
memory_order_relaxed);
Returns falseSets oldVal to 1
x.compare_exchange_weak(
oldVal, // expected
oldVal+1, // desired
memory_order_relaxed);
Returns true
Mo
dif
icat
ion
Ord
er
x
1
lock-free ++x
2
44
![Page 45: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/45.jpg)
Safe? Yes. Progress ? atomic<int> c = 0
Worker thread #1,#2, …:
for (int i=0; i<100; ++i) {
…
c.fetch_add(1, memory_order_relaxed);
}
Main thread:
start_n_threads();
join_n_threads();
assert(100*n == c);
45
Built-in forint oldVal=c.load();
while(!c.compare_exchange_weak(
oldVal, // expected
oldVal+1 // desired
memory_order_relaxed
);
![Page 46: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/46.jpg)
Safe? Yes. Progress ? Yes.atomic<int> c = 0
Worker thread #1,#2, …:
for (int i=0; i<100; ++i) {
…
c.fetch_add(1, memory_order_relaxed);
}
Observing thread (“progress bar”):
for (int old_c = 0;;) {
int c_now = c.load(memory_order_relaxed);
assert(old_c <= c_now);
old_c = c_now;
}
Main thread:
start_n_threads();
join_n_threads();
assert(100*n == c);
46
![Page 47: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/47.jpg)
sequentially consistent (SC)
acquire-release
relaxed
consume-release
seq_cst
acquire
release
acq_rel
relaxed
consume
release
acq_rel
*
The acquire/release model
47
![Page 48: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/48.jpg)
What does acquire/release mean
• a store-release operation synchronizes with all load-acquire operations reading the stored value.
• All Operations in the releasing thread preceding the store-release happen-before all operations following the load-acquire in the acquiring thread.
48
x.store(42, memory_order_release);
x.load(memory_order_acquire);
![Page 49: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/49.jpg)
Store-relase and load-acquire
0n=23;
x.store(1);
while(
!x.load()
);
int y=n;
assert(y==23);
Thread #1 Thread #2x
1
int n;
(non-atomic )
49
![Page 50: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/50.jpg)
“Synchronizes with” relation:
• Refers to operations at runtime.
• NOT about statements in the source code!
A note of caution!
0n=23;
x.store(1);
x.load();
int y=n;
Thread #1 Thread #2x
1
int n;
(non-atomic )
No synchronization.
Data Race!
50
![Page 51: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/51.jpg)
Is the answer 42 ? Yes.
atomic<bool> f=false;
atomic<bool> g=false;
int n;
Thread #1:
n = 42;
f.store(true, memory_order_release);
Thread #2:
while(!f.load(memory_order_acquire));
g.store(true, memory_order_release);
Thread #3:
while(!g.load(memory_order_acquire));
assert(42 == n);
51
![Page 52: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/52.jpg)
sequentially consistent (SC)
acquire-release
relaxed
consume-release
seq_cst
acquire
release
acq_rel
relaxed
consume
release
acq_rel
*
The consume/release model
52
![Page 53: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/53.jpg)
What does consume mean
• “Light version” of acquire/release
• All Operations in the releasing thread preceding the store-release happen-before an operation X in the consuming thread if X depends on the value loaded.
53
x.store(42, memory_order_release);
x.load(memory_order_consume);
![Page 54: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/54.jpg)
Who has the answer? x->i
struct X { int i; }
int n;
std::atomic<X*> px;
Thread #1:
n = 42;
auto x = new X;
x->i = 42;
px.store(x, memory_order_release);
Thread #2:
X* x;
while(!x=px.load(memory_order_consume));
assert(42 == x->i);
assert(42 == n);
54
Typical examples for “X depends on the value loaded”
• X dereferences a pointer that has been loaded
• X is accessing array at index which has been loaded
Data Race !
![Page 55: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/55.jpg)
Acquire/release provides very strong guarantees.
Do we still need more?
Who asked for sequential consistency ?
55
![Page 56: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/56.jpg)
Dekker’s algorithm revisited
atomic<bool> f1=false;
atomic<bool> f2=false;
Thread #1:
f1.store(true, memory_order_release);
if (!f2.load(memory_order_acquire)) {
// critical section
}
Thread #2:
f2.store(true, memory_order_release);
if (!f1.load(memory_order_acquire)) {
// critical section
}
56
?
![Page 57: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/57.jpg)
57
ff1=true;
if (!f2.load()) {
// critical
// section
}
Thread #1 Thread #2f1
t
f
f2
t
f2=true;
if (!f1.load()) {
// critical
// section
}
Dekker’s algorithm revisited
Oh noes!
![Page 58: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/58.jpg)
sequentially consistent (SC)
acquire-release
relaxed
consume-release
seq_cst
acquire
release
acq_rel
relaxed
consume
release
acq_rel
*
Back to sanity sequential consistency
58
![Page 59: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/59.jpg)
Dekker’s algorithm done right.
atomic<bool> f1=false;
atomic<bool> f2=false;
Thread #1:
f1.store(true, memory_order_seq_cst);
if (!f2.load(memory_order_seq_cst)) {
// critical section
}
Thread #2:
f2.store(true, memory_order_seq_cst);
if (!f1.load(memory_order_seq_cst)) {
// critical section
}
59
![Page 60: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/60.jpg)
• Global, total order of load and store operations
• At any given time, each memory location has only one value*
* assuming there are no data races
60
Dekker’s algorithm done right.
ff1=true;
if (!f2.load()) {
}
Thread #1 Thread #2f1
t
f
f2
tf2=true;
if (!f1.load()) {
}
![Page 61: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/61.jpg)
Use-cases for non-SC atomics
• target platform is ARM (<v8) or PowerPC
• operation counters
• some reference counters• but then you may use std::shared_ptr
• lazy initialization• but for this C++ also brings std::call_once
PROFILE FIRST before meddling with memory_order!
61
![Page 62: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/62.jpg)
Wrap up
• Do not write Data Races!
• The C++ Memory Model gives reasonable guarantees to implement correct, yet performant algorithms.
• It allows us to deviate from sequential consistency if we need to.
62
![Page 63: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/63.jpg)
think-cell
Chausseestraße 8/E
10115 Berlin
Germany
Tel +49-30-666473-10
Fax +49-30-666473-19
www.think-cell.com
![Page 64: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/64.jpg)
Bibliography
• C++ Concurrency in Action – Anthony Williams – 2012
• Atomic Weapons – Herb Sutter – 2012
• Preshing on Programming – Jeff Preshing –http://preshing.com accessed Dec. 2013
• ISO C++ Working Draft N3337 – 2012
• Foundations of the C++ Concurrency Memory Model –H. Boehm, S. V. Adve – 2008
• How to make a Multiprocessor Computer that correctly executes Multiprocess Programs – Leslie Lamport –1979
64
![Page 65: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/65.jpg)
std::atomic<> on x86/x64
65
load store compare_exchange
memory_order_seq_cst
memory_order_acquirememory_order_releasememory_order_acq_rel
memory_order_relaxed
MOVprevent compiler optimizations
MOVprevent compiler optimizations
MOV
(LOCK) XCHGprevent compiler optimizations
MOVprevent compiler optimizations
MOV
LOCK CMPXCHGprevent compiler optimizations
LOCK CMPXCHGprevent compiler optimizations
LOCK CMPXCHG
[ http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html ]
![Page 66: C++ Memory Model · Data Race memory location [intro.memory(1.7)/3] an object of scalar type or a maximal sequence of adjacent non-zero width bit-fields conflicting action [intro.multithread(1.10)/4]](https://reader034.vdocument.in/reader034/viewer/2022050716/5e4247622266a35ec3683b86/html5/thumbnails/66.jpg)
std::atomic<> on ARMv7
66
load store compare_exchange
memory_order_seq_cst
memory_order_acquirememory_order_releasememory_order_acq_rel
memory_order_relaxed
ldr ; dmbprevent compiler optimizations
ldr ; dmbprevent compiler optimizations
ldr
dmb ; str ; dmb ;prevent compiler optimizations
dmb ; strprevent compiler optimizations
str
dmb ; LOOP ; isbprevent compiler optimizations
(dmb;) LOOP (;isb)prevent compiler optimizations
LOOP
Subroutine LOOP := _loop:
ldrex roldval, [rptr];mov rres, 0; teq roldval, rold;strexeq rres, rnewval, [rptr];teq rres, 0; bne _loop
[ http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html ]