the c/c++ memory model - github pages · ©2019 check point software technologies ltd. 23...
TRANSCRIPT
![Page 1: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/1.jpg)
1 ©2019 Check Point Software Technologies Ltd.
Yossi Moalem
The C/C++ Memory Model
![Page 2: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/2.jpg)
2 ©2019 Check Point Software Technologies Ltd.
• Part 1 (Boring): some theory
• Part 2 (Cool): consequences
• Part 3 (Important): applications
Part 3.14 (If time permits): a touch on atomics
Agenda
![Page 3: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/3.jpg)
3 ©2019 Check Point Software Technologies Ltd.
Some Theory
![Page 4: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/4.jpg)
4 ©2019 Check Point Software Technologies Ltd.
The computer we think we program for
CPU RAM
And then came Gordon Moore
![Page 5: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/5.jpg)
5 ©2019 Check Point Software Technologies Ltd.
The number of transistors in a dense integrated circuit doubles every two years
CPU became much faster, very fast
Memory speed did not advance so fast
Moore’s Law
![Page 6: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/6.jpg)
6 ©2019 Check Point Software Technologies Ltd.
• L1 Cache: 2-4 cycles
32K, for instruction, 32K for Data
Per core, shared between HW threads
• L2 Cache 15-18 cycles
256K, shared for instructions and data
Per CPU
• L3 Cache 30-40 cycles
32M, shared for instructions and data
Per Machine
• Main Memory Over 100 cycles (150-200 and maybe more!)
Cache Hierarchy, The numbers
Faster Larger
![Page 7: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/7.jpg)
7 ©2019 Check Point Software Technologies Ltd.
The computer we are actually programming for
CPU Core
RAM
CPU Core
CPU Core
CPU Core
L1I
L2
L2
L3
T1
T2
T1
T2
T1
T2
T1
T2
L1D
L1I
L1D
L1I
L1D
L1I
L1D
sb
sb
sb
sb
Not to scale!
![Page 8: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/8.jpg)
8 ©2019 Check Point Software Technologies Ltd.
One picture worth 1000 words One animation worth 100 pictures
CPU
L1
L2
L3
Main memory
To scale!
![Page 9: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/9.jpg)
9 ©2019 Check Point Software Technologies Ltd.
• Swapping in/out should be efficient
• Minimize maintenance
• Non-consecutive
• Locality
Cache: Requirements
![Page 10: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/10.jpg)
10 ©2019 Check Point Software Technologies Ltd.
• Fixed size block of memory
• Smallest cache-able unit
• When memory required, the whole cache line is swapped in
Cache Line
Main Memory
Cache
![Page 11: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/11.jpg)
11 ©2019 Check Point Software Technologies Ltd.
Cache lookup
Cache
Main Memory (Cache lines)
![Page 12: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/12.jpg)
12 ©2019 Check Point Software Technologies Ltd.
• Compiler can rearrange the code
Rearrange/remove memory access
Reuse location
• To use the HW better:
Provide better locality
Reduce stale cycles
• Maintain observable state, from the same thread POV
Compiler has freedom!
![Page 13: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/13.jpg)
13 ©2019 Check Point Software Technologies Ltd.
Compiler reordering
![Page 14: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/14.jpg)
14 ©2019 Check Point Software Technologies Ltd.
• Introduced for MMIO
• Reorder non-volatile and volatile is permitted
• Every access to the variable will be restricted
Volatile
![Page 15: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/15.jpg)
15 ©2019 Check Point Software Technologies Ltd.
Compiler Barrier
![Page 16: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/16.jpg)
16 ©2019 Check Point Software Technologies Ltd.
• Fetch instruction
• Push to instruction queue
• Wait for the operands
• Dispatched (even before earlier instructions)
• Executes
• Write result to queue
• When its time comes (older results have been written), write result to register file
HW has freedom too!
![Page 17: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/17.jpg)
17 ©2019 Check Point Software Technologies Ltd.
• Data availability order, rather then instruction order.
• Reduce memory access wait time
• Several functional units
Out of Order Execution
![Page 18: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/18.jpg)
18 ©2019 Check Point Software Technologies Ltd.
Assume finished and answer initialized to 0
CPU Reordering
<Core #2 > while (! finished) { ; } cout << answer;
<Core #1> answer = 42; finished = 1; May reorder May reorder
![Page 19: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/19.jpg)
19 ©2019 Check Point Software Technologies Ltd.
• Several instructions execute in parallel
• Threads needs to communicate
• Reads and writes order reasoning
Instruction Reordering
![Page 20: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/20.jpg)
20 ©2019 Check Point Software Technologies Ltd.
• The set of allowed reorders
LoadLoad
LoadStore
StoreStore
StoreLoad
• C++ did not define memory model until C++ 11
Language Memory Model
![Page 21: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/21.jpg)
21 ©2019 Check Point Software Technologies Ltd.
• Ordering with respect to memory access
• The toolchain will issue the correct fence for this architecture
• If such fence is unavailable, a stronger fence is issued
Full fence
One way fence
• Synchronization mechanisms include the required fence
Memory Fence (memory barrier, machine barrier)
![Page 22: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/22.jpg)
22 ©2019 Check Point Software Technologies Ltd.
CPU Reordering, Example Revised
<CPU #2> while ( ! finished.load(memory_order_acquire)) { ;} cout << answer;
<CPU #1> answer = 42; finished.store(1, memory_order_release);
Synchronize with
Sequenced before
Sequenced before
Happens before
![Page 23: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/23.jpg)
23 ©2019 Check Point Software Technologies Ltd.
• Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution is the same as if the reads and writes occurred in some order, and the operations of each individual processor appear in this sequence in the order specified by its program”
• Data race: simultaneously accessing object by two threads, and at least one thread is a writer.
• Simultaneously: without happens-before ordering.
SC-DRF
Appearing to execute the program you wrote, as long as you didn’t write a data race.
![Page 24: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/24.jpg)
24 ©2019 Check Point Software Technologies Ltd.
data = 42;
t1.store (1);
Transitivity
if (t1.load() )
t2.store(1);
if (t2.load() )
// data == 42 ??
Thread #1 Thread #2 Thread #3
![Page 25: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/25.jpg)
25 ©2019 Check Point Software Technologies Ltd.
Bottom line:
• Compiler and CPU may reorder instructions • SC-DRF • Fences restricts the reordering
• Memory is divided into Cache Lines • This is the smallest cacheable unit • Cache line can be placed in a restricted number of “cache slots”
![Page 26: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/26.jpg)
26 ©2019 Check Point Software Technologies Ltd.
Consequences
![Page 27: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/27.jpg)
27 ©2019 Check Point Software Technologies Ltd.
First seen at http://preshing.com
The Idea
<Thread 1> x = 1; // compiler barrier here rY = y;
<Thread 2> y = 1; // compiler barrier here rX = x;
x, y, rX and rY initialized to zero
![Page 28: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/28.jpg)
28 ©2019 Check Point Software Technologies Ltd.
The Idea, cont’d <Main thread> Initialize all semaphores Spawns two threads Do forever: Initialize to zero Post on start semaphores Wait on end semaphore Wait on end semaphore Check if both, rX and rY are zero
<Thread 1> Do forever: Wait on start semaphore #1
x = 1; Compiler Barrier
rY = y; Post on end semaphore
<Thread 2> Do forever: Wait on start semaphore #2
y= 1; Compiler Barrier
rX = x; Post on end semaphore
![Page 29: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/29.jpg)
29 ©2019 Check Point Software Technologies Ltd.
Compiled with no optimization:
Results:
Compiled with O3
![Page 30: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/30.jpg)
30 ©2019 Check Point Software Technologies Ltd.
• Replacing the compiler barrier with memory fence
• Demonstrating this would be kind’a boring….
Solution
x = 1; atomic_thread_fence(memory_order_seq_cst); rY = y;
![Page 31: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/31.jpg)
31 ©2019 Check Point Software Technologies Ltd.
• CPU never sees its own data out of order
• Setting affinity just for this may be a huge overkill
Another approach: CPU affinity
![Page 32: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/32.jpg)
32 ©2019 Check Point Software Technologies Ltd.
Count the amount of odds
in a matrix
How hard can it be…
Matrix traversal
![Page 33: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/33.jpg)
33 ©2019 Check Point Software Technologies Ltd.
Row by row Vs. col by col
![Page 34: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/34.jpg)
34 ©2019 Check Point Software Technologies Ltd.
Is Col by col the worst?
![Page 35: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/35.jpg)
35 ©2019 Check Point Software Technologies Ltd.
Reducing matrix size
![Page 36: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/36.jpg)
36 ©2019 Check Point Software Technologies Ltd.
• Should be easy
Just make sure we do not have races
Multi-Core
![Page 37: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/37.jpg)
37 ©2019 Check Point Software Technologies Ltd.
int counter[NumOfThreads];
for( int p = 0; p < NumOfThreads; ++p )
pool.run( count(p) );
pool.join();
odds = 0;
for( int p = 0; p < NumOfThreads; ++p )
odds += counter[p];
Solution Attempt #1 auto count = [] (int threadNum) {
counter[threadNum] = 0;
int chunkSize = DIM/(threadNum + 1);
int myStart = threadNum * chunkSize;
int myEnd = min( myStart+chunkSize, DIM );
for( int row = myStart; row < myEnd; ++row )
for( int col = 0; col < DIM; ++col )
if( matrix[row*DIM + col] % 2 != 0 )
++counter[threadNum];
} );
![Page 38: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/38.jpg)
38 ©2019 Check Point Software Technologies Ltd.
Solution Attempt #1: Results
![Page 39: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/39.jpg)
39 ©2019 Check Point Software Technologies Ltd.
And in my run
![Page 40: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/40.jpg)
40 ©2019 Check Point Software Technologies Ltd.
![Page 41: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/41.jpg)
41 ©2019 Check Point Software Technologies Ltd.
int counter[NumOfThreads];
for( int p = 0; p < NumOfThreads; ++p )
pool.run( count(p) );
pool.join();
odds = 0;
for( int p = 0; p < NumOfThreads; ++p )
odds += counter[p];
Solution Attempt #1 auto count = [] (int threadNum) {
counter[threadNum] = 0; int chunkSize = DIM/(threadNum + 1); int myStart = threadNum * chunkSize; int myEnd = min( myStart+chunkSize, DIM );
int count = 0;
for( int row = myStart; row < myEnd; ++row )
for( int col = 0; col < DIM; ++col )
if( matrix[row*DIM + col] % 2 != 0 )
++count; //++counter[threadNum];
couter[threadNum] = count;
} );
![Page 42: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/42.jpg)
42 ©2019 Check Point Software Technologies Ltd.
• Yes…
Could this make a difference??
![Page 43: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/43.jpg)
43 ©2019 Check Point Software Technologies Ltd.
Again, in my run
![Page 44: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/44.jpg)
44 ©2019 Check Point Software Technologies Ltd.
• Heap allocation, globals, statics
Even from different translation units
• False sharing requires
Several cores accessing the same cache line
Frequently
At least one is writer
Not only arrays
![Page 45: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/45.jpg)
45 ©2019 Check Point Software Technologies Ltd.
Applications
![Page 46: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/46.jpg)
46 ©2019 Check Point Software Technologies Ltd.
• Characteristics, not existence
• Attempt to maximize cache hits
• All levels of cache hierarchy
• Can be out-performed by cache aware
Cache Oblivious Algorithms
![Page 47: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/47.jpg)
47 ©2019 Check Point Software Technologies Ltd.
• Analyzing using big O notation
• Idealized cache model
Ignore cache hierarchy
Ignore replacing policies
Ignore associativity
Analyzing Memory Utilization
![Page 48: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/48.jpg)
48 ©2019 Check Point Software Technologies Ltd.
Search a sequence of numbers for the highest number, which is less than X
• Data is searched many time
• Ignore preparation time
How should we store the sequence??
Example: Search
![Page 49: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/49.jpg)
49 ©2019 Check Point Software Technologies Ltd.
• log(n) comparisons
• Given the distance, assume that they will require memory access
• log(n)-log(B) memory accesses are required
Attempt #1: Binary Search
![Page 50: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/50.jpg)
50 ©2019 Check Point Software Technologies Ltd.
• Set B to our cache-line size
• We will require log B (N) steps
• Each node will be loaded in one cache line
• O(log (N) / log(B)) memory accesses
• But….
We need to know B
Attempt #2: B-Tree
![Page 51: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/51.jpg)
51 ©2019 Check Point Software Technologies Ltd.
• Full description and analysis is outside the scope
• Set a fully balanced tree
• Recursively divide it to sub-trees
• Each sub-tree is copied to sequential memory
• Use this to search
Van Emde Boas
![Page 52: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/52.jpg)
52 ©2019 Check Point Software Technologies Ltd.
![Page 53: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/53.jpg)
53 ©2019 Check Point Software Technologies Ltd.
13
4
2 6
17
15
1 3
19
9 5 18 14 16 20
13 4 5 1 3 6 9 2 17
![Page 54: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/54.jpg)
54 ©2019 Check Point Software Technologies Ltd.
• Each section is of size B or less
2 memory accesses per section
• Tree height is log(n)
• Section height between log(B) to log(B)/2
• Max sections we will visit is log(N)/(log(B)/2)
• This will require 4(log(N)/log(B)) memory accesses
Van Emde Boas, intuitive analysis
![Page 55: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/55.jpg)
55 ©2019 Check Point Software Technologies Ltd.
class Object { int _pos[2]; int _speed; Model _model; const char _name[NAME_SIZE]; …. int _foo; };
Data Oriented Design
Consider the following class:
![Page 56: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/56.jpg)
56 ©2019 Check Point Software Technologies Ltd.
void Object::update( int time)
{
float f = sqrt(
_pos[0] + (time* _speed) +
_pos[1] + (time * _speed ) ) ;
_foo += f;
}
What is the most expensive operation?
1. Load _pos, cache miss, 200 cycles 2. Load speed, same cache line, 3 cycles 3. Multiply and add, twice 5 cycles, twice 4. Square root, 30 cycles 5. Load _foo, cache miss, 200 cycles 6. Add result to _foo , 1 cycle
Total: 450 cycles
_pos
_speed
_model
_name
_foo
Memory
![Page 57: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/57.jpg)
57 ©2019 Check Point Software Technologies Ltd.
• 50 out of 450 cycles are real work
Compiler’s domain is the 50 cycles
Not very much…
Can the compiler help?
![Page 58: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/58.jpg)
58 ©2019 Check Point Software Technologies Ltd.
class Object {
int _pos[2];
int _speed;
int _foo;
Model _model;
const char _name[NAME_SIZE];
….
};
Back to the Example
![Page 59: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/59.jpg)
59 ©2019 Check Point Software Technologies Ltd.
void Object::update( int time)
{
float f = sqrt(
_pos[0] + (time* _speed) +
_pos[1] + (time * _speed ) ) ;
_foo += f;
}
The new cost:
1. Load _pos, cache miss, 200 cycles 2. Load speed, same cache line, 3 cycles 3. Multiply and add, twice 5 cycles, twice 4. Square root, 30 cycles 5. Load _foo, cache miss, 200 cycles 5. Load _foo, 5 cycles 6. Add result to _foo , 1 cycle
Total: 250 cycles
_pos
_speed
_foo
_model
_name
Memory
![Page 60: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/60.jpg)
60 ©2019 Check Point Software Technologies Ltd.
• Make continuous, tightly packed, chunks of memory that will be used consecutively.
• Re-group fields according to their usage
• When it is needed, and the transformation on them
DoD in one sentence
![Page 61: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/61.jpg)
61 ©2019 Check Point Software Technologies Ltd.
Regroup the data
Can we do better?
class Object {
Model _model;
const char _name[NAME_SIZE];
….
};
class ObjectPosition{
float _pos[2];
float _speed;
int _foo;
};
for ( auto & object : objects ) {
object.update(time);
}
![Page 62: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/62.jpg)
62 ©2019 Check Point Software Technologies Ltd.
• Single cache line, multiple objects
• Shared cost
• On average, fetching will cost us about 40 cycles
• Total cost ~90 cycles
The new cost
![Page 63: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/63.jpg)
63 ©2019 Check Point Software Technologies Ltd.
• Load the key
• Load the object selectively
Container Searching for a key/condition
Key1 Key 2 Key 3 Key 4 Key 5 Key 6 Key 7 Key 8 Key 9 Key 10
![Page 64: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/64.jpg)
64 ©2019 Check Point Software Technologies Ltd.
• shapes is likely to contain:
square, circle, polygon, square, text, apple rectangle….
Polymorphism
for (Shape * currentShape : shapes) { currentShape->draw(); }
![Page 65: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/65.jpg)
65 ©2019 Check Point Software Technologies Ltd.
Polymorphism, Resolution
for (Circle* currentCircle : circles) { currentCircle->draw(); } for (Square* currentSquare : squares) { currentSquare->draw(); }
![Page 66: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/66.jpg)
66 ©2019 Check Point Software Technologies Ltd.
A touch on atomics
If time permits:
![Page 67: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/67.jpg)
67 ©2019 Check Point Software Technologies Ltd.
The Pattern
Singleton* Singleton ::instance () { if ( _instance == nullptr ) { std::lock_guard<std::mutex> lock(_mutex); if (_instance == nullptr) { _instance = new Singleton(); } } return _instance; }
Allocate memory Call C’tor Assign
![Page 68: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/68.jpg)
68 ©2019 Check Point Software Technologies Ltd.
Attempt #1: Adding temporary
Singleton * Singleton ::instance () { if ( _instance == nullptr ) { std::lock_guard<std::mutex> lock(_mutex); if (_instance == nullptr) { Singleton * tmp = new Singleton(); _instance = tmp; } } return _instance; }
Optimize out the temporary. Back to square 1.
![Page 69: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/69.jpg)
69 ©2019 Check Point Software Technologies Ltd.
• Change tmp to larger scope, say static
Compiler can still detect this
• Define tmp as extern
Can still detect this
Or, place construction after both
• Define helper on other translation unit
Compiler must assume it can throw
No inlining
Link-time inlining kills this attempt
Attempt #2: Outsmart the compiler
![Page 70: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/70.jpg)
70 ©2019 Check Point Software Technologies Ltd.
• Qualify tmp and _instance as volatile
All side effects of one volatile must be completed before addressing the other
Attempt #3: Volatile
Singleton * Singleton ::instance () { if ( _instance == nullptr ) { std::lock_guard<std::mutex> lock(_mutex); if ( _instance == nullptr ) { Singleton * volatile tmp = new Singleton(); _instance = tmp; // static Singleton * volatile } } return _instance; }
![Page 71: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/71.jpg)
71 ©2019 Check Point Software Technologies Ltd.
Lets inline a constructor:
Attempt #3: Volatile, cont’d
Singleton * Singleton ::instance () { if ( _instance == nullptr ) { std::lock_guard<std::mutex> lock(_mutex); if ( _instance == nullptr ) { Singleton * volatile tmp = new Singleton(); tmp->x = 4 //from the c’tor _instance = tmp; } } return _instance; }
This new instruction may be reordered
![Page 72: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/72.jpg)
72 ©2019 Check Point Software Technologies Ltd.
Trying to outsmart the compiler is a bad idea
Conclusion
![Page 73: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/73.jpg)
73 ©2019 Check Point Software Technologies Ltd.
Attempt #4: Compiler barrier
Singleton * Singleton::instance () { if (_instance == nullptr) { std::lock_guard<std::mutex> lock(_mutex); if (_instance == nullptr) { Singleton * tmp = new Singleton(); // Compiler Barrier here _instance = tmp; } } return _instance; }
![Page 74: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/74.jpg)
74 ©2019 Check Point Software Technologies Ltd.
Game Over!
What about CPU Re-Ordering
![Page 75: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/75.jpg)
75 ©2019 Check Point Software Technologies Ltd.
Singleton * Singleton::instance() {
if (_instance == nullptr) {
std::lock_guard<std::mutex> lock(_mutex);
if (_instance == nullptr) {
Singleton * tmp = new Singleton;
std::atomic_thread_fence(std::memory_order_seq_sct);
_instance = tmp;
}
}
return _instance ;
}
Attempt #5: Memory Barrier
Non atomic assignment
![Page 76: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/76.jpg)
76 ©2019 Check Point Software Technologies Ltd.
Singleton * Singleton ::instance() {
Singleton * tmp = _instance.load();
if (tmp == nullptr) {
std::lock_guard<std::mutex> lock(_mutex);
tmp = _instance.load();
if (tmp == nullptr) {
tmp = new Singleton;
_instance = tmp;
}
}
return tmp ;
}
Attempt #5: atomic
![Page 77: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/77.jpg)
77 ©2019 Check Point Software Technologies Ltd.
• But uses sequential consistency
• Can be expensive
• Can we do better?
This works!
![Page 78: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/78.jpg)
78 ©2019 Check Point Software Technologies Ltd.
Singleton * Singleton ::instance() { Singleton * tmp = _instance.load(std::memory_order_acquire); if (tmp == nullptr) { std::lock_guard<std::mutex> lock(_mutex); tmp = _instance.load(memory_order_relaxed); if (tmp == nullptr) { tmp = new Singleton ; _instance.store(tmp, memory_order_release); } } return tmp; }
Attempt #6: acquire-release
![Page 79: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/79.jpg)
79 ©2019 Check Point Software Technologies Ltd.
Singleton * Singleton::instance() {
Singleton* tmp = _instance.load(memory_order_relaxed);
if (tmp == nullptr) {
Singleton * newInstance = new Singleton ;
if (! (_instance.compare_exchange_strong( tmp, newInstance,
memory_order_relaxed) ) ) {
delete newInstance;
}
}
return _instance.load(memory_order_relaxed);
}
Attempt #7: do we need the lock?
![Page 80: The C/C++ Memory Model - GitHub Pages · ©2019 Check Point Software Technologies Ltd. 23 •Sequential consistency (SC): Defined in 1979 by Leslie Lamport: “the result of any execution](https://reader033.vdocument.in/reader033/viewer/2022042109/5e892ffb23666b71ab336ec6/html5/thumbnails/80.jpg)
80 ©2019 Check Point Software Technologies Ltd.
C++ 11 states:
Back to the sketching board
If control enters the declaration concurrently while the variable is being initialized, the concurrent execution will wait for completion of the initialization.
Singleton & Singleton::instance() { static Singleton instance; return instance; }
So, the final answer…