Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory
Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy. TRANSACT 2014.
![Page 1: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/1.jpg)
Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory
Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy
TRANSACT 2014
![Page 2: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/2.jpg)
Multicore Performance Scaling
![Page 3: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/3.jpg)
Hardware Transactional Memory (HTM)
• Intel’s Haswell TSX: RTM & HLE
• IBM’s Blue Gene/Q, System z, and Power Architecture
• Low overhead (cache based)
![Page 4: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/4.jpg)
Haswell RTM
if (_xbegin() == _XBEGIN_STARTED)
    Speculative execution
    _xend()
else
    Abort handler
• Speculative execution, without any locks
• Read and write sets
• Abort on memory conflict
![Page 5: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/5.jpg)
Haswell RTM
Transaction 1: _xbegin(), Read X (add to read set), Write Y (add to write set), _xend()
Transaction 2: _xbegin(), Write X (add to write set), Write Y (add to write set), _xend()
Outcomes shown: COMMIT (make the change to Y visible) or ABORT
![Page 6: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/6.jpg)
Lock Elision
<HLE_Acquire_Prefix> Lock(L)
    Atomic region executed as a transaction or mutually exclusive on L
<HLE_Release_Prefix> Release(L)
• Execute optimistically, without any locks
• Track read and write sets
• Abort on memory conflict: roll back, then acquire the lock
![Page 7: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/7.jpg)
[Figure credit: AnandTech]
![Page 8: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/8.jpg)
Haswell RTM: Best-effort
• Small & medium transactions
• Transactions can fail due to overflow, unsupported instructions, interrupts, or conflicts
• Needs software fallback
![Page 9: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/9.jpg)
Overview
• Best-effort Hardware Transactional Memory
• Lazy SGL
• Bloom Filter SGL
Description
Correctness
Results
![Page 10: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/10.jpg)
Single Global Lock HyTM (simple and common)
Try_SPEC:
    Wait until Lock is free
    Transactional_Read(Lock)
    If Lock is taken ABORT
    Speculate critical section
    End speculation
On_ABORT:
    If try_lock(Lock)
        Critical section
        Release(Lock)
    Else Try_SPEC
HW txn: Begin HW txn, Read L, End HW txn
SW txn: Begin SW txn, Acquire L, Release L, End SW txn (does not abort!)
![Page 11: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/11.jpg)
Single Global Lock HyTM (simple and common)
[Timeline figure. One SW txn (Begin SW txn, Acquire L, Release L, End SW txn) overlaps four HW txns, (1) to (4), each of the form Begin HW txn, Read L, End HW txn. Legend: X = ABORT. All four HW txns are marked X.]
![Page 12: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/12.jpg)
[Timeline figure, Thread 1 and Thread 2. Hardware transactions of the form Begin_HW_TXN (L), CRITICAL SECTION, End_HW_TXN (L) run alongside one software transaction of the form Acquire(L), CRITICAL SECTION (SW TXN), Release(L). The total length is labeled Execution Time 1.]
![Page 13: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/13.jpg)
[Timeline figure, Thread 1 and Thread 2. The same hardware transactions (Begin_HW_TXN (L), CRITICAL SECTION, End_HW_TXN (L)) and software transaction (Acquire(L), CRITICAL SECTION (SW TXN), Release(L)) as on the previous slide, with two total lengths marked for comparison: Execution Time 1 and Execution Time 2.]
![Page 14: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/14.jpg)
Lazy SGL
Try_SPEC:
    Speculate critical section
    Transactional_Read(Lock)
    If Lock is taken ABORT
    End speculation
On_ABORT:
    If try_lock(Lock)
        Critical section
        Release(Lock)
    Else Try_SPEC
HW txn: Begin HW txn, Read L, End HW txn (L is read at the end of the critical section)
SW txn: Begin SW txn, Acquire L, Release L, End SW txn (does not abort!)
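The two Try_SPEC variants differ only in where the lock is read: at the start of the hardware transaction (eager) or just before it ends (lazy). The C sketch below shows both under stated assumptions: GCC or Clang with RTM intrinsics enabled (`-mrtm`) on a CPU that supports TSX. The names `sgl_lock`, `sgl_execute`, and `increment` are hypothetical and do not come from the slides. When `__RTM__` is not defined, the sketch takes the lock every time. It is an illustration, not the authors' implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#if defined(__RTM__)
#include <immintrin.h>              /* _xbegin, _xend, _xabort */
#endif

static atomic_int sgl_lock;         /* hypothetical global lock: 0 = free, 1 = held */

static bool sgl_try_lock(void) {
    int expected = 0;
    return atomic_compare_exchange_strong(&sgl_lock, &expected, 1);
}

static void sgl_unlock(void) { atomic_store(&sgl_lock, 0); }

/* Runs cs(arg) atomically. Returns true if it committed as a hardware
 * transaction, false if it ran under the single global lock. */
bool sgl_execute(void (*cs)(void *), void *arg, bool lazy) {
    assert(cs != NULL);
    for (;;) {
#if defined(__RTM__)
        if (!lazy) {
            /* Eager: wait until Lock is free before speculating. */
            while (atomic_load(&sgl_lock) != 0) { }
        }
        if (_xbegin() == _XBEGIN_STARTED) {
            /* Eager: read Lock first, so it joins the read set immediately. */
            if (!lazy && atomic_load(&sgl_lock) != 0) _xabort(1);
            cs(arg);                /* speculate critical section */
            /* Lazy: read Lock only now, just before ending speculation. */
            if (lazy && atomic_load(&sgl_lock) != 0) _xabort(2);
            _xend();
            return true;
        }
        /* An abort resumes here: fall through to On_ABORT. */
#else
        (void)lazy;                 /* no RTM at compile time: lock path only */
#endif
        /* On_ABORT: if try_lock succeeds, run under the lock; else retry. */
        if (sgl_try_lock()) {
            cs(arg);
            sgl_unlock();
            return false;
        }
    }
}

/* Example critical section: increments the long that arg points to. */
static void increment(void *arg) { ++*(long *)arg; }
```

Calling `sgl_execute(increment, &counter, true)` selects the lazy variant; passing `false` selects the eager one. In the lazy variant a hardware transaction can run on intermediate state until the final lock read, which is the situation the hardware sandboxing slides address.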
![Page 15: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/15.jpg)
Lazy SGL
[Timeline figure. One SW txn (Begin SW txn, Acquire L, Release L, End SW txn) overlaps four HW txns, (1) to (4), each of the form Begin HW txn, Read L, End HW txn, with Read L placed just before End HW txn. Legend: X = ABORT. Two HW txns are marked X and two are marked COMMIT.]
![Page 16: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/16.jpg)
![Page 17: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/17.jpg)
Transactional Memory Correctness
• Transaction 1 (SW) and Transaction 2 (HW) overlap in time; both COMMIT
• The outcome must be equivalent to one of two orders: T2 AFTER T1, or T2 BEFORE T1
![Page 18: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/18.jpg)
Case 1: HW begins, SW begins, HW ends, SW ends
Thread 1 (SW): Acquire Lock … X = a … Release Lock
Thread 2 (HW): TXN_BEGIN … X = b … Check Lock … TXN_END
X value: a b
Without Check Lock: Correct: a, Actual: b
With Check Lock: ABORT. Correct: a, Actual: a
![Page 19: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/19.jpg)
Case 2: SW begins, HW begins, HW ends, SW ends
Thread 1 (SW): Acquire Lock … X = a … Release Lock
Thread 2 (HW): TXN_BEGIN … X = b … Check Lock … TXN_END
Without Check Lock: Correct: a, Actual: b
With Check Lock: ABORT. Correct: a, Actual: a
![Page 20: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/20.jpg)
Case 3: SW begins, HW begins, SW ends, HW ends
Thread 1 (SW): Acquire Lock … X = a … Release Lock
Thread 2 (HW): TXN_BEGIN … X = b … Check Lock … TXN_END
X value: a b
Check Lock: COMMIT. Correct: b, Actual: b
![Page 21: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/21.jpg)
Case 4: HW begins, SW begins, SW ends, HW ends
Thread 1 (SW): Acquire Lock … X = a … Release Lock
Thread 2 (HW): TXN_BEGIN … X = b … Check Lock … TXN_END
Check Lock: COMMIT. Correct: b, Actual: b
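The four interleavings can be checked mechanically with a small model. The C sketch below is a simplification built on assumptions, not the authors' code: transactions are intervals of logical timestamps, the software transaction holds the lock from its begin up to its end, and data conflicts are ignored because the hardware detects those separately. The names `interval_t`, `eager_hw_commits`, and `lazy_hw_commits` are hypothetical. A hardware transaction that would have to wait for the lock under the eager scheme is counted as not committing, matching the X marks on the eager timeline.

```c
#include <assert.h>
#include <stdbool.h>

/* A transaction's lifetime as logical timestamps, with begin < end. */
typedef struct { int begin, end; } interval_t;

/* The SW txn holds the lock from sw.begin up to (not including) sw.end. */
static bool lock_held_at(interval_t sw, int t) {
    return sw.begin <= t && t < sw.end;
}

/* Eager SGL: the lock joins the read set at hw.begin, so the HW txn
 * fails if the lock is held then, or is acquired before hw.end. */
bool eager_hw_commits(interval_t hw, interval_t sw) {
    assert(hw.begin < hw.end && sw.begin < sw.end);
    bool held_at_begin = lock_held_at(sw, hw.begin);
    bool acquired_during = hw.begin < sw.begin && sw.begin < hw.end;
    return !(held_at_begin || acquired_during);
}

/* Lazy SGL: the lock is read only just before commit, so the HW txn
 * fails only if the lock is still held at hw.end. */
bool lazy_hw_commits(interval_t hw, interval_t sw) {
    assert(hw.begin < hw.end && sw.begin < sw.end);
    return !lock_held_at(sw, hw.end);
}
```

With timestamps 1 to 4 assigned in the order each case lists its events, the lazy model aborts the hardware transaction in Cases 1 and 2 and commits it in Cases 3 and 4, while the eager model rejects all four.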
![Page 22: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/22.jpg)
Hardware Sandboxing
Initial state: X = 5; Y = 6
Thread 1 (SW): Acquire Lock … ++X … ++Y … Release Lock
Thread 2 (HW): TXN_BEGIN … Z = 1/(Y-X) … TXN_END
If Thread 2 reads X and Y after ++X but before ++Y, then Y-X = 0 and Z = 1/0 !!!
![Page 23: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/23.jpg)
Indirect Jumps
Initial state: X = 5; Y = 6
Thread 1 (SW): Acquire Lock … ++X … ++Y … Release Lock
Thread 2 (HW): _xbegin … if (X == Y) *p = garbage; p() … if (lock) abort … _xend
Indirect jump to garbage location
![Page 24: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/24.jpg)
![Page 25: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/25.jpg)
[Figure: Speedup vs. Threads (1, 2, 4, 8). Panels: Ssca2 (small txns), Intruder (medium txns), Labyrinth (large txns). Series: TL2, SGL, HLE, E-SGL, L-SGL. Annotated "Better".]
![Page 26: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/26.jpg)
Improved Lock Acquisition Rate
[Figure: % lock acquisitions vs. Threads (1, 2, 4, 8). Panels: Vacation Low (medium txns), Kmeans High (small txns), Intruder (medium txns), Labyrinth (large txns). Series: HLE, E-SGL, L-SGL. Annotated "Better".]
![Page 27: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/27.jpg)
No single thread overhead
[Figure: Slowdown relative to sequential for 1 thread. Benchmarks: bayes, genome, intruder, km_low, km_high, labyrinth, vacation_low, vacation_high, ssca2, yada. Series: TL2, SGL, HLE, E-SGL, L-SGL.]
![Page 28: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/28.jpg)
![Page 29: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/29.jpg)
Bloom Filters
• Efficient probabilistic data structure to compute fast set intersection
• Can admit false positives
• No false negatives
• Used in TM for Conflict Detection
![Page 30: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/30.jpg)
Lazy SGL
[Timeline figure, repeated for comparison. One SW txn (Begin SW txn, Acquire L, Release L, End SW txn) overlaps four HW txns, (1) to (4), each ending with Read L. Legend: X = ABORT. Two HW txns are marked X and two are marked COMMIT.]
![Page 31: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/31.jpg)
BF SGL
[Timeline figure. One SW txn (Begin SW txn, Acquire L, Release L, End SW txn) overlaps four HW txns. HW txns (1) and (2) end with Check BF; HW txns (3) and (4) end with Read L. Legend: X = ABORT. Two HW txns are marked COMMIT and no X marks appear.]
![Page 32: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/32.jpg)
Case 1: HW begins, SW begins, HW ends, SW ends
Thread 1 (SW): Acquire Lock … X = a … Release Lock
Thread 2 (HW): TXN_BEGIN … X = b … TXN_END
X value: a b
Without a check: Correct: a, Actual: b
Check Lock: ABORT. Correct: a, Actual: a
Check BF: if BFs intersect, ABORT; else COMMIT
![Page 33: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/33.jpg)
Case 2: SW begins, HW begins, HW ends, SW ends
Thread 1 (SW): Acquire Lock … X = a … Release Lock
Thread 2 (HW): TXN_BEGIN … X = b … TXN_END
Without a check: Correct: a, Actual: b
Check Lock: ABORT. Correct: a, Actual: a
Check BF: if BFs intersect, ABORT; else COMMIT
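The "Check BF" step can be illustrated with a small Bloom filter over accessed addresses. The C sketch below rests on assumptions the slides do not specify: a 256-bit filter, two hash positions per address, and a mixing function that uses the splitmix64 finalizer constants. The names `bloom_t`, `bloom_add`, `bloom_intersects`, and `bf_sgl_hw_commits` are hypothetical, and how the two filters are shared between the software and hardware transactions is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BF_WORDS 4                       /* assumed size: 4 x 64 = 256 bits */
#define BF_BITS  (BF_WORDS * 64)

typedef struct { uint64_t bits[BF_WORDS]; } bloom_t;

/* 64-bit mixer (splitmix64 finalizer constants). */
static uint64_t bf_mix(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

void bloom_clear(bloom_t *f) {
    assert(f != NULL);
    memset(f, 0, sizeof *f);
}

/* Records an accessed address by setting two bit positions. */
void bloom_add(bloom_t *f, const void *addr) {
    uint64_t h = bf_mix((uint64_t)(uintptr_t)addr);
    unsigned b1 = (unsigned)(h % BF_BITS);
    unsigned b2 = (unsigned)((h >> 32) % BF_BITS);
    f->bits[b1 / 64] |= 1ULL << (b1 % 64);
    f->bits[b2 / 64] |= 1ULL << (b2 % 64);
}

/* Conservative intersection test: any shared bit counts as a hit.
 * A shared address always sets the same bits in both filters, so there
 * are no false negatives. Unrelated addresses may collide, so false
 * positives are possible. */
bool bloom_intersects(const bloom_t *a, const bloom_t *b) {
    for (int i = 0; i < BF_WORDS; i++)
        if (a->bits[i] & b->bits[i]) return true;
    return false;
}

/* Commit decision for a HW txn: if the lock is free, commit. If it is
 * held, commit only when the two filters do not intersect. */
bool bf_sgl_hw_commits(bool lock_held, const bloom_t *hw, const bloom_t *sw) {
    return !lock_held || !bloom_intersects(hw, sw);
}
```

Compared with the plain lock check, this lets a hardware transaction commit while the lock is held when its accesses appear disjoint from those of the software transaction. A false positive only causes an unnecessary abort.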
![Page 34: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/34.jpg)
Conclusions
• HTMs are becoming more available
• Best-effort: need software fallback
• Eager SGL
  • simple and fast fallback
  • often preferred to more efficient solutions
![Page 35: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/35.jpg)
Conclusions
• Lazy SGL
  • as simple as Eager SGL
  • more efficient
• Bloom Filter SGL
  • more accurate conflict detection
  • slower
• Can be implemented directly in hardware
![Page 36: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/36.jpg)
![Page 37: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/37.jpg)
![Page 38: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/38.jpg)
Medium transactions
[Figure: Speedup vs. Threads (1, 2, 4, 8). Panels: Intruder, Vacation Low, Vacation High, Genome. Series: TL2, SGL, HLE, Hyswell.]
![Page 39: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/39.jpg)
Small transactions
[Figure: Speedup vs. Threads (1, 2, 4, 8). Panels: Kmeans Low, Kmeans High, Ssca2. Series: TL2, SGL, HLE, Hyswell.]
![Page 40: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/40.jpg)
Large transactions
[Figure: Speedup vs. Threads (1, 2, 4, 8). Panels: Bayes, Labyrinth, Yada. Series: TL2, SGL, HLE, Hyswell.]
![Page 41: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/41.jpg)
Speedup over sequential for 8 threads
[Figure: bar chart. Benchmarks: bayes, genome, intruder, kmeans low, kmeans high, labyrinth, ssca2, vacation low, vacation high, yada. Series: TL2, SGL, HLE, Hyswell.]
![Page 42: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/42.jpg)
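| Case | Software | Hardware | Outcome |
|------|----------|----------|---------|
| (1) | Read(x) | Read(x) | Not a conflict |
| (2) | Read(x) | Write(x) | Software transaction ordered before hardware transaction -> CORRECT |
| (3) | Read(x) | Write(x) | Hardware abort |
| (4) | Write(x) | Read(x) | Software transaction ordered before hardware transaction -> CORRECT |
| (5) | Write(x) | Read(x) | Hardware abort |
| (6) | Write(x) | Write(x) | Software transaction ordered before hardware transaction -> CORRECT |
| (7) | Write(x) | Write(x) | Hardware abort |

Paired cases list the same accesses; they differ in the relative timing of the two accesses, which the original slide shows graphically.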
![Page 43: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory](https://reader036.vdocument.in/reader036/viewer/2022062222/568161e8550346895dd2147d/html5/thumbnails/43.jpg)