Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory
Irina Calciu, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy
TRANSACT 2014

Multicore Performance Scaling

Hardware Transactional Memory (HTM)
• Intel’s Haswell TSX: RTM & HLE
• IBM’s Blue Gene/Q & System Z & Power Architecture
• Low overhead (cache based)

Haswell RTM
if (_xbegin() == _XBEGIN_STARTED)
    Speculative execution, without any locks
    _xend()
else
    Abort handler
• Read and write sets are tracked during speculation
• Abort on memory conflict

Haswell RTM
_xbegin()
    Read X: add to read set
    Write Y: add to write set
_xend()
_xbegin()
    Write X: add to write set
    Write Y: add to write set
_xend()
COMMIT: make the change to Y visible
ABORT

Lock Elision
<HLE_Acquire_Prefix> Lock(L)
    Execute optimistically, without any locks
    Track read and write sets
<HLE_Release_Prefix> Release(L)
• Atomic region executed as a transaction or mutually exclusive on L
• Abort on memory conflict: roll back and acquire the lock

[Figure; source: Anand Tech]

Haswell RTM
• Best-effort
• Conflicts, overflow, unsupported instructions, interrupts
• Small & medium transactions
• Needs software fallback

Overview
• Best-effort Hardware Transactional Memory
• Lazy SGL
• Bloom Filter SGL
Description
Correctness
Results

Single Global Lock HyTM (simple and common)
Try_SPEC:
    Wait until Lock is free
    Transactional_Read(Lock)
    If Lock is taken ABORT
    Speculate critical section
    End speculation
On_ABORT:
    If try_lock(Lock)
        Critical section
        Release(Lock)
    Else Try_SPEC
HW txn: Begin HW txn, Read L, End HW txn
SW txn: Begin SW txn, Acquire L, Release L, End SW txn (does not abort!)
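
The Try_SPEC / On_ABORT scheme above can be sketched in C with the RTM intrinsics the slides already use. This is a minimal sketch under stated assumptions, not the authors' implementation: the names sgl_lock, sgl_try_lock, sgl_release, increment, and eager_sgl_run are hypothetical, the lock is a plain C11 atomic flag, and the speculative path is compiled only when the compiler defines __RTM__ (for example with -mrtm); otherwise every call takes the lock path.

```c
#include <assert.h>
#include <stdatomic.h>
#ifdef __RTM__
#include <immintrin.h>  /* _xbegin, _xend, _xabort; needs -mrtm and a TSX-capable CPU */
#endif

static atomic_int sgl_lock;  /* hypothetical global lock: 0 = free, 1 = held */

static int sgl_try_lock(void) {
    int expected = 0;
    return atomic_compare_exchange_strong(&sgl_lock, &expected, 1);
}

static void sgl_release(void) {
    assert(atomic_load(&sgl_lock) == 1);  /* only a holder may release */
    atomic_store(&sgl_lock, 0);
}

/* Example critical section (assumption): increment an int counter. */
static void increment(void *arg) { ++*(int *)arg; }

/* Runs cs(arg) atomically.
 * Returns 1 if it committed in hardware, 0 if it ran under the lock. */
static int eager_sgl_run(void (*cs)(void *), void *arg) {
    for (;;) {
#ifdef __RTM__
        /* Try_SPEC: wait until the lock is free. */
        while (atomic_load(&sgl_lock)) { }
        if (_xbegin() == _XBEGIN_STARTED) {
            /* Eager subscription: read the lock first, so a later
             * Acquire by a SW txn conflicts with this transaction. */
            if (atomic_load(&sgl_lock))
                _xabort(0xff);
            cs(arg);   /* speculate critical section */
            _xend();   /* end speculation */
            return 1;
        }
#endif
        /* On_ABORT: take the lock if possible, else retry Try_SPEC. */
        if (sgl_try_lock()) {
            cs(arg);
            sgl_release();
            return 0;
        }
    }
}
```

A call such as eager_sgl_run(increment, &counter) runs the increment exactly once on either path. A production fallback would normally bound the number of speculative retries before taking the lock; that policy is left out here.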

Single Global Lock HyTM (simple and common)
[Timeline figure: one SW txn (Begin SW txn, Acquire L, Release L, End SW txn) and four HW txns (1) to (4), each shown as Begin HW txn, Read L, End HW txn, on a Time axis. X marks are drawn on the HW txns. Legend: X = ABORT.]

[Timeline figure, Thread 1 and Thread 2 on a Time axis: five hardware critical sections (Begin_HW_TXN (L), CRITICAL SECTION, End_HW_TXN (L)) and one software critical section (Acquire(L), CRITICAL SECTION (SW TXN), Release(L)). The total is marked Execution Time 1.]

[Timeline figure, Thread 1 and Thread 2 on a Time axis: the same five hardware critical sections (Begin_HW_TXN (L), CRITICAL SECTION, End_HW_TXN (L)) and one software critical section (Acquire(L), CRITICAL SECTION (SW TXN), Release(L)), with both Execution Time 1 and Execution Time 2 marked.]

Lazy SGL
Try_SPEC:
    Speculate critical section
    Transactional_Read(Lock)
    If Lock is taken ABORT
    End speculation
On_ABORT:
    If try_lock(Lock)
        Critical section
        Release(Lock)
    Else Try_SPEC
HW txn: Begin HW txn, Read L, End HW txn
SW txn: Begin SW txn, Acquire L, Release L, End SW txn (does not abort!)
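
The only change from the eager scheme is where the lock is read: at the end of the speculative region instead of the start. A minimal C sketch of that ordering follows, under the same assumptions as before: sgl_lock, sgl_try_lock, sgl_release, increment, and lazy_sgl_run are hypothetical names, and the speculative path exists only when the compiler defines __RTM__. It is an illustration of the Try_SPEC steps listed above, not the authors' code.

```c
#include <assert.h>
#include <stdatomic.h>
#ifdef __RTM__
#include <immintrin.h>  /* _xbegin, _xend, _xabort; needs -mrtm and a TSX-capable CPU */
#endif

static atomic_int sgl_lock;  /* hypothetical global lock: 0 = free, 1 = held */

static int sgl_try_lock(void) {
    int expected = 0;
    return atomic_compare_exchange_strong(&sgl_lock, &expected, 1);
}

static void sgl_release(void) {
    assert(atomic_load(&sgl_lock) == 1);  /* only a holder may release */
    atomic_store(&sgl_lock, 0);
}

/* Example critical section (assumption): increment an int counter. */
static void increment(void *arg) { ++*(int *)arg; }

/* Runs cs(arg) atomically.
 * Returns 1 if it committed in hardware, 0 if it ran under the lock. */
static int lazy_sgl_run(void (*cs)(void *), void *arg) {
    for (;;) {
#ifdef __RTM__
        if (_xbegin() == _XBEGIN_STARTED) {
            cs(arg);                     /* speculate critical section first */
            if (atomic_load(&sgl_lock))  /* lazy subscription: read L last */
                _xabort(0xff);           /* lock is taken: ABORT */
            _xend();                     /* end speculation */
            return 1;
        }
#endif
        /* On_ABORT: take the lock if possible, else retry Try_SPEC. */
        if (sgl_try_lock()) {
            cs(arg);
            sgl_release();
            return 0;
        }
    }
}
```

Because the lock is read late, the speculative region can run while a SW txn holds the lock and may observe intermediate state; the Hardware Sandboxing and Indirect Jumps slides discuss what that implies.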

Lazy SGL
[Timeline figure: one SW txn (Begin SW txn, Acquire L, Release L, End SW txn) and four HW txns (1) to (4), each shown as Begin HW txn, Read L, End HW txn, with Read L placed just before the end, on a Time axis. Two X marks and two COMMIT marks are drawn. Legend: X = ABORT.]

Transactional Memory Correctness
[Figure: Transaction 1 (SW) and Transaction 2 (HW) on a Time axis, each ending in COMMIT. The two possible orders are labeled "Order T2 AFTER T1" and "Order T2 BEFORE T1".]

Case 1: HW begins, SW begins, HW ends, SW ends
Thread 1 (SW): Acquire Lock; …; X = a; …; Release Lock
Thread 2 (HW): TXN_BEGIN; …; X = b; …; TXN_END
X value: Correct: a, Actual: b
Check Lock: ABORT

Case 2: SW begins, HW begins, HW ends, SW ends
Thread 1 (SW): Acquire Lock; …; X = a; …; Release Lock
Thread 2 (HW): TXN_BEGIN; …; X = b; …; TXN_END
X value: Correct: a, Actual: b
Check Lock: ABORT

Case 3: SW begins, HW begins, SW ends, HW ends
Thread 1 (SW): Acquire Lock; …; X = a; …; Release Lock
Thread 2 (HW): TXN_BEGIN; …; X = b; …; TXN_END
X value: Correct: b, Actual: b
Check Lock: COMMIT

Case 4: HW begins, SW begins, SW ends, HW ends
Thread 1 (SW): Acquire Lock; …; X = a; …; Release Lock
Thread 2 (HW): TXN_BEGIN; …; X = b; …; TXN_END
X value: Correct: b, Actual: b
Check Lock: COMMIT

Hardware Sandboxing
Initially: X = 5; Y = 6
Thread 1 (SW): Acquire Lock; …; ++X; …; ++Y; …; Release Lock
Thread 2 (HW): TXN_BEGIN; …; Z = 1/(Y-X); …; TXN_END
If Thread 2 reads X and Y after ++X but before ++Y, then Y-X is 0 and it computes Z = 1/0 !!!

Indirect Jumps
Initially: X = 5; Y = 6
Thread 1 (SW): Acquire Lock; …; ++X; …; ++Y; …; Release Lock
Thread 2 (HW): _xbegin; …; if (X == Y) *p = garbage; p(); …; if (lock) abort; _xend
[The figure also shows a second _xend, apart from the one that ends the sequence above.]

[Figure: speedup vs. threads (1, 2, 4, 8) in three panels: Ssca2 (small txns), Intruder (medium txns), Labyrinth (large txns). Series: TL2, SGL, HLE, E-SGL, L-SGL. A "Better" arrow marks the preferred direction.]

Improved Lock Acquisition Rate
[Figure: % lock acquisitions vs. threads (1, 2, 4, 8) in four panels: Vacation Low (medium txns), Kmeans High (small txns), Intruder (medium txns), Labyrinth (large txns). Series: HLE, E-SGL, L-SGL. A "Better" arrow marks the preferred direction.]

No single thread overhead
[Figure: slowdown relative to sequential for 1 thread. Series: TL2, SGL, HLE, E-SGL, L-SGL.]

Bloom Filters
• Efficient probabilistic data structure to compute fast set intersection
• Can admit false positives
• No false negatives
• Used in TM for Conflict Detection
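
The properties listed above can be illustrated with a very small Bloom filter over memory addresses. This is a sketch under assumptions that are not in the slides: a single 64-bit word as the filter, two multiplicative hashes with arbitrarily chosen constants, 64-byte cache-line granularity, and the hypothetical names bloom64, bloom_mask, bloom_add, bloom_may_contain, and bloom_may_intersect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A 64-bit Bloom filter (assumed size; real filters are usually larger). */
typedef struct { uint64_t bits; } bloom64;

/* Map an address to two bit positions in [0, 63]. */
static uint64_t bloom_mask(const void *addr) {
    uint64_t line = (uint64_t)(uintptr_t)addr >> 6;  /* assume 64-byte lines */
    uint64_t h1 = (line * 0x9E3779B97F4A7C15ULL) >> 58;
    uint64_t h2 = (line * 0xC2B2AE3D27D4EB4FULL) >> 58;
    return (1ULL << h1) | (1ULL << h2);
}

static void bloom_add(bloom64 *f, const void *addr) {
    assert(f != NULL);
    f->bits |= bloom_mask(addr);
}

/* May return true for an address never added (false positive);
 * never returns false for an address that was added. */
static bool bloom_may_contain(const bloom64 *f, const void *addr) {
    uint64_t m = bloom_mask(addr);
    return (f->bits & m) == m;
}

/* Conservative set intersection: if the two underlying sets share an
 * address, both filters have that address's bits set, so the AND is
 * non-zero. A non-zero AND without a shared address is a false positive. */
static bool bloom_may_intersect(const bloom64 *a, const bloom64 *b) {
    return (a->bits & b->bits) != 0;
}
```

The intersection test is a single AND, which is what makes the structure attractive for conflict detection between a read set and a write set; the cost is that unrelated addresses can collide and report a conflict that did not happen.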

Lazy SGL
[Timeline figure, repeated for comparison: one SW txn (Begin SW txn, Acquire L, Release L, End SW txn) and four HW txns (1) to (4), each reading L just before its end, on a Time axis. Two X marks and two COMMIT marks are drawn. Legend: X = ABORT.]

BF SGL
[Timeline figure: one SW txn (Begin SW txn, Acquire L, Release L, End SW txn) and four HW txns on a Time axis. HW txns (1) and (2) are labeled Check BF before their end; HW txns (3) and (4) are labeled Read L before their end. Two COMMIT marks are drawn. Legend: X = ABORT.]

Case 1: HW begins, SW begins, HW ends, SW ends
Thread 1 (SW): Acquire Lock; …; X = a; …; Release Lock
Thread 2 (HW): TXN_BEGIN; …; X = b; …; TXN_END
X value: a (Correct: a, Actual: a)
Check BF: if BFs intersect, ABORT; else COMMIT

Case 2: SW begins, HW begins, HW ends, SW ends
Thread 1 (SW): Acquire Lock; …; X = a; …; Release Lock
Thread 2 (HW): TXN_BEGIN; …; X = b; …; TXN_END
X value: Correct: a, Actual: b
Check BF: if BFs intersect, ABORT; else COMMIT
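
The "Check BF" step in the two cases above can be sketched as a commit-time decision function. The slides only state "if BFs intersect, ABORT; else COMMIT", so everything else here is an assumption made for illustration: each transaction keeps separate read and write filters, the filters are 64-bit words, a conflict means write/read, read/write, or write/write overlap, and the check is skipped when no SW txn holds the lock. The names txn_filters and bf_sgl_may_commit are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed representation: one 64-bit Bloom filter per access set. */
typedef struct {
    uint64_t read_bits;   /* addresses the transaction has read */
    uint64_t write_bits;  /* addresses the transaction has written */
} txn_filters;

/* Returns true if the HW txn may COMMIT, false if it must ABORT. */
static bool bf_sgl_may_commit(bool lock_held,
                              const txn_filters *hw,
                              const txn_filters *sw) {
    assert(hw != NULL && sw != NULL);
    if (!lock_held)
        return true;  /* assumption: no SW txn is running, nothing to check */
    /* A HW write overlapping anything the SW txn touched, or a HW read
     * overlapping a SW write, counts as a possible conflict. */
    bool conflict =
        (hw->write_bits & (sw->read_bits | sw->write_bits)) != 0 ||
        (hw->read_bits & sw->write_bits) != 0;
    return !conflict;
}
```

With this rule a HW txn that overlaps a SW txn in time can still commit when the two touch disjoint data, which is the extra precision over reading the lock; false positives in the filters only cause unnecessary aborts, never missed conflicts.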

Conclusions
• HTMs are becoming more available
• Best-effort: need software fallback
• Eager SGL
    • simple and fast fallback
    • often preferred to more efficient solutions

Conclusions
• Lazy SGL
    • as simple as Eager SGL
    • more efficient
• Bloom Filter SGL
    • more accurate conflict detection
    • slower
• Can be implemented directly in hardware