Reduced Hardware NOrec: A Safe and Scalable
Hybrid Transactional Memory
Alexander Matveev, Nir Shavit
MIT
Good: Hardware Transactional Memory (HTM)
Bad: The HTM is “best-effort”
HTM may always fail due to:
1. L1 cache capacity
2. Interrupt
3. Unsupported instruction
To ensure progress, we need a software fallback
A Possible Solution is: Lock Elision
Each thread's critical section:
1. Lock
2. Unlock
Thread 1:
1. HTM Start
2. Read lock and check it is free
3. ... code ...
4. HTM Commit
Thread 2:
1. HTM Start
2. Read lock and check it is free
3. ... code ...
4. HTM Commit
No conflict: HTMs commit concurrently
A Possible Solution is: Lock Elision
Thread 1:
1. HTM Start
2. Read lock and check it is free
3. ... code ...
4. ... CONFLICT ... HTM Restart
Wait for Lock
Thread 2:
1. HTM Start
2. Read lock and check it is free
3. ... code ...
4. ... CONFLICT ... HTM Restart
Wait for Lock
Thread 3:
1. HTM Start
2. Read lock and check it is free
3. ... code ... FAIL ... HTM Restart
1. Acquire Lock
2. ... code ...
3. Release Lock
No concurrency between hardware and software
A Possible Solution is: Lock Elision
• Good:
– Simple: No need to instrument reads and writes
• Bad:
– Serial fallback: A software fallback grabs the global lock and aborts all hardware transactions
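The lock elision control flow from these slides can be sketched as follows. This is a minimal illustration, not the authors' code: `TryHtm` is a hypothetical hook standing in for one best-effort hardware transaction (on Intel hardware it would typically wrap the RTM intrinsics `_xbegin`/`_xend`), and `ElidedLock`, `run`, and `max_attempts` are names invented for this example. The hook is assumed to run its body atomically and to report whether it committed.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical hook for one best-effort hardware transaction (an assumption
// of this sketch). Contract: run `body` atomically; return true only if the
// body returned true and the transaction committed; otherwise discard all
// effects and return false.
using TxBody = std::function<bool()>;
using TryHtm = std::function<bool(const TxBody&)>;

class ElidedLock {
public:
    explicit ElidedLock(int max_attempts = 3) : max_attempts_(max_attempts) {
        assert(max_attempts >= 0);
    }

    // Runs `critical` either inside a hardware transaction or under the lock.
    // Returns true if it committed in hardware, false if the fallback ran.
    bool run(const TryHtm& try_htm, const std::function<void()>& critical) {
        for (int attempt = 0; attempt < max_attempts_; ++attempt) {
            // "Wait for Lock": do not start while a fallback holds the lock.
            while (locked_.load()) { /* spin */ }
            const bool committed = try_htm([&]() -> bool {
                // "Read lock and check it is free": reading the lock inside
                // the transaction means a later acquire conflicts with it.
                if (locked_.load()) return false;
                critical();
                return true;
            });
            if (committed) return true;
        }
        // Serial fallback: grab the global lock. On real HTM this write
        // aborts every hardware transaction that has read the lock.
        bool expected = false;
        while (!locked_.compare_exchange_weak(expected, true)) {
            expected = false;
        }
        critical();
        locked_.store(false);
        return false;
    }

    bool is_locked() const { return locked_.load(); }

private:
    std::atomic<bool> locked_{false};
    int max_attempts_;
};
```

With a hook that always aborts, every call ends up on the serial fallback, which is the "no concurrency between hardware and software" case the slides describe.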
Another Approach is: Hybrid Transactional Memory
Thread 1:
1. HTM Start
2. Read lock and check it is free
3. ... code ...
4. ... more code ...
Thread 2:
1. HTM Start
2. Read lock and check it is free
3. ... code ...
4. ... more code ...
Thread 3:
1. HTM Start
2. Read lock and check it is free
3. ... code ... FAIL ... HTM Restart
1. STM Start
2. ... code ...
3. ... more code ...
STM and HTM execute concurrently
Another Approach is: Hybrid Transactional Memory
• Good:
– Hardware-Software Concurrency
• Bad:
– Complex:
1. Hard to coordinate hardware and software (our focus)
2. Hard to apply to code due to instrumentation (GCC C/C++ TM helps here a lot)
Hybrid TM History
• 2006: First Hybrid TM [Damron, Fedorova, Lev, Luchangco, Moir, Nussbaum]
– Key Idea: Use per-location metadata version-locks to coordinate hardware and software
• Bad:
– Hardware is slow: each read/write must read the version-lock and execute a branch condition check
Hybrid TM History
• 2007: Phased TM [Lev, Moir, Nussbaum]
– Key Idea: Use HTM mode or STM mode, but not HTM and STM at the same time
• Bad:
– Expensive to switch modes: a single fallback must stop all hardware transactions
Hybrid TM History
• 2011: Hybrid NOrec (state-of-the-art) [Dalessandro, Carouge, White, Lev, Moir, Scott, Spear]
– Key Idea: No metadata + global clock for coordination
Hybrid NOrec
• Good:
– No metadata: Efficient for low concurrency
• Bad:
– Limited scalability: too many aborts due to global clock updates
  • A software write must abort all hardware
  • A hardware write must abort all software
Hybrid NOrec
Slow-Path: Software
Read clock
Read X (verify clock)
Read X: check clock => changed => restart/revalidate
Lock clock
X = 4
Unlock clock
Fast-Path: Hardware
Read clock
Read X (pure)
Update clock
ABORT: a software "Lock clock" aborts the hardware fast path
RESTART: a hardware "Update clock" restarts the software slow path
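The software side of this global-clock protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not the Hybrid NOrec implementation: the even/odd clock encoding (even means free, odd means locked by a writer) is assumed here, the names `GlobalClock`, `tx_read`, and `tx_write_commit` are invented for the example, and a failed check simply reports "restart" rather than attempting revalidation.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Global clock shared by all transactions (assumed encoding for this sketch:
// even value = free, odd value = locked by a software writer).
struct GlobalClock {
    std::atomic<std::uint64_t> value{0};

    // "Read clock": wait until no writer holds the clock, then snapshot it.
    std::uint64_t read_unlocked() const {
        std::uint64_t v;
        while ((v = value.load()) & 1) { /* spin while locked */ }
        return v;
    }

    // "Lock clock": succeeds only if nothing committed since `snapshot`.
    bool try_lock(std::uint64_t snapshot) {
        return value.compare_exchange_strong(snapshot, snapshot + 1);
    }

    // "Unlock clock": publishes a new even value, which readers notice.
    void unlock_and_update(std::uint64_t snapshot) {
        assert(value.load() == snapshot + 1);  // caller must hold the lock
        value.store(snapshot + 2);
    }
};

// "Read X (verify clock)": returns false if the clock changed since the
// snapshot, in which case the caller must restart (or revalidate).
bool tx_read(const GlobalClock& clock, std::uint64_t snapshot,
             const std::atomic<int>& x, int& out) {
    out = x.load();
    return clock.value.load() == snapshot;
}

// Software write commit: lock the clock, write, unlock.
bool tx_write_commit(GlobalClock& clock, std::uint64_t snapshot,
                     std::atomic<int>& x, int new_value) {
    if (!clock.try_lock(snapshot)) return false;  // someone committed first
    x.store(new_value);
    clock.unlock_and_update(snapshot);
    return true;
}
```

Because every writer changes the single clock, any commit invalidates every concurrent reader's snapshot, which is the scalability limit the slides point out.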
A Possible Solution
• 2011: Hybrid NOrec 2 [Riegel, Marlier, Nowack, Felber, Fetzer]
– Key Idea: Use non-speculative reads inside HTM to verify the global clock and avoid unnecessary aborts
• Bad:
– HTM of Intel and IBM has no support for non-speculative reads
A Recent Approach
• 2014: Invyswell Hybrid [Calciu, Gottschlich, Shpeisman, Pokam, Herlihy]
– Key Idea: Allow unsafe concurrency between hardware and software, and use the HTM sandboxing to detect and handle errors
Invyswell
Slow-Path: Software
Lock clock
X = 4 (NEW)
Y = 8 (NEW)
Unlock clock
Fast-Path: Hardware
Read X (NEW)
Read Y (OLD)
Func(X, Y): Unsafe, hopes HTM aborts
Update clock
NO ABORT
FUTURE
Invyswell
• Good:
– Far fewer aborts than Hybrid NOrec
• Bad:
– Unfortunately, HTM sandboxing may miss errors, so a corrupted transaction may commit and crash the system
– This problem was shown in a recent work: “Pitfalls of Lazy Subscription” [Dice, Harris, Kogan, Lev, Moir]
Our New Approach
• 2015: RH NOrec [Matveev, Shavit]
– Key Idea: Use a “mixed” fallback path that uses both software and short hardware transactions
RH NOrec
Slow-Path: Software (unsafe, as in Invyswell)
Lock clock
X = 4 (NEW)
Y = 8 (NEW)
Unlock clock
Fast-Path: Hardware
Read X (NEW)
Read Y (OLD)
Func(X, Y): Unsafe, hopes HTM aborts
Mixed Slow-Path (writes run inside an HTM)
X = 4 (HIDDEN)
Y = 8 (HIDDEN)
Update clock
Writes are speculative (invisible)
Fast-Path: Hardware
Read X (OLD)
Read Y (OLD)
Func(X, Y): Safe!
X and Y both OLD or both NEW, not a mix
RH NOrec
• Key Point 1: Execute software writes in a short hardware transaction
– No need to abort hardware transactions
– Full safety
• In practice this works well
– Due to the 80:20 rule: a typical operation has 80% reads and 20% writes
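Key Point 1 can be sketched as follows: the slow path's writes and its clock update go into one short hardware transaction, so other threads observe either none of the writes or all of them. This is an illustration under stated assumptions, not the RH NOrec implementation: `TryHtm` is a hypothetical hook for one best-effort hardware transaction, the even/odd clock convention and the `snapshot + 2` update are assumed, and `Shared` and `htm_postfix` are names invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical hook for one best-effort hardware transaction (an assumption
// of this sketch). Contract: run `body` atomically; return true only if the
// body returned true and the transaction committed; otherwise discard all
// effects and return false.
using TxBody = std::function<bool()>;
using TryHtm = std::function<bool(const TxBody&)>;

struct Shared {
    std::atomic<std::uint64_t> clock{0};  // assumed: even = free, odd = locked
    std::atomic<int> x{0};
    std::atomic<int> y{0};
};

// Short hardware transaction that performs the slow path's writes.
// `snapshot` is the clock value the earlier software reads were verified
// against. Returns false if the transaction aborted or the clock moved,
// in which case the caller restarts or falls back.
bool htm_postfix(const TryHtm& try_htm, Shared& s, std::uint64_t snapshot,
                 int new_x, int new_y) {
    assert((snapshot & 1) == 0);  // snapshots are taken while the clock is free
    return try_htm([&]() -> bool {
        // The earlier reads are stale if anyone committed since the snapshot.
        if (s.clock.load() != snapshot) return false;
        s.x.store(new_x);              // hidden until commit
        s.y.store(new_y);              // hidden until commit
        s.clock.store(snapshot + 2);   // "Update clock" inside the same HTM
        return true;
    });
}
```

Since the writes stay speculative until commit, a concurrent hardware fast path sees X and Y both OLD or both NEW, and the clock never needs to be held locked while it runs.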
RH NOrec
• Key Point 2: Execute a maximal amount of initial software reads in a read-only hardware transaction
– Allows deferring the global clock read, which significantly reduces software restarts/revalidations
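Key Point 2 can be sketched as follows: the initial reads run directly inside a read-only hardware transaction, and the global clock is read only at the very end, just before commit. This is an illustration under stated assumptions, not the RH NOrec implementation: `TryHtm` is a hypothetical hook for one best-effort hardware transaction, the even/odd clock convention is assumed, and `Shared`, `PrefixReads`, and `htm_prefix` are names invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical hook for one best-effort hardware transaction (an assumption
// of this sketch). Contract: run `body` atomically; return true only if the
// body returned true and the transaction committed; otherwise return false.
using TxBody = std::function<bool()>;
using TryHtm = std::function<bool(const TxBody&)>;

struct Shared {
    std::atomic<std::uint64_t> clock{0};  // assumed: even = free, odd = locked
    std::atomic<int> x{0};
    std::atomic<int> y{0};
};

struct PrefixReads {
    int x;
    int y;
    std::uint64_t snapshot;  // clock value the later software reads verify against
};

// Read-only hardware transaction covering the initial reads.
// Fills `out` and returns true only if the prefix committed.
bool htm_prefix(const TryHtm& try_htm, const Shared& s, PrefixReads& out) {
    PrefixReads r{};
    const bool committed = try_htm([&]() -> bool {
        r.x = s.x.load();             // pure/direct read, no per-read clock check
        r.y = s.y.load();             // pure/direct read
        r.snapshot = s.clock.load();  // deferred clock read, last step before commit
        // A locked clock means a software writer may be mid-update: abort.
        return (r.snapshot & 1) == 0;
    });
    if (!committed) return false;
    assert((r.snapshot & 1) == 0);
    out = r;
    return true;
}
```

Reading the clock last keeps it out of the transaction's read set for most of the prefix, so clock updates by other threads during the earlier reads do not abort it.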
Fast-Path: Hardware
HTM start
... reads/writes ...
Update clock
HTM commit
Mixed Path
Read clock
... reads in software ... (verifies clock)
Read some X: check clock => changed => restart/revalidate
RESTART
Fast-Path: Hardware
HTM start
... reads/writes ...
Update clock
HTM commit
Mixed Path
HTM Prefix:
HTM start
... reads in HTM ... (pure/direct)
Read clock
HTM commit
NO ABORT
Fast-Path: Hardware
HTM start
... reads/writes ...
Update clock
HTM commit
NO ABORT
Mixed Path
HTM Prefix:
HTM start
... reads in HTM ... (pure/direct)
Read clock
HTM commit
... reads in software ...
HTM Postfix:
HTM start
Lock clock
... writes in HTM ...
Unlock clock
HTM commit
Fast-Path: Hardware (second transaction)
HTM start
... reads/writes ...
Update clock
HTM commit
NO ABORT
Throughput on 8-core Intel (GCC C/C++)
[Chart: Red-Black Tree (10K), 10% mutations. Series: Lock Elision, RH-NORec, TL2, HY-NORec. x-axis ticks: 1, 2, 4, 6, 8, 10, 12, 14, 16; y-axis range: 0.00E+00 to 7.00E+08.]
[Chart: Red-Black Tree (10K), 40% mutations. Series: Lock Elision, RH-NORec, TL2, HY-NORec. x-axis ticks: 1, 2, 4, 6, 8, 10, 12, 14, 16; y-axis range: 0.00E+00 to 4.50E+08.]
[Chart: Vacation Database (STAMP - Low). Series: Lock Elision, RH-NORec, TL2, HY-NORec, NORec. x-axis ticks: 1, 2, 4, 6, 8, 10, 12, 14, 16; y-axis range: 0.00E+00 to 1.40E+06.]
[Chart: Intruder Detection (STAMP). Series: Lock Elision, RH-NORec, TL2, HY-NORec, NORec. x-axis ticks: 1, 2, 4, 6, 8, 10, 12, 14, 16; y-axis range: 0.00E+00 to 4.00E+06.]
[Chart: Genome Sequencing (STAMP). Series: Lock Elision, RH-NORec, TL2, HY-NORec, NORec. x-axis ticks: 1, 2, 4, 6, 8, 10, 12, 14, 16; y-axis range: 0.00E+00 to 1.00E+06.]
[Chart: SSCA2 (STAMP). Series: Lock Elision, RH-NORec, TL2, HY-NORec, NORec. x-axis ticks: 1, 2, 4, 6, 8, 10, 12, 14, 16; y-axis range: 0.00E+00 to 4.50E+06.]
Conclusion
• RH NOrec: a new Hybrid TM that is safe and scalable
• Key Idea: Use a “mixed” fallback path that uses two short hardware transactions:
1. HTM Prefix: Executes a maximal amount of initial reads; defers the global clock read
2. HTM Postfix: Executes the software writes; preserves safety and allows hardware-software concurrency
Thank You