The Complexity of Transactional Memory & What to Do About It
Hagit Attiya, Technion & EPFL
The Challenge of Concurrent Programming
A multi-core revolution is underway
Exploit the power of concurrent computing by restructuring applications
Writing concurrent applications is harder than sequential programming
Transactional Memory (TM)
A way to deal with the difficulty of writing concurrent applications.
In its simplest form, just wrap the code between begin-transaction and end-transaction
TM synchronizes memory accesses so that each transaction seems to execute sequentially and in isolation
begin-transaction
    ...
end-transaction
A Brief History of TM
TM originally suggested as a hardware platform [Herlihy and Moss 1993]
Software transactional memory (STM), essentially optimized multi-word synchronization (static) [Shavit & Touitou 1995]
Popularization in the programming languages & architecture communities [Rajwar 2002]
First made dynamic only with a weaker liveness condition (obstruction-freedom) [Herlihy, Luchangco, Moir and Scherer 2003]
The Promise
TM will track memory accesses and will allow transactions to proceed concurrently, if they are not conflicting
Optimism vs. pessimism
– Optimistic: begin-transaction ... end-transaction
– Pessimistic: lock (entry) ... unlock (exit)
2-3 Levels of Abstraction
Transactions, each a sequence of operations accessing data items, by a single thread
Operations
– On data items, e.g., Read and Write
– TryCommit / TryAbort
Data set = Read set ∪ Write set
Primitives on base objects (load, store, CAS)
[Figure: a transaction as a sequence of operations: read, read, write, tryC]
More Modeling
Data representation for transactions and data items using base objects
Algorithms for operations, applying primitives to base objects
– load, store, CAS, DCAS
Asynchronous processes invoke these procedures
Lead to interleaved executions, in the standard sense
Safety
Serializability: transactions appear to execute sequentially
Strict serializability: preserves the order of non-overlapping transactions [Papadimitriou 1979]
Opacity: even transactions that later abort are (strictly) serializable [Guerraoui, Kapalka POPL 2008]
– Also support for operations other than read and write
Snapshot isolation
[Figure: relationship among serializability, strict serializability, opacity and snapshot isolation]
The Many Faces of Progress
TM may abort transactions, in case of conflicts
Could admit trivial implementations
Several progress properties
When locking is not allowed:
• Wait-freedom
• Obstruction-freedom
Progress for Lock-Based TM
Better performance with locks [Dice, Shalev, Shavit DISC 2006]
Weakly progressive: a transaction aborts only if it has conflicts [Guerraoui, Kapalka POPL 2009]
Strongly progressive: at least one of the transactions involved in the conflict commits
Minimally progressive: a transaction commits if it runs alone, with no pending transactions
Multi-version permissive: only an update transaction that conflicts with another update transaction aborts [Perelman, Fan, Keidar PODC 2010]
– Read-only transactions always commit
[Figure: relationship among the progress conditions: minimally progressive, weakly progressive, strongly progressive, obstruction-free, multi-version permissive, wait-free]
The Consensus Number of TM
Minimally progressive TMs solve consensus for at most two processes [Guerraoui, Kapalka SPAA 2008]
– Their consensus number is 2
– Holds for obstruction-free and weakly progressive TMs
Key step: equivalence with a consensus object that fails in a very clean manner [A, Guerraoui, Hendler, Kuznetsov PODC 2006]
[Figure: consensus object with operation propose, which returns decide(v) or fail]
Invisible Reads
Optimize read-only transactions, which, in principle, need not modify the shared memory
Invisible reads: read operations do not store
– Read-only transactions do not store at all
Semi-visible reads: read operations store some information, but not very detailed
– E.g., [Dice, Matveev, Shavit Transact 2010]
– Oblivious STM [A & Hillel DISC 2010]
Step Complexity Lower Bound
[Guerraoui, Kapalka PPoPP 2008]
A read operation has Ω(|read set|) step complexity in an STM that is
– single version
– with invisible reads
– weakly progressive
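To give a feel for where a per-read cost proportional to the read set can come from, here is a minimal single-threaded sketch of one familiar design: each read revalidates everything read so far, and nothing is ever stored by the reader. It is an illustration under assumptions, not the construction used in the lower-bound proof; the names `VersionedItem`, `InvisibleReadTx` and `commit_write` are hypothetical.

```python
class VersionedItem:
    """A data item with a version number (single version is kept)."""
    def __init__(self, value):
        self.value = value
        self.version = 0


def commit_write(item, value):
    """Model a committed update by another transaction: new value, new version."""
    item.value = value
    item.version += 1


class InvisibleReadTx:
    """Toy transaction with invisible reads: it only loads, never stores."""
    def __init__(self):
        self.read_set = {}  # item -> version observed at first read
        self.steps = 0      # number of loads applied to base objects

    def _validate(self):
        # One load per item already in the read set.
        for item, seen in self.read_set.items():
            self.steps += 1
            if item.version != seen:
                return False
        return True

    def read(self, item):
        self.steps += 1  # load of the item being read
        value, version = item.value, item.version
        if not self._validate():
            raise RuntimeError("abort: an item in the read set has changed")
        self.read_set.setdefault(item, version)
        return value
```

In this sketch the k-th read of a fresh item performs k loads, so a transaction that reads n items performs n(n+1)/2 loads in total.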
Predicting TM Scalability
Unrelated transactions progress independently, even if they are concurrent
Represent relations between transactions by a conflict graph:
– Vertices represent transactions
– Edges connect transactions that share a data item
T1{A,B,C}, T2{A,D}, T3{D,E}, T4{F,L}, T5{L}, T6{J}
Disjoint access transactions are not connected in the graph
Strictly disjoint access transactions are not adjacent
[Figure: conflict graph of T1 to T6, with edges T1-T2, T2-T3 and T4-T5; T6 is isolated]
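The conflict graph for the six example transactions can be sketched in a few lines. This is a simplified illustration: it takes the data sets as given and ignores which transactions actually overlap in time; the function names are hypothetical.

```python
from itertools import combinations


def conflict_graph(data_sets):
    """Vertices are transactions; an edge joins two that share a data item."""
    graph = {t: set() for t in data_sets}
    for a, b in combinations(data_sets, 2):
        if data_sets[a] & data_sets[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected(graph, src, dst):
    """Depth-first search for a path from src to dst."""
    seen, stack = {src}, [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in graph[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return False


def disjoint_access(graph, a, b):
    """Disjoint access: no path between a and b in the conflict graph."""
    return not connected(graph, a, b)


def strictly_disjoint_access(graph, a, b):
    """Strictly disjoint access: a and b are not adjacent."""
    return b not in graph[a]


# Data sets of the example transactions T1..T6.
DATA_SETS = {
    "T1": {"A", "B", "C"},
    "T2": {"A", "D"},
    "T3": {"D", "E"},
    "T4": {"F", "L"},
    "T5": {"L"},
    "T6": {"J"},
}
GRAPH = conflict_graph(DATA_SETS)
```

Here T1 and T3 are strictly disjoint access (they share no item) but not disjoint access, because T2 links them through A and D; T1 and T6 are both.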
Disjoint Access Parallelism
TM is DAP: Two transactions concurrently contend on the same base object, only if they are not disjoint-access
~ [Israeli and Rappoport PODC 1995]
Similar definition for strict DAP
Contend = access the same base object, with at least one of the accesses a store
Achieving Disjoint-Access Parallelism
There is no STM that is both obstruction-free and strict DAP [Guerraoui, Kapalka 2008]
But there is an obstruction-free and DAP STM [Herlihy, Luchangco, Moir and Scherer 2003]
Not if read-only transactions are invisible and always succeed in committing [A, Hillel, Milani SPAA 2009]
Achieving DAP
[A, Hillel, Milani SPAA 2009]
A read-only transaction has Ω(|read set|) stores when the STM is
– MV-permissive (read-only transactions commit)
– DAP
Holds for strict serializability and opacity
Also for serializability and snapshot isolation (under a slightly stronger notion of DAP)
Privatization
Apply loads and stores to the underlying data (uninstrumented access)
Avoids transactional overhead
[Spear, Marathe, Dalessandro, Scott 2007]
[Shpeisman, Menon, Adl-Tabatabai, Balensiefer, Grossman, Hudson, Moore, Saha 2007]
Cost of Privatization
Uninstrumented access cannot be achieved without prior privatization [Guerraoui, Henzinger, Kapalka, Singh SPAA 2010] [A, Hillel DISC 2010]
Must invoke a privatizing transaction or a privatizing barrier [Dice, Matveev, Shavit Transact 2010]
Unless parallelism is reduced or detailed information is kept, privatization cost is linear in the number of privatized items [A, Hillel DISC 2010]
And a few more results…
So, In Theory
TM cannot efficiently provide clean semantics
– Either weaken the consistency semantics or compromise the progress guarantees
Limited scalability & significant cost
TM is not an expressive programming idiom
But In Practice, We are Fine, No?
Not really…
Worst-case lower bounds are not for corner cases
– Likely to happen in practice
– Hard to program around them
Implementation-focused research seems to be hitting the same wall [Cascaval, Blundell, Michael, Cain, Wu, Chiras, Chatterjee 2008]
Design choices compromise either simplicity
– Elastic STM [Felber, Gramoli, Guerraoui DISC 2009]
or scalability
– Single-lock STMs [Olszewski, Cutler, Steffan] [Dalessandro, Spear, Scott]
A Post-TM Era
TM cannot make programs run correctly and efficiently without the programmer’s awareness
Stop hiding the realities of concurrency
• Expose a cleaner model of a multi-core that does not hide tradeoffs
• Provide additional methodologies and tools
Multitude of approaches
– I will discuss two
Approach I: Optimizing Coarse-Grain Programming
For applications with a moderate amount of contention (say <32 threads), the overhead of managing the memory can outweigh the synchronization cost
Access the data mostly “in exclusion”
Combining: The thread winning the lock carries out many of the pending operations [Hendler, Incze, Shavit, Tzafrir SPAA 2010]
Without locking: optimize the memory utilization of Herlihy's universal construction [Chuong, Ellen, Ramachandran SPAA 2010]
Approach II: Programming with Mini-Transactions
Extension of DCAS or kCAS (for small k’s), or a multi-location variant of LL/SC [PowerPC, DEC Alpha]
– Short
– Works on a small, static data set
– Simple functionality
– No I/O, out-of-core memory accesses, etc.
May fail spuriously
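As a rough illustration of programming with such a primitive, the sketch below simulates DCAS in software and uses it in a retry loop. The lock only stands in for hardware atomicity, and `Cell`, `dcas` and `transfer` are hypothetical names, not an existing API. The simulated `dcas` fails only when a location has changed, but the same loop would also absorb spurious failures.

```python
import threading


class Cell:
    """A single shared memory location."""
    def __init__(self, value):
        self.value = value


_atomic = threading.Lock()  # simulation only: stands in for hardware atomicity


def dcas(c1, old1, new1, c2, old2, new2):
    """Atomically update both cells if both still hold their expected values."""
    with _atomic:
        if c1.value == old1 and c2.value == old2:
            c1.value, c2.value = new1, new2
            return True
        return False


def transfer(src, dst, amount):
    """Move `amount` from src to dst; retry whenever the DCAS fails."""
    while True:
        s, d = src.value, dst.value
        if s < amount:
            return False  # not enough left in src
        if dcas(src, s, s - amount, dst, d, d + amount):
            return True
```

The data set is small and static (two locations) and the computation is simple arithmetic, which is the shape of operation the list above describes.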
Mini-Transactions
Lower bounds use
• large, dynamic data sets
• long transactions
• data accessed w/ arbitrary operations and unrestricted calculations
Mini-transactions
• small, static data sets
• short transactions
• simple functionality, e.g., arithmetic, comparison, and memory access
Mini-Transactions & HTM
Mini-transactions are almost provided by recent hardware TM proposals
– AMD Advanced Synchronization Facility [2009]
– Sun [Chaudhry, Cypher, Ekman, Karlsson, Landin, Yip, Zeffer, and Tremblay Micro 2009]
Best-effort: transactions can be aborted for reasons other than conflicts
– TLB misses, interrupts, certain function-call sequences, division instructions
Algorithmic Challenges
• Mini-transactions provide a significant handle on the difficult task of writing concurrent applications
– DCAS is already a big help [A, Hillel, 2006, 2009]
– Experience with hardware TM support [Dice, Lev, Marathe, Moir, Olszewski, Nussbaum SPAA 2010] [Carouge, Spear, DISC 2010]
• Design algorithms accommodating the best-effort nature of mini-transactions
• Avoid sure killers
• Work around the small data sets
– Amorphous data parallelism [Pingali, Kulkarni, Nguyen, Burtscher, Mendez-Lojo, Prountzos, Sui, Zhong 2009]
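One common way to accommodate best-effort behavior is to retry a bounded number of times and then fall back to mutual exclusion. The sketch below simulates this under stated assumptions: `BestEffortMiniTx` is a hypothetical class, not a real hardware TM interface, spurious aborts are drawn at random, and a single lock plays the role of both hardware atomicity and the fallback lock. On real hardware the fallback lock would also have to be checked inside each mini-transaction.

```python
import random
import threading


class BestEffortMiniTx:
    """Simulated best-effort mini-transactions with a lock-based fallback."""
    def __init__(self, failure_rate=0.3, seed=None):
        self._hw = threading.Lock()         # simulation only: hardware atomicity
        self._rng = random.Random(seed)
        self._failure_rate = failure_rate   # chance of a spurious abort
        self.fallbacks = 0                  # how often the slow path was taken

    def try_once(self, body):
        """Run body atomically, or abort spuriously with no effect."""
        with self._hw:
            if self._rng.random() < self._failure_rate:
                return False
            body()
            return True

    def run(self, body, max_attempts=3):
        """Try the mini-transaction a few times, then fall back to the lock."""
        for _ in range(max_attempts):
            if self.try_once(body):
                return "mini-tx"
        with self._hw:
            self.fallbacks += 1
            body()
        return "fallback"
```

Bounding the number of attempts keeps an operation that hits a "sure killer" from retrying forever; the price is that the fallback path serializes execution.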
Programming Support
Create patterns for employing mini-transactions, hopefully encapsulated within programming-language support
Cleanly combine with native (uninstrumented) access to the locations accessed by mini-transactions
– Beware of privatization scenarios
Summary
• Facilitate the design of efficient and correct concurrent applications in the post-TM era
– Capitalize on lessons learned and wide interest in TM
– Multitude of approaches
• Specifically, develop a model, algorithms and programming patterns for best-effort mini-transactions
Thank you!