
The Complexity of Transactional Memory & What to Do About It

Hagit Attiya

Technion & EPFL

The Challenge of Concurrent Programming

A multi-core revolution is underway

Exploit the power of concurrent computing by restructuring applications

Writing concurrent applications is harder than sequential programming

Transactional Memory (TM)

A way to deal with the difficulty of writing concurrent applications.

In its simplest form, just wrap the code between begin-transaction and end-transaction

TM synchronizes memory accesses so that each transaction seems to execute sequentially and in isolation

begin-transaction

    ... (transaction body) ...

end-transaction

A Brief History of TM

TM originally suggested as a hardware platform [Herlihy and Moss 1993]

Software transactional memory (STM), essentially optimized multi-word synchronization (static) [Shavit & Touitou 1995]

Popularization in the programming languages & architecture communities [Rajwar 2002]

First made dynamic only with a weaker liveness condition (obstruction-freedom) [Herlihy, Luchangco, Moir and Scherer 2003]

The Promise

TM will track memory accesses and will allow transactions to proceed concurrently, if they are not conflicting

Optimism vs. pessimism
– Optimistic: begin-transaction ... end-transaction
– Pessimistic: lock (entry) ... unlock (exit)

2-3 Levels of Abstraction

Transactions, each a sequence of operations accessing data items, by a single thread

Operations
– On data items: e.g., Read and Write
– TryCommit / TryAbort

Data set = Read set ∪ Write set

Primitives on base objects (load, store, CAS)

[Figure: a transaction shown as a sequence of operations: read, read, write, tryC]
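The levels listed on this slide can be made concrete with a minimal Python sketch. It assumes a deferred-update design with per-item version numbers and one global commit lock. The names `Item`, `Transaction` and `try_commit` are illustrative and not taken from any particular STM. Plain attribute reads and writes stand in for loads and stores on base objects, and the sketch aims only at serializability of committed transactions, not opacity.

```python
import threading

_commit_lock = threading.Lock()  # assumption: one global lock guards base objects


class Item:
    """A data item, represented by two base objects: a value and a version."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    """A sequence of operations on data items, issued by a single thread."""

    def __init__(self):
        self.read_set = {}   # item -> version seen at its first read
        self.write_set = {}  # item -> value to install at commit time

    def read(self, item):
        if item in self.write_set:           # a transaction sees its own writes
            return self.write_set[item]
        with _commit_lock:                   # consistent (value, version) pair
            value, version = item.value, item.version
        self.read_set.setdefault(item, version)
        return value

    def write(self, item, value):
        self.write_set[item] = value         # deferred: nothing is stored yet

    def data_set(self):
        # Data set = read set ∪ write set
        return set(self.read_set) | set(self.write_set)

    def try_commit(self):
        """Return True if the transaction committed, False if it aborted."""
        with _commit_lock:
            # Validate: abort if any item read has changed since it was read.
            for item, seen in self.read_set.items():
                if item.version != seen:
                    return False
            for item, value in self.write_set.items():
                item.value = value
                item.version += 1
            return True
```

For instance, if T1 reads x, and T2 then writes x and commits, T1's `try_commit` returns False because its read of x is stale.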

More Modeling

Data representation for transactions and data items using base objects

Algorithms for operations, applying primitives to base objects
– load, store, CAS, DCAS

Asynchronous processes invoke these procedures

Lead to interleaved executions, in the standard sense


Safety

Serializability: transactions appear to execute sequentially

Strict serializability: preserves the order of non-overlapping transactions [Papadimitriou 1979]

Opacity: even transactions that later abort are (strictly) serializable [Guerraoui, Kapalka POPL 2008]
– Also support for operations other than read and write

Snapshot isolation

[Figure: diagram relating the safety conditions: serializability, strict serializability, opacity, snapshot isolation]

The Many Faces of Progress

TM may abort transactions, in case of conflicts

Could admit trivial implementations (e.g., one that aborts every transaction)

Hence, several progress properties

When locking is not allowed:
• Wait-freedom
• Obstruction-freedom

Progress for Lock-Based TM

Better performance with locks [Dice, Shalev, Shavit DISC 2006]

Weakly progressive: a transaction aborts only if it has conflicts [Guerraoui, Kapalka POPL 2009]

Strongly progressive: at least one of the transactions involved in the conflict commits

Minimally progressive: a transaction commits if it runs alone, with no pending transactions

Multi-version permissive: only an update transaction that conflicts with another update transaction aborts [Perlman, Fan, Keidar PODC 2010]

Read-only transactions always commit

[Figure: diagram relating the progress conditions: minimally progressive, weakly progressive, strongly progressive, obstruction-free, multi-version permissive, wait-free]

The Consensus Number of TM

Minimally progressive TMs solve consensus for at most two processes [Guerraoui, Kapalka SPAA 2008]

Their consensus number is 2

Holds for obstruction-free and weakly progressive TMs

Key step: equivalence with a consensus object that fails in a very clean manner [A, Guerraoui, Hendler, Kuznetsov PODC 2006]

propose → decide(v) / fail
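The interface of a consensus object that "fails cleanly" can be illustrated with a small Python sketch. It is only an illustration of the propose → decide(v) / fail behavior, not the equivalence argument from the cited papers. A lock acquired without blocking stands in for a minimally progressive TM: a "transaction" that runs alone commits, and one that meets a pending transaction aborts. The class name `FailableConsensus` and the `FAIL` marker are hypothetical, and proposals are assumed never to be `None`.

```python
import threading

FAIL = object()  # hypothetical marker returned when propose fails


class FailableConsensus:
    """Consensus object whose propose either decides a value or fails."""

    def __init__(self):
        self._decision = None
        self._tm = threading.Lock()  # stands in for a minimally progressive TM

    def propose(self, v):
        # Begin "transaction": abort at once if another one is pending.
        if not self._tm.acquire(blocking=False):
            return FAIL               # aborted transaction: propose fails
        try:
            if self._decision is None:
                self._decision = v    # the first committed proposal wins
            return self._decision     # decide(v)
        finally:
            self._tm.release()        # end "transaction"
```

Every propose that does not fail returns the same value, and a propose that runs alone never fails.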

Invisible Reads

Optimize read-only transactions, which, in principle, need not modify the shared memory

Invisible reads: read operations do not store
– Read-only transactions do not store at all

Semi-visible reads: read operations store some information, but not very detailed
– E.g., [Dice, Matveev, Shavit Transact 2010]

Oblivious STM [A & Hillel DISC 2010]

Step Complexity Lower Bound

[Guerraoui, Kapalka PPoPP 2008]

A read operation has Ω(|read set|) step complexity in an STM that is
– single version
– with invisible reads
– weakly progressive
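The cost stated in this bound matches what incremental validation pays: with invisible reads and a single version, each new read re-checks every item read so far. The Python sketch below is single-threaded and only counts steps. The names `Item`, `InvisibleReadTx` and `Aborted` are illustrative, and each item is assumed to be one base object holding a (value, version) pair.

```python
class Aborted(Exception):
    """Raised when validation finds that a previously read item has changed."""


class Item:
    def __init__(self, value):
        self.state = (value, 0)  # one base object: (value, version)


class InvisibleReadTx:
    def __init__(self):
        self.read_set = []  # (item, version seen) pairs, private to the reader
        self.steps = 0      # number of base-object loads performed

    def read(self, item):
        value, version = item.state   # one load; nothing is stored (invisible)
        self.steps += 1
        # Incremental validation: one load per item already in the read set.
        for prev, seen in self.read_set:
            self.steps += 1
            if prev.state[1] != seen:
                raise Aborted()
        self.read_set.append((item, version))
        return value
```

Reading k items this way costs 1 + 2 + ... + k loads in total, so the i-th read alone takes a number of steps proportional to the current read set size.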

Predicting TM Scalability

Unrelated transactions progress independently even if they are concurrent

Represent relations between transactions by a conflict graph:
– Vertices represent transactions
– Edges connect transactions that share a data item

T1{A,B,C}, T2{A,D}, T3{D,E}, T4{F,L}, T5{L}, T6{J}

Disjoint access transactions are not connected in the graph

Strictly disjoint access transactions are not adjacent

[Figure: conflict graph over T1–T6]
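The conflict graph for the six example transactions can be built and queried with a few lines of Python. This is a sketch that assumes all six transactions are concurrent. The function names are illustrative.

```python
from itertools import combinations

# Data sets of the example transactions from the slide.
DATA_SETS = {
    "T1": {"A", "B", "C"},
    "T2": {"A", "D"},
    "T3": {"D", "E"},
    "T4": {"F", "L"},
    "T5": {"L"},
    "T6": {"J"},
}


def conflict_graph(data_sets):
    """Vertices are transactions; an edge joins two that share a data item."""
    graph = {t: set() for t in data_sets}
    for a, b in combinations(data_sets, 2):
        if data_sets[a] & data_sets[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected(graph, src, dst):
    """Depth-first search: is there a path from src to dst?"""
    seen, stack = {src}, [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def strictly_disjoint_access(graph, a, b):
    return b not in graph[a]            # not adjacent


def disjoint_access(graph, a, b):
    return not connected(graph, a, b)   # not connected by any path


GRAPH = conflict_graph(DATA_SETS)
```

In this example T1 and T3 are strictly disjoint access (they share no item) but not disjoint access, because both conflict with T2. T1 and T4 are disjoint access.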

Disjoint Access Parallelism

TM is DAP: Two transactions concurrently contend on the same base object, only if they are not disjoint-access

Similar to [Israeli and Rappoport PODC 1995]

Similar definition for strict DAP

[Figure: conflict graph over T1–T6]

Contend: access the same base object, with at least one access being a store

Achieving Disjoint-Access Parallelism

No obstruction-free and strict DAP STM [Guerraoui, Kapalka 2008]

But there is an obstruction-free and DAP STM [Herlihy, Luchangco, Moir and Scherer 2003]

Not if read-only transactions are invisible and always succeed in committing [A, Hillel, Milani SPAA 2009]

Achieving DAP

[A, Hillel, Milani SPAA 2009]

A read-only transaction has Ω(|read set|) stores when the STM is
– MV-permissive (read-only transactions commit)
– DAP

Holds for strict serializability and opacity

Also for serializability and snapshot isolation (under a slightly stronger notion of DAP)

Privatization

Apply loads and stores to the underlying data (uninstrumented access)

Avoids transactional overhead

[Spear, Marathe, Dalessandro, Scott 2007]

[Shpeisman, Menon, Adl-Tabatabai, Balensiefer, Grossman, Hudson, Moore, Saha 2007]


Cost of Privatization

Uninstrumented access cannot be achieved without prior privatization [Guerraoui, Henzinger, Kapalka, Singh SPAA 2010] [A, Hillel DISC 2010]

Must invoke a privatizing transaction or a privatizing barrier [Dice, Matveev, Shavit Transact 2010]


Unless parallelism is reduced or detailed information is kept, privatization cost is linear in the number of privatized items [A, Hillel DISC 2010]

And a few more results…

So, In Theory

TM cannot efficiently provide clean semantics
– Either weaken the consistency semantics or compromise the progress guarantees

Limited scalability & significant cost

TM is not an expressive programming idiom

But In Practice, We are Fine, No?

Not really…

Worst-case lower bounds are not for corner cases
– Likely to happen in practice
– Hard to program around them

Implementation-focused research seems to be hitting the same wall [Cascaval, Blundell, Michael, Cain, Wu, Chiras, Chatterjee 2008]

Design choices compromise either simplicity
– Elastic STM [Felber, Gramoli, Guerraoui DISC 2009]

Or scalability
– Single-lock STMs [Olszewski, Cutler, Steffan] [Dalessandro, Spear, Scott]

A Post-TM Era

TM cannot make programs run correctly and efficiently without the programmer’s awareness

Stop hiding the realities of concurrency
• Expose a cleaner model of a multi-core that does not hide tradeoffs
• Provide additional methodologies and tools

Multitude of approaches
– I will discuss two

Approach I: Optimizing Coarse-Grain Programming

For applications with a moderate amount of contention (say, fewer than 32 threads), the overhead of managing the memory can outweigh the synchronization cost

Access the data mostly “in exclusion”

Combining: The thread winning the lock carries out many of the pending operations [Hendler, Incze, Shavit, Tzafrir SPAA 2010]

Without locking: optimize the memory utilization of Herlihy's universal construction [Chuong, Ellen, Ramachandran SPAA 2010]
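The combining idea mentioned on this slide can be sketched in Python for a shared counter. This is a simplified illustration and not the algorithm of the cited paper: it guards the list of published requests with a second small lock, an assumption made here to keep the sketch short. The names `CombiningCounter` and `_Request` are hypothetical.

```python
import threading


class _Request:
    def __init__(self, amount):
        self.amount = amount
        self.result = None
        self.done = threading.Event()


class CombiningCounter:
    """Counter where the thread that wins the lock serves all pending requests."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()           # the coarse-grained lock
        self._pending = []                      # published requests
        self._pending_guard = threading.Lock()  # assumption: protects the list

    def add(self, amount):
        req = _Request(amount)
        with self._pending_guard:
            self._pending.append(req)           # publish the request
        while not req.done.is_set():
            if self._lock.acquire(blocking=False):
                try:
                    self._combine()             # winner serves everyone
                finally:
                    self._lock.release()
            else:
                req.done.wait(timeout=0.001)    # someone else may serve us
        return req.result

    def _combine(self):
        with self._pending_guard:
            batch, self._pending = self._pending, []
        for req in batch:                       # data is accessed in exclusion
            self._value += req.amount
            req.result = self._value
            req.done.set()
```

Only the lock holder touches the counter, so the data stays in one thread's cache while a whole batch of operations is applied.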

Approach II: Programming with Mini-Transactions

Extension of DCAS or kCAS (for small k’s) or multi-location variant of LL/SC [PowerPC, DEC Alpha]

– Short
– Works on a small, static data set
– Simple functionality
– No I/O, out-of-core memory accesses, etc.

May fail spuriously
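A mini-transaction in the style of DCAS or kCAS, including spurious failures, can be simulated in Python. The sketch uses a lock to stand in for hardware atomicity and a random number generator to inject failures. `MiniTxMemory`, `kcas` and `transfer` are illustrative names, not an existing API. The caller cannot tell a spurious failure from a real mismatch, so it re-reads and retries.

```python
import random
import threading


class MiniTxMemory:
    """Simulated memory offering a best-effort k-location compare-and-swap."""

    def __init__(self, cells, spurious_failure_rate=0.0, seed=None):
        self._cells = list(cells)
        self._lock = threading.Lock()   # stands in for hardware atomicity
        self._rng = random.Random(seed)
        self._rate = spurious_failure_rate

    def load(self, addr):
        return self._cells[addr]

    def kcas(self, addrs, expected, new):
        """Install new[i] at addrs[i] if every cell equals expected[i].

        Returns False on a mismatch and also, with the configured
        probability, for no reason at all (a spurious failure).
        """
        with self._lock:
            if self._rng.random() < self._rate:
                return False
            if any(self._cells[a] != e for a, e in zip(addrs, expected)):
                return False
            for a, n in zip(addrs, new):
                self._cells[a] = n
            return True


def transfer(mem, src, dst, amount):
    """Move amount from cell src to cell dst with a 2-location mini-transaction."""
    attempts = 0
    while True:
        attempts += 1
        s, d = mem.load(src), mem.load(dst)   # small, static data set
        if mem.kcas([src, dst], [s, d], [s - amount, d + amount]):
            return attempts                   # retried until it succeeded
```

The retry loop is the simplest way to accommodate failures. It gives no progress guarantee on its own if failures persist.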

Mini-Transactions

Lower bounds use
• large, dynamic data sets
• long transactions
• data accessed with arbitrary operations and unrestricted calculations

Mini-transactions
• small, static data sets
• short transactions
• simple functionality, e.g., arithmetic, comparison, and memory access

Mini-Transactions & HTM

Mini-transactions are almost provided by recent hardware TM proposals
– AMD Advanced Synchronization Facility [2009]
– Sun [Chaudhry, Cypher, Ekman, Karlsson, Landin, Yip, Zeffer, and Tremblay Micro 2009]

Best-effort: transactions can be aborted for reasons other than conflicts
– TLB misses, interrupts, certain function-call sequences, division instructions

Algorithmic Challenges

• Mini-transactions provide a significant handle on the difficult task of writing concurrent applications
– DCAS is already a big help [A, Hillel, 2006, 2009]
– Experience with hardware TM support [Dice, Lev, Marathe, Moir, Olszewski, Nussbaum SPAA 2010] [Carouge, Spear, DISC 2010]

• Design algorithms accommodating the best-effort nature of mini-transactions

• Avoid sure killers

• Work around the small data sets
– amorphous data parallelism [Pingali, Kulkarni, Nguyen, Burtscher, Mendez-Lojo, Prountzos, Sui, Zhong 2009]

Programming Support

Create patterns for employing mini-transactions, hopefully encapsulated within programming language support

Cleanly combine with native (uninstrumented) access to the locations accessed by mini-transactions
– Beware of privatization scenarios

Summary

• Facilitate the design of efficient and correct concurrent applications in the post-TM era
– Capitalize on lessons learned and wide interest in TM
– Multitude of approaches

• Specifically, develop a model, algorithms and programming patterns for best-effort mini-transactions

Thank you!
