region-based dynamic separation for stm haskell · stm haskell. dynamic separation is a recent...

Draft copy. Please do not distribute.

Region-Based Dynamic Separation for STM Haskell

Laura Effinger-Dean Dan GrossmanUniversity of Washington

{effinger,djg}@cs.washington.edu

AbstractWe present a design and implementation of dynamic separation inSTM Haskell. Dynamic separation is a recent approach to softwaretransactional memory (STM) that achieves strongly-atomic seman-tics with performance comparable to that of a weakly-atomic STM.STM Haskell, a lazy-versioning STM library for Haskell, previ-ously supported strongly-atomic semantics via static separation,and we have found dynamic separation to be a natural extensionof the library’s interface.

Our implementation of dynamic separation has two novel exten-sions. First, we improve support for mutable data structures by pro-viding protection regions, special objects that hold the protectionstate for multiple references. Second, we expand the set of protec-tion states so that the interface is more expressive.

We give a formal semantics for region-based dynamic separa-tion and prove that a weak lazy-versioning STM supports strongatomicity (for a semantics without regions). We have evaluated theperformance of our system on a suite of STM Haskell programs.We also discuss an alternate implementation that detects inconsis-tent protection states at run time.

1. IntroductionDesktop multiprocessors have become standard, but writing shared-memory software remains a challenge. It is difficult to synchro-nize shared-memory access among threads using conventional con-structs like locks. Transactions offer a key simplifying abstraction:a transaction behaves as though it executes without interleavingfrom other threads. Language support for transactions on top of animplementation that uses software transactional memory (STM) isan attractive trend and the subject of ongoing research.

Haskell’s purely-functional core and monadic type system areremarkably well-suited to the STM programming model. STMHaskell, an implementation of transactions for Concurrent Haskell,is beautifully designed, allowing transactions to be composed at runtime in sequence or as alternatives [11]. Transactions read and writemutable variables called TVars, but Haskell’s type system preventsother side effects (e.g., to IORefs).

STM Haskell’s division of mutable variables into transactionaland nontransactional types neatly sidesteps a tricky issue in STMsemantics and implementations: weak vs. strong atomicity. In so-called weakly-atomic implementations, nontransactional reads andwrites bypass the STM entirely. If such accesses conflict with

[Copyright notice will appear here once ’preprint’ option is removed.]

transactional accesses, they can compromise high-level semanticguarantees such as atomicity and isolation in surprising ways (seeSection 2.2 for an example). Yet using the STM mechanism forall memory accesses (to ensure strong atomicity) can be expensiveand/or introduce problems with legacy code.

Solutions to this dilemma provide the performance of weakatomicity and the semantics of strong atomicity. The static separa-tion of transactional and nontransactional memory in STM Haskellsuffices, as has been proven in more formal settings [2, 16], but isconservative. Several common concurrency idioms, such as initial-izing a mutable structure outside a transaction or privatizing datathat was previously thread-shared, require dynamically changingwhether a reference is used inside or outside transactions.

Dynamic separation [4, 1] addresses this limitation directly byhaving programmers make explicit calls to change the “protectionstate” of references so that the STM is properly notified. For exam-ple, when a thread publicizes a reference, it must first call a primi-tive that changes the reference’s protection state to protected—thatis, only accessible from transactions. While these state changes in-teract with the STM, they should be relatively rare compared tonontransactional reads and writes.

In this paper, we achieve two high-level design goals. First, weenrich STM Haskell with dynamic separation. Because it inher-ently has less compile-time checking than static separation, we alsopreserve static separation, letting programmers choose which theywant — or use both in the same program. Crucially, our monadicinterface allows code reuse in which library writers can providecode that can be used under either discipline. We present our li-brary design, give a formal semantics, prove dynamic separation issound, and use dynamic separation on some small programs.

Second, a key innovation in our work is support for regions:groups of locations with the same sharing state that can have theirstate changed with a single, constant-time operation on a special re-gion object. In prior work, protection-state changes applied to onlya single reference, which we show is at odds with efficiency andabstraction. Regions are a natural addition to the language designbut required us to make careful modifications to the low-level STMimplementation. The need for regions is not specific to Haskell:defining them in the context of STM Haskell’s precise semanticsand type system should positively influence their adoption in othersettings, just as the original STM Haskell has proven influential.

The more specific contributions of this work are as follows:

• We present the first implementation of dynamic separation in alazy-versioning STM. (STM Haskell uses lazy versioning; priordynamic-separation work assumed eager updates.)

• We define an interface that allows static and dynamic separationto exist side-by side within Haskell’s type system.

• We introduce regions to hold the protection state for all refer-ences in that region. This change lets mutable structures sharea single state among all their objects, saving tedious and expen-sive data-structure traversals.

1 2010/6/14

• We have a richer set of protection states than prior work. Wesupport a thread-local state for references that currently cannotbe read or written by other threads, and a read-only state forreferences that are currently immutable.

• We define a high-level operational semantics and formalize thenotion of dynamic separation with respect to this semantics. Wealso define a lower-level semantics to model a lazy-versioningSTM. We prove the equivalence of simplified versions of thesetwo semantics (omitting regions) for programs that obey dy-namic separation.

• We discuss the details of our STM Haskell implementation, aswell as an alternate implementation that detects inconsistentprotection states during nontransactional reads and writes.

• We evaluate our implementation on some small programs.

2. Language DesignIn this section, we review STM Haskell’s design and the semanticproblems introduced by weak atomicity. We then give a high-leveloverview of our extensions to STM Haskell to support dynamicseparation.

2.1 STM HaskellConcurrent Haskell supports software memory transactions withSTM Haskell, a library with a composable interface for creatingmodular transactions [11]. When choosing a system in which toimplement our extensions to dynamic separation, we found thatSTM Haskell had a number of desirable properties, including anelegant interface, a relatively small and extensible implementation,and a set of benchmarks [17].

STM Haskell’s programming interface appears in Figure 1. Acrucial feature of this interface is that all operations yield resultsin the STM monad, and therefore have no effect until explicitly ex-ecuted with atomically. Although Haskell supports regular mu-table references via the IORef type, transactions use a special typeof reference called a TVar, or transactional variable. TVars are cre-ated, read and written with newTVar, readTVar, and writeTVar,respectively. Transactions generally consist of one or more TVaroperations sequenced together with monadic bind, like so:

increment :: TVar Int -> STM ()increment t = do {

x <- readTVar t;writeTVar t (x + 1) }

Note that calling increment on a TVar t does not increment t.Rather, it produces an STM action, which may then be wrapped withatomically to produce an IO action.

The advantage of placing STM operations in a separate monadis composability: client programmers can combine STM operationsexported by different libraries to form larger transactions. Thisfunctionality is similar to that provided by nested transactions,but the authors of STM Haskell have carefully thought out thesemantics for exceptions and blocking (manual transaction abortvia retry) in relation to composed transactions. STM Haskellalso provides orElse, an operator for choice between two nestedtransactions. orElseM1 M2 runs M2 if M1 results in retry.

2.2 Semantic Problems with Weak AtomicityBy dividing mutable references into TVars and IORefs, which canbe accessed only inside or outside transactions, respectively, STMHaskell provides strongly-atomic transaction semantics via staticseparation. The type system statically prevents transactions fromracing with nontransactions, eliminating the semantic problems ofweak atomicity. We might naively wish to improve STM Haskell’sexpressivity by adding weak atomicity to the interface:

data STM ainstance Monad STM

data TVar anewTVar :: a -> STM (TVar a)readTVar :: TVar a -> STM awriteTVar :: TVar a -> a -> STM ()

retry :: STM aorElse :: STM a -> STM a -> STM a

atomically :: STM a -> IO a

Figure 1. STM Haskell’s programming interface.

Initially x holds 0 and x_sh holds True.

Thread 1:

atomically (writeTVar x_sh False);

r1 <- readTVarIO x;r2 <- readTVarIO x;

Thread 2:

atomically (do {sh <- readTVar x_sh;if shthen writeTVar x 1else return ()

});

Can r1 6= r2?

Figure 2. Buffered writes example in STM Haskell.

readTVarIO :: TVar a -> IO awriteTVarIO :: TVar a -> a -> IO ()

readTVarIO and writeTVarIO read and write TVars without us-ing a transaction. Depending on how they are implemented, thesefunctions may undermine the semantics of transactions.

Consider the program in Figure 2. x_sh holds a boolean thatindicates whether x is shared. Thread 1 enters a transaction, setsx_sh to false, ends its transaction, and reads x twice. Meanwhile,thread 2 updates x inside a transaction, first checking to see if xis shared by reading x_sh. Under strongly-atomic semantics, itshould always be the case that thread 1 sees two identical valuesfor x: if thread 1’s transaction runs first, then both reads of x yield0; if thread 2’s transaction runs first, both reads yield 1.

This example exposes buffered writes, a known problem inlazy-versioning STMs with weak atomicity [20]. In lazy-versioningSTMs, a transaction buffers its writes in a private log. After thetransaction commits (i.e., verifies that it saw a consistent viewof memory), it writes the values in its log out to memory in anarbitrary order. The STM mechanism ensures that transactions willalways see a consistent state, but if nontransactional reads andwrites do not interact with the STM, they may observe the stateof a partially committed transaction.

The critical scenario for Figure 2 is as follows. Thread 2 startsits transaction, reads True from x_sh, and buffers its update to x.The transaction successfully commits, but does not yet write backthe new value for x. Thread 1 starts its transaction, buffers a write tox_sh, and tries to commit. Thread 2 has not finished its write to x,but thread 1’s transaction did not access x, so thread 1 successfullycommits. Thread 1 completes its transaction by writing back Falseto x_sh, then reads 0 from x. Thread 2 writes 1 to x, and finallythread 1 reads 1—a different value—from x.

This example and others [20] demonstrate that unprotectedreads and writes break the high-level semantics of transactions,even for programs that appear to be free of data races. Implemen-tation details that should normally be invisible to programmers,such as buffered writes in lazy-versioning STMs, may interact with

2 2010/6/14

data DSTM ainstance Monad DSTM

-- DVars reside in the new DSTM monaddata DVar anewDVar :: a -> DSTM (DVar a)readDVar :: DVar a -> DSTM awriteDVar :: DVar a -> a -> DSTM ()

-- Primitives to change a DVar’s protection stateprotectDVar :: DVar a -> IO ()unprotectDVar :: DVar a -> IO ()makeReadOnlyDVar :: DVar a -> IO ()makeThreadLocalDVar :: DVar a -> IO ()

-- Functions to allow DSTM code in STM or IO monadprotected :: DSTM a -> STM aunprotected :: DSTM a -> IO a

Figure 3. Interface for dynamic separation with per-DVar protec-tion states (extends Figure 1).

weak operations in unexpectedly subtle ways. We could solve thisproblem by having readTVarIO and writeTVarIO interact withthe STM (e.g., by calling atomically), or simply not provid-ing these nontransactional operations (enforcing static separation).This paper focuses on another approach: dynamic separation.

2.3 Dynamic SeparationDynamic separation is a recent approach for providing strongly-atomic semantics with the performance of a weak implementation[4, 1]. In dynamic separation, the programmer explicitly makesa call to the STM whenever a reference changes its “protectionstate.” For example, in Figure 2, x goes from being “protected”(shared between multiple threads and accessed only in transactions)to “unprotected” (inaccessible from transactions).

In Figure 2, we can recover strong semantics by calling a newfunction, unprotectTVarIO, on x after thread 1’s transactioncompletes but before the first readTVarIO. This call interacts withthe STM, waiting until all transactions currently accessing its ar-gument have completed. Recall the problematic scenario for thisexample that we described in Section 2.2: the error arose becauseone of thread 1’s nontransactional reads took place before thread2 had finished updating x. The call to unprotectTVarIO solvesthis problem by forcing thread 1 to wait for thread 2’s transactionto complete before continuing. The threads still need to coordi-nate access to x using x_sh; the call to unprotectTVarIO simplyeliminates any race conditions caused by the weak implementation.

The discussion thus far considered a hypothetical implemen-tation that allows unprotected access to TVars via readTVarIOand writeTVarIO. Our actual implementation takes a different ap-proach: we introduce a new type of reference, dynamic variables.DVars may be accessed inside or outside a transaction and havea dynamic protection state. Because TVars and DVars are distincttypes, programmers can use both dynamic and static separation inSTM programs. Figure 3 extends the interface for STM Haskellwith support for dynamic separation. The interface in Figure 3 doesnot include protection regions, which we will add in Section 2.5.Programmers can create, read, and write DVars with the functionsnewDVar, readDVar, and writeDVar, respectively. The interfacealso includes four primitives for changing the protection state of aDVar, which will be discussed in Section 2.4.

Operations on DVars produce results in the DSTM (dynamicSTM) monad. DSTM code may be executed in two ways: (1) aspart of a transaction, by wrapping a DSTM term with protected,or (2) as nontransactional code in the IO monad, by wrapping

a DSTM term with unprotected. Because protected returns aresult in the STM monad, DSTM code may be composed withother transactions in sequence, or as alternatives with orElse. Iflibraries export functions with DSTM return types, then clients canchoose whether or not to execute the functions as transactions.

Implementing this interface correctly is nontrivial, as discussedin Section 4. The present section concentrates on the programmer-level interface for our extensions.

2.4 Read-Only and Thread-LocalThe previous formalization of dynamic separation [4] supportedtwo protection states: protected (only accessible in transactions)and unprotected (only accessible outside transactions). The C# im-plementation [1] also has a read-only state, which allows referencesto be read by transactional and nontransactional code, but this statewas not explored in the formalization.

These three states—protected, unprotected, and read-only—share an important property: if accesses to a DVar obey the restric-tions of its protection state, then it is impossible for a transactionto race with a nontransaction on that DVar. Protected and unpro-tected DVars may only be accessed inside and outside transactions,respectively, and read-only DVars are clearly safe from data races.This safety property is the crucial information we need for strongsemantics with a weak implementation.

Our system supports protected, unprotected, and read-onlystates, as well as a thread-local state. To be more precise, thethread-local state is a family of related protection states, one foreach thread. Thread-local DVars may be read and written inside oroutside a transaction, but only by the thread that owns that DVar.A thread-local DVar is not confined to the owner’s scope; in fact,other threads may hold references to the DVar. The protection stateis a policy, not a program invariant: if a transaction attempts toaccess a DVar that is local to a different thread, the transaction willabort. Note that the thread-local state obeys the safety property dis-cussed above, because a thread cannot race with itself, so programsthat comply with the protection states of all DVars will continue tohave strongly-atomic semantics.

One drawback of supporting this fourth state is that each thread-local DVar must keep track of which thread owns that reference(e.g., using a thread ID field), increasing the space requirementsfor DSTM programs. However, this space requirement affects onlyDVars (actually DRgns, which we will introduce shortly).

In Figure 3, makeReadOnlyDVar sets a DVar’s protection stateto be read-only and makeThreadLocalDVar sets a DVar’s protec-tion state to be local to the current thread. The current protectionstate of a DVar is the state most recently assigned to it.

2.5 Region-Based Protection StatesPrior work on dynamic separation assumed that each reference’sprotection state was independent of the protection states of all otherreferences. This per-reference approach, while useful for many pro-grams, is not ideal for a critical application: shared data struc-tures that consist of many related references. If protection statesare per-reference, changing the protection state of a data structurerequires iterating through the entire structure, which is tedious andinefficient. It would be useful to assign protection states on a per-structure basis, rather than a per-reference basis.

For example, suppose we implement a binary tree using DVars:

data BinaryTreeNode =Node { val :: Int,

left :: DVar BinaryTreeNode,right :: DVar BinaryTreeNode }

| Nil

3 2010/6/14

data DSTM ainstance Monad DSTM

-- Regions and protection state changesdata DRgnnewDRgn :: DSTM DRgnprotectDRgn :: DRgn -> IO ()unprotectDRgn :: DRgn -> IO ()makeReadOnlyDRgn :: DRgn -> IO ()makeThreadLocalDRgn :: DRgn -> IO ()

-- Dynamic variables (now region-allocated)data DVar anewDVar :: a -> DRgn -> DSTM (DVar a)readDVar :: DVar a -> DSTM awriteDVar :: DVar a -> a -> DSTM ()

-- Functions to allow DSTM code in STM or IO monadprotected :: DSTM a -> STM aunprotected :: DSTM a -> IO a

Figure 4. Interface for dynamic separation with region-based pro-tection states (replaces Figure 3).

To protect all of the DVars in the tree, we would have to traverse theentire tree, protecting each DVar one at a time. This code is tediousto write and takes time proportional to the number of nodes.

Our solution is to introduce special objects called regions. Aregion holds the protection state for a group of DVars. Every DVaris associated with (allocated in) a region. The partition of DVarsinto regions is specified by the programmer at allocation time. Forshared data structures such as the binary tree, programmers cancreate a single region object at initialization, and allocate all of thestructure’s DVars in that region. The programmer must keep trackof the structure’s region for use in allocations (e.g., by storing itin a field at the root of the tree). Changing the protection state ofan entire data structure is now a simple O(1) operation: callingprotectDRgn (for example) on the structure’s region.

Figure 4 shows the final version of our DSTM interface, elimi-nating per-DVar dynamic separation in favor of protection regions.We introduce a new datatype, DRgn, and primitives for changing theprotection state of a DRgn. newDVar now takes the DRgn in which toallocate the DVar in addition to the DVar’s initial value. Of course,it is still possible to simulate the interface given in Figure 3 bycreating a DRgn that contains a single DVar. We chose this designbecause it was sufficiently flexible for the applications we wantedto address, and omitting per-DVar protection states simplified theinterface and the implementation.

2.6 Design AlternativesWe briefly discuss a few design decisions we made while evolvingto our final design for dynamic separation in STM Haskell.

• We considered letting references have multiple compatible pro-tection states. For example, it is reasonable for a reference tobe read-only and thread-local. In general, mutability, thread-locality, and accessibility in transactions could be orthogonalfields of a protection state. This flexibility requires more primi-tives and more run-time space, and is of unclear utility.

• We could provide a primitive to change a DVar’s region after itis allocated. While semantically fine, it complicates the imple-mentation and has not yet proven useful.

• We use protection states only to provide dynamic separation,but thread-local and read-only states can also be used as “hints”to improve the performance of the STM implementation.

3. Formal SemanticsIn this section we give a high-level operational semantics for STMHaskell with region-based dynamic separation. This semanticsdraws on and extends ideas from several sources, including thesemantics for dynamic separation in an eager-update STM [4], theAtoms Family languages [16], and our own work on modelingrelaxed memory models [9].

Our goal is to prove that a lazy-versioning STM is correct forprograms that obey dynamic separation. To this end, we establishthe equivalence of two sets of semantics. The Strong semanticsimplements strong atomicity (Section 3.4), and the Weak semanticsimplements weak atomicity in a lazy-versioning STM (Section3.5). Although these two systems are not equivalent in general,they are equivalent for programs that obey dynamic separation.We have proved this equivalence for a version of the language thatdoes not include regions (Section 3.7), but we present the semanticswith regions for completeness. Unlike the STM Haskell semantics[11], the operations in a transaction do not occur in a single stepin the Strong semantics. This is so we can more easily prove anequivalence between this semantics and the Weak semantics.

We use three sets of semantics to define our system. First, theprogram semantics (Section 3.3) defines rewrite rules for Haskellterms. Next we have two heap semantics, Strong and Weak, whichrewrite the global heap. Separating the program and heap semanticsmeans that we can reuse the program syntax and semantics, whichis elegant and avoids errors due to redundancy.

We omit several STM Haskell features, including retry,TVars, orElse, and exceptions.

3.1 SyntaxThe syntax for programs and heaps is given in Figure 5. We assumethat there is a finite map type k ⇒ v, where k is the key type and vis the value type. The empty map is []. The map created by addingthe mapping (k, v) to the map m is m[k 7→ v]. The value (if any)for key k inm ism(k). The map created by removing the mappingfor key k is m/k. The domain of a map m is dom(m).

DVar, DRgn and thread identifiers are represented by d, r andt, respectively. Values V and terms M represent normal Haskellexpressions. In addition to the DSTM interface, we include aninAtomically run-time form to distinguish between active andinactive transactions. We assume there is a function eval that eval-uates a term to a value. We use evaluation contexts E to executeon the left side of a bind or under an atomically, unprotectedor protected. Finally, the program state P is a pool of threads,which we represent as a finite map from thread IDs to terms.

There are several pieces of syntax used for heaps at runtime.The transaction state s is · if no transaction is active or commit-ting, t if thread t is in an active transaction, and commit(t) ifthread t is committing its transaction. (The commit state is usedonly in the Weak semantics.) As described in Section 2.5, eachDVar is allocated in a DRgn, so the store S maps DVar locationsto DRgn/value pairs. p represents the protection state for a region.The four possible protection states are pr (protected), unpr (unpro-tected), ro (read-only) and tl(t) (local to thread t). A region map Rmaps DRgns to protection states. A Strong heap H consists of threecomponents: a transaction state s, a store S, and a region map R.

In the Weak semantics, transactions lazily update a transactionlogL that maps DVars to values. Each entry in the log also has a tago that records whether the value is read-only (R) or modified (W). AWeak heap H therefore consists of four components: a transactionstate s, a store S, a region map R, and a transaction log L.

3.2 Actions and EffectsActions a (Figure 5) describe program steps. Available actionsinclude creating, reading or writing a DVar, creating a DRgn or

4 2010/6/14

DVar d ∈ LocationDRgn r ∈ Region

Thread ID t ∈ ThreadID

Value V ::= () | d | r | t | \x->M| returnM | M >>=M

| forkIOM | atomicallyM| newDRgn | newDVar r M| readDVar d | writeDVar d M| protectedM | unprotectedM| protectDRgn r

| unprotectDRgn r

| makeReadOnlyDRgn r

| makeThreadLocalDRgn r

| inAtomicallyM

Term M ::= x | V | M M | . . .Optional term M ::= · | M

Context E ::= [·] | E >>=M | inAtomically E| protected E | unprotected E

Program state P : t⇒M

Transaction state s ::= · | t | commit(t)

Store S : d⇒ (r,M)

Protection state p ::= pr | unpr | ro | tl(t)

Region Map R : r ⇒ p

Operation o ::= R | W

Log L : d⇒ (o,M)

Strong heap H ::= (s, S,R)

Weak heap H ::= (s, S,R, L)

Action a ::= pure | fork(t)

| new(d, r,M) | newRgn(r)

| read(d,M) | write(d,M)

| makeP(r) | makeU(r)

| makeRO(r) | makeTL(r)

| begin | end

Effect σ ::= (t, a) | ε

Figure 5. Program and heap syntax.

setting a DRgn’s protection state, spawning a new thread, beginninga transaction, or ending a transaction. An effect σ is either a pair ofa thread ID and an action, or the empty effect ε. Transitions in oursemantics emit effects. The series of effects emitted by a runningprogram is similar to an execution trace.

Effects are used to connect the heap and program semantics ofa system. Each side of the semantics must agree on the effect foreach step. For example, a program step might appear to read a valueout of thin air, but the value is supplied by the heap semantics.Symmetrically, the heap semantics takes write steps using valuessupplied by the program semantics.The empty effect is used forheap steps (such as committing a logged write) that are invisible tothe program.

3.3 Program SemanticsWe separate the program semantics out as a helper judgment in Fig-ure 6. The single-threaded judgment (↪→) rewrites terms, emittingan action a describing the action performed, as well as an optionalforked thread M . The multithreaded judgment (⇁) transitions theprogram state P and emits an action a along with the ID t of thethread that performed the action.

The program semantics is mostly straightforward. The syntaxof evaluation contexts allows transactional steps to occur underprotected and inAtomically terms; the heap semantics controlswhen threads are allowed to begin and end transactions. The NEW-VAR rule emits an action that includes the name of the new DVar aswell as its region and its initial value. The inclusion of the DRgn re-flects the interface for newDVar. The READ and WRITE rules bothemit the value being read or written. There are four pure rules: eval-uating a term to a value, simplifying a monadic bind, and returninga value from a protected or unprotected expression.

3.4 Strong SemanticsThe Strong heap semantics is given in Figure 7. The semantics is ahigh-level specification of strong atomicity: while a transaction isexecuting, no other threads may perform read or write operations.For the Weak semantics (Section 3.5), we will relax this restrictionso that threads may access memory even if a transaction is execut-ing. Both semantics allow allocation, pure and fork steps to occur atany time, although a thread performing a fork may not be in a trans-action. A key restriction in both semantics is that protection statechange operations (e.g., makeP) must occur when no transactionis active. This restriction reflects the intuition that these operationsessentially act as barriers that prevent interference between transac-tions and non-transactions. In the real implementation, we requirethese operations to lock whatever region they are accessing, ratherthan waiting for all transactions to complete.

The heap judgment (⇀) rewrites Strong heaps. At most onethread can be in a transaction at a time. The transaction states tracks which (if any) thread has an active transaction: if notransaction is active, the transaction state is ·, and if thread t isexecuting a transaction, the transaction state is t. The active(s, t)judgment ensures that other threads cannot perform read (READ)or write (WRITE) actions while a transaction is executing.

As with the program semantics, each step of the Strong heapsemantics emits a non-empty effect (i.e., a thread ID and an action).The system judgment (→) allows both the program and the heap toprogress together. The single MUTUAL rule combines the heap andprogram semantics, requiring them to agree on an effect in orderfor both sides of the semantics to progress.

3.5 Weak SemanticsFigure 8 gives an alternate semantics for the heap which wecall the Weak semantics. The Weak heap semantics is looselybased on the STM Haskell implementation, and implements alazy-versioning STM. Transactions load values into a transactionlog (via the TXREAD1 rule) and modify only the logged copies(TXWRITE). Once a DVar has been added to the log, all readsof that DVar for the remainder of the current transaction use thelogged value (TXREAD2). After the transaction completes (END),it commits all of the values in the log, writing back modified entries(COMMITW) but removing read-only entries without updating thestore (COMMITR). When the log is empty, the transaction statereverts to ·, allowing other threads to begin executing transactions(ENDCOMMIT).

The active judgment holds if the thread is not currently commit-ting a transaction. Threads that are committing transactions maynot execute any steps, including pure and new steps. The unpro-tected judgment holds if a thread is not executing or committing

5 2010/6/14

Ma↪−→M ′; M

EVALevalJMK = V M 6= V

E[M ]pure↪−−→ E[V ]; ·

RETURN

E[returnM ′>>=M ]

pure↪−−→ E[M M ′]; ·

FORK

E[forkIOM ]fork(t)↪−−−→ E[return t];M

NEWVAR

E[newDVar r M ]new(d,r,M)↪−−−−−−−→ E[return d]; ·

READ

E[readDVar d]read(d,M)↪−−−−−−→ E[returnM ]; ·

WRITE

E[writeDVar d M ]write(d,M)↪−−−−−−→ E[return ()]; ·

NEWRGN

E[newDRgn]newRgn(r)↪−−−−−−→ E[return r]; ·

MAKEP

E[protectDRgn r]makeP(r)↪−−−−−→ E[return ()]; ·

MAKEU

E[unprotectDRgn r]makeU(r)↪−−−−−→ E[return ()]; ·

MAKERO

E[makeReadOnlyDRgn r]makeRO(r)↪−−−−−−→ E[return ()]; ·

MAKETL

E[makeThreadLocalDRgn r]makeTL(r)↪−−−−−−→ E[return ()]; ·

BEGIN

E[atomicallyM ]begin↪−−→ E[inAtomicallyM ]; ·

END

E[inAtomically (returnM)]end↪−→ E[returnM ]; ·

PRET

E[protected (returnM)]pure↪−−→ E[returnM ]; ·

URET

E[unprotected (returnM)]pure↪−−→ E[returnM ]; ·

P(t,a)−−−⇁ P ′

PRGMSTEP

P (t) = M Ma↪−→M ′; ·

P(t,a)−−−⇁ P [t 7→M ′]

PRGMFORK

P (t) = M t′ 6∈ dom(P ) Mfork(t′)↪−−−−→M ′;M ′′

P(t,a)−−−⇁ P [t 7→M ′][t′ 7→M ′′]

Figure 6. Program semantics.

a transaction. Threads can only execute FORK, nontransactionalREAD and nontransactional WRITE steps if they are unprotected.

Like the Strong semantics, at most one transaction is runningat a time. Unlike the Strong semantics, nontransactional reads andwrites may interleave with transactional operations. This meansthat the program could potentially observe behaviors that are notallowed by the Strong heap semantics. Although the real STMHaskell implementation allows multiple transactions to executesimultaneously, the case in which there is just one transaction ata time is still difficult to prove results about (because of conflictswith nontransactional code). Because only one transaction runs ata time and there is no retry, no transactions ever fail.

In the Weak semantics, we now have several rules which pro-duce empty effects. Specifically, the three commit rules are not vis-ible to the program; the program simply observes that the transac-tion ends. The system judgment (→) now has a second rule thatreflects this change: the HEAP rule allows the heap to take emptysteps while the program remains unchanged.

3.6 Separation DisciplineThe Weak semantics as-is does not provide strong atomicity toall programs. We use the Strong semantics to define the dynamicseparation discipline [4]. Programs conforming to this disciplineare guaranteed to execute with strongly-atomic semantics.

Definition 1 (Separation Discipline). A program P0 obeys the sep-aration discipline (separation(P0)) if the following four require-ments hold (in the Strong semantics):

1. If H0;P0 −→∗ (t, S,R);P(t,read(d,M))−−−−−−−−→ (t, S,R);P ′, S(d) =

(r,M) and R(r) = p, then checkProt(pr, p, t,R).

2. IfH0;P0 −→∗ (t, S,R);P(t,write(d,M))−−−−−−−−→ (t, S′, R);P ′, S(d) =

(r,M) and R(r) = p, then checkProt(pr, p, t,W).

3. If H0;P0 −→∗ (·, S,R);P(t,read(d,M))−−−−−−−−→ (·, S,R);P ′, S(d) =

(r,M) and R(r) = p, then checkProt(unpr, p, t,R).

4. IfH0;P0 −→∗ (·, S,R);P(t,write(d,M))−−−−−−−−→ (·, S′, R);P ′, S(d) =

(r,M) and R(r) = p, then checkProt(unpr, p, t,W).

checkProt(π, p, t, o) is defined as follows, with π ∈ {pr, unpr}:

checkProt(pr, pr, t, o) checkProt(unpr, unpr, t, o)

checkProt(π, ro, t,R) checkProt(π, tl(t), t, o)

These four requirements formally define the valid protectionstates for the four types of DVar operations in our language: un-protected reads, unprotected writes, protected reads, and protectedwrites. The checkProt(π, p, t, o) judgment checks that an opera-tion is allowed by the DVar’s region’s current protection state, de-pending on whether the thread is in a transaction (pr) or not (unpr).

3.7 EquivalenceWe have completed a proof of equivalence in Coq1 between theStrong and Weak semantics for a simpler version of the semantics,

1 http://wasp.cs.washington.edu/wasp_scat.html

6 2010/6/14

active(s, t)ACTIVENONE

active(·, t)

ACTIVETHIS

active(t, t)

H(t,a)−−−⇀ H ′

PURE

(s, S,R)(t,pure)−−−−⇀ (s, S,R)

NEWd 6∈ dom(S)

(s, S,R)(t,new(d,r,M))−−−−−−−−−⇀ (s, S[d 7→ (r,M)], R)

NEWRGNr 6∈ dom(R)

(s, S,R)(t,newRgn(r))−−−−−−−−⇀ (s, S,R[r 7→ pr])

READactive(s, t) S(d) = (r,M)

(s, S,R)(t,read(d,M))−−−−−−−−⇀ (s, S,R)

WRITEactive(s, t) S(d) = (r,M)

(s, S,R)(t,write(d,M′))−−−−−−−−−⇀ (s, S[d 7→ (r,M ′)], R)

FORKs 6= t′

(s, S,R)(t,fork(t′))−−−−−−⇀ (s, S,R)

MAKEP

(·, S,R)(t,makeP(r))−−−−−−−⇀ (·, S,R[r 7→ pr])

MAKEU

(·, S,R)(t,makeU(r))−−−−−−−⇀ (·, S,R[r 7→ unpr])

MAKERO

(·, S,R)(t,makeRO(r))−−−−−−−−⇀ (·, S,R[r 7→ ro])

MAKETL

(·, S,R)(t,makeTL(r))−−−−−−−−⇀ (·, S,R[r 7→ tl(t)])

BEGIN

(·, S,R)(t,begin)−−−−−⇀ (t, S,R)

END

(t, S,R)(t,end)−−−−⇀ (·, S,R)

H;P(t,a)−−−→ H ′;P ′

MUTUAL

H(t,a)−−−⇀ H ′ P

(t,a)−−−⇁ P ′

H;P(t,a)−−−→ H ′;P ′

Figure 7. Strong semantics.

without regions. In other words, each location has its own protec-tion state in place of a region pointer, and the heap does not includea region map. The theorem statement is as follows:

Theorem 1. If P0 obeys the dynamic separation discipline and(·, [], []);P0 −→∗ (s, S, L);P in the Weak semantics, then thereexist s′ and S′ such that (·, []);P0 −→∗ (s′, S′);P in the Strongsemantics.

The definition of a store has changed: a store is a map fromDVars to protection state/term pairs (d ⇒ (p,M)). The semanticshas been modified not to use regions, but many details are the sameas presented. The other direction is much simpler to prove andholds for all programs, not just those that obey dynamic separation:

Theorem 2. For all P0, if (·, []);P0 −→∗ (s, S);P in the Strongsemantics. then there exist s′, S′ and L′ such that (·, [], []);P0 −→∗(s′, S′, L′);P in the Weak semantics.

The proof of Theorem 1 is similar to proofs for an eager-updateSTM [2, 16]. We use an intermediate semantics in which no otherthreads can execute while a transaction is active, but transactionsstill use a lazy-versioning implementation. This semantics, calledStrongLazy, can be defined by slightly modifying the Weak seman-tics such that the active and unprotected judgments are more strict.Specifically, we delete the ACTIVEOTHER, ACTIVECOMMIT, UN-PROTECTEDOTHER and UNPROTECTEDCOMMIT rules from Fig-ure 8. The proof is technically interesting, because we must reordernontransactional pure, new, read, and write steps such that theydo not execute during a transaction’s execution or its commit. Suchreordering is sound only if the program obeys the dynamic sepa-ration discipline. We inductively prove soundness by extending theseparation discipline to the StrongLazy and Weak semantics.

watch_queue

current_value

TVar TSO TSO

...

...

Figure 9. TVar structures in STM Haskell.

A proof incorporating regions is in progress. We expect theaddition of regions will introduce few, if any, technical wrinkles.

4. Haskell ImplementationPerhaps the largest contribution of our work is an implementationof dynamic separation in STM Haskell. This section describes theoriginal STM Haskell implementation (Section 4.1), and our mod-ifications to support dynamic separation (Section 4.2). Our defaultimplementation detects inconsistent protection states only in trans-actions; we describe an alternate implementation that detects incon-sistent protection states for both transactional and nontransactionalaccesses in Section 4.3.

4.1 STM HaskellSTM Haskell is a lazy-versioning, lazy-conflict-detection STM.The implementation consists of several thousand lines of C codeand a few hundred lines of C-- [18] that glue the Haskell interface tothe low-level C code. We briefly review the original implementationin order to provide background for our implementation details.

7 2010/6/14

active(s, t)ACTIVENONE

active(·, t)

ACTIVETHIS

active(t, t)

ACTIVEOTHERt 6= t′

active(t, t′)

ACTIVECOMMITt 6= t′

active(commit(t), t′)

unprotected(s, t)UNPROTECTEDNONE

unprotected(·, t)

UNPROTECTEDOTHERt 6= t′

unprotected(t, t′)

UNPROTECTEDCOMMITt 6= t′

unprotected(commit(t), t′)

H σ−⇀ H′

PUREactive(s, t)

(s, S,R, L)(t,pure)−−−−⇀ (s, S,R, L)

NEWactive(s, t) d 6∈ dom(S)

(s, S,R, L)(t,new(d,r,M))−−−−−−−−−⇀ (s, S[d 7→ (r,M)], R, L)

NEWRGNactive(s, t) r 6∈ dom(R)

(s, S,R, L)(t,newRgn(r))−−−−−−−−⇀ (s, S,R[r 7→ pr], L)

READunprotected(s, t) S(d) = (r,M)

(s, S,R, L)(t,read(d,M))−−−−−−−−⇀ (s, S,R, L)

WRITEunprotected(s, t) S(d) = (r,M)

(s, S,R, L)(t,write(d,M′))−−−−−−−−−⇀ (s, S[d 7→ (r,M ′)], L)

FORKunprotected(s, t)

(s, S,R, L)(t,fork(t′))−−−−−−⇀ (s, S,R, L)

MAKEP

(·, S,R, L)(t,makeP(r))−−−−−−−⇀ (·, S,R[r 7→ pr], L)

MAKEU

(·, S,R, L)(t,makeU(r))−−−−−−−⇀ (·, S,R[r 7→ unpr], L)

MAKERO

(·, S,R, L)(t,makeRO(r))−−−−−−−−⇀ (·, S,R[r 7→ ro], L)

MAKETL

(·, S,R, L)(t,makeTL(r))−−−−−−−−⇀ (·, S,R[r 7→ tl(t)], L)

BEGIN

(·, S, L)(t,begin)−−−−−⇀ (t, S, L)

TXREAD1S(d) = (r,M) d 6∈ dom(L)

(t, S,R, L)(t,read(d,M))−−−−−−−−⇀ (t, S,R, L[d 7→ (R,M)])

TXREAD2L(d) = (o,M)

(t, S,R, L)(t,read(d,M))−−−−−−−−⇀ (t, S,R, L)

TXWRITE1

(t, S,R, L)(t,write(d,M))−−−−−−−−⇀ (t, S,R, L[d 7→ (W,M)])

END

(t, S,R, L)(t,end)−−−−⇀ (commit(t), S, L,)

COMMITRL(d) = (R, p,M)

(commit(t), S,R, L)ε−⇀ (commit(t), S,R, L/d)

COMMITWS(d) = (r,M) L(d) = (W,M ′)

(commit(t), S,R, L)ε−⇀ (commit(t), S[d 7→ (r,M ′)], R, L/d)

ENDCOMMIT

(commit(t), S,R, [])ε−⇀ (·, S,R, [])

H;Pσ−→ H;P

MUTUAL

H1(t,a)−−−⇀ H2 P1

(t,a)−−−⇁ P2

H1;P1(t,a)−−−→ H2;P2

HEAP

H1ε−⇀ H2

H1;Pε−→ H2;P

Figure 8. Weak semantics.

8 2010/6/14

Algorithm 1 commitTransaction(TRecHeader *trec)

1: for all entries e in trec do2: current value := try lock tvar(e->tvar, trec)3: if (current value 6= e->expected value) then4: unlock all TVars locked so far5: return false6: end if7: end for8: for all entries e in trec do9: Wake all threads on e->tvar->watch queue

10: unlock tvar(e->tvar, e->new value)11: end for12: return true

TVars TVars are C structs with the layout depicted in Figure 9.A TVar holds a pointer to the TVar’s current value and a pointer toa watch queue of thread state objects (TSOs) that need notificationwhen the TVar’s value is updated. The current_value pointeralso doubles as a lock: if the pointer’s target is a transaction record,the TVar is locked by that transaction; else the TVar is unlocked.

Transaction records While a thread is executing a transaction,it keeps a private record (Figure 10) of all TVars read or writtenduring the transaction. The record, or TRec, logs for each TVar arecord entry containing the TVar’s expected value (the value theTVar held when first touched by the transaction) and the TVar’snew value (the current value in the transaction). This new value iswhat will be written back to the TVar if and when the transactionsuccessfully commits. Record entries are stored in chunks of 16 toincrease spatial locality. The TRec also stores the current transac-tion state, such as active (currently running), waiting (blocked afterretrying), or condemned (doomed to abort), as well as a pointerto the enclosing transaction (if the transaction is nested under anorElse).

Transaction commit Algorithm 1 gives pseudocode for commit-ting a transaction. The function returns true if the commit succeeds,in which case the thread continues execution, and false if conflictsare detected, in which case the transaction restarts. The loop at line1 acquires the lock for every TVar accessed by the transaction,aborting if any TVar’s current value does not match the TRec’sexpected value for that TVar. Because a TVar’s current valuefield doubles as a lock, try lock tvar returns the current value,and unlock tvar takes a new value for the TVar as an argument.At line 8, the transaction has successfully acquired all TVars andverified that their values matched the TRec’s expected values, so itproceeds to write back the new values to each TVar, waking anythreads on the TVars’ watch queues.

The real commitTransaction function treats read-only TVars(that is, TVars for which expected value = new value) differ-ently in order to support read/read parallelism. We omit discussionof this feature due to space limitations.

Retry and wait queues If a transaction encounters a retry, andnone of the TVars the transaction has read or written have changedsince the transaction began, the transaction’s thread adds itself tothe wait queues of its TVars and blocks. When the thread is awak-ened by another commit, it checks each of the TVars it touched andrestarts if any of their values have changed.

4.2 Dynamic SeparationIn our implementation of dynamic separation, nontransactionalDVar reads and writes are “weak”: they do not interact with theSTM. This means the DVar is not locked during these operations.If the program does not obey the dynamic separation discipline de-

lock

region

watch_queue

current_value

TVar TSO TSO

...

lock

watch_queue

current_tso

current_state

DRgn

TSO TSO

...

Figure 11. TVar and DRgn structures for dynamic separation.

fined in Section 3.6, these weak operations may violate the atom-icity and isolation of transactions. To preserve strong semanticsin programs that obey the discipline, functions like protectDRgnand unprotectDRgn interact with the STM. Specifically, suchfunctions lock their DRgn argument, waiting until any transactionsusing that DRgn (or any of its DVars) have completed or aborted.

TVars and DVars For convenience, the implementation uses thesame C struct for both TVars and DVars; we will refer to suchstructs as TVars in the remainder of this section even though theymay be DVars. We have added two fields to this structure: a pointerto a DRgn and an explicit lock field (Figure 11). Because nontrans-actional TVar operations do not lock the TVar, we have imple-mented the lock as an integer in a different field to avoid unex-pectedly returning a TRec from readDVar, which would violateHaskell’s type safety, or unlocking a TVar during writeDVar.

Regions Regions are represented with the DRgn struct in Figure11. Like TVars, DRgns hold a lock and a watch queue of threadswaiting for the DRgn’s state to change. Unlike TVars, DRgns do nothold a pointer to a value; instead, they hold an integer representingthe current protection state and an optional pointer to a thread stateobject (for the thread-local state). It is often helpful to think ofDRgns as TVars whose “value” is a protection state.

Transaction records Transaction records now include, in addi-tion to the set of accessed TVars, a set of accessed DRgns (see Fig-ure 12). The record entry for a DRgn records the expected state (theprotection state the DRgn held when first touched). A boolean fieldin the TRecHeader indicates whether the transaction has retrieddue to an inconsistency in protection state; this field is used whendeciding whether to restart a transaction.

Transactional operations Algorithm 2 gives pseudocode for thereadDVar function (writeDVar is similar). When a thread per-forms readDVar, it checks the current TSO to see if it is inside atransaction. If not, it returns the TVar’s current value; if so, it incor-porates the TVar and the TVar’s region into its transaction record,then checks the transaction record to see if the region’s protectionstate allows access (i.e., protected, read-only, or local to the currentthread). If this check fails, the thread behaves as if it encountered aretry: it adds the TSO to the wait queues of all touched TVars andDRgns, then blocks.

Transaction commit We omit pseudocode for transaction commitbecause it is very similar to Algorithm 1. We now check the DRgns

9 2010/6/14

enclosing_trec

state

current_chunk

TRecHeader

prev_chunk

TRecChunk

next_entry_idxexpected_value

TRecEntry

new_value

tvarexpected_value

TRecEntry

new_value

tvar...

Figure 10. TRec structures in STM Haskell.

protection_check_failed

enclosing_trec

state

current_rgn_chunk

current_chunk

TRecHeader

prev_chunk

TRecChunk

next_entry_idxexpected_value

TRecEntry

new_value

tvarexpected_value

TRecEntry

new_value

tvar...

prev_chunk

TRecRgnChunk

next_entry_idxexpected_state

TRecRgnEntry

expected_tso

drgnexpected_state

TRecRgnEntry

expected_tso

drgn...

Figure 12. TRec structures for dynamic separation.

Algorithm 2 readDVar(TVar *tvar, TSO *tso)

1: trec := tso->trec2: if trec = null then3: return tvar->current value4: else5: drgn := tvar->region6: Merge tvar into trec->current chunk7: Merge drgn into trec->current rgn chunk8: if trec’s protection state for drgn is consistent then9: return trec’s value for tvar

10: else11: trec->protection check failed := true12: Add tso to wait queues of all touched TVars and DRgns13: Abort transaction and block14: end if15: end if

touched by the transaction in addition to the TVars. The commitfails (and the transaction restarts) if any TVar values or DRgnprotection states have changed since the transaction began, or ifany TVars or DRgns are locked. Else we update the touched TVarsand wake any threads on their watch queues.

To avoid false contention, we actually use reader/writer locksfor DRgns. This ensures that transactions that have read or writtendifferent TVars in the same region will be able to commit withoutconflicts.

Protection state changes Changing a DRgn’s protection state isstraightforward: acquire a writer lock on the DRgn, make the up-date, wake any threads waiting on that DRgn, and unlock the DRgn.Because transactions acquire a reader lock on all DRgns whoseTVars they touch, the protection-state change will block until anyconcurrently committing transactions have committed or aborted.

Retry and watch queues When a thread wakes up after retrying(either due to inconsistent protection state or an explicit call toretry), it, as before, checks all TVars the transaction accessed andrestarts the transaction if any of the TVars’ values have changed.In addition, if the protection check failed field in the TRec

lock

region

watch_queue

cache_value

TVar

touched_tvars

lock

watch_queue

current_tso

current_state

DRgn

TVar

...

value

cached_tso

cached_state

CacheValue

TVar

Figure 13. TVar and DRgn structs for checked dynamic separation.

is true, the thread checks the DRgns the transaction accessed andrestarts if any of the DRgns’ protection states have changed.

4.3 Checked Dynamic SeparationThis section discusses an alternate design for implementing dy-namic separation in which nontransactional reads and writes fail ifthey encounter an inconsistent protection state. This change makesdebugging programs that violate the separation discipline easier.Because a major motivation for dynamic separation is the efficiencyof nontransactional reads and writes, it is important that these oper-ations interact with the STM as little as possible. In particular, mostnontransactional accesses should not access the TVar’s DRgn.

The key idea of this alternate implementation, which we callchecked dynamic separation, is that we store a cached copy of thecurrent protection state of a TVar’s DRgn with the TVar (see Figure13). If a thread executing a nontransactional read or write finds an

10 2010/6/14

1 thread 2 threads 4 threadsBenchmark STM DS Speedup STM DS Speedup STM DS Speedup

TCache 1.30s 1.26s 1.0 1.82s 1.80s 1.0 2.50s 2.50s 1.0Parallel Sudoku 1.74s 1.75s 1.0 1.07s 1.14s 0.94 0.633s 0.625s 1.0

Blockworld (CCHR) 6.26s 6.93s 0.90 4.23s 4.27s 0.99 7.39s 8.20s 0.90Prime (CCHR) 28.1s 37.8s 0.74 15.0s 20.1s 0.75 10.6s 13.3s 0.80

Sudoku (CCHR) 0.342s 0.357s 0.96 0.473s 0.528s 0.90 0.591s 0.363s 1.6Union Find (CCHR) 1.08s 1.27s 0.854 0.808s 0.857s 0.94 0.531s 0.560s 0.95

Shared Int 0.0824s 0.0950s 0.87 0.126s 0.162s 0.78 0.239s 0.517s 0.46Linked List (build) 0.828s 0.236s 3.5 0.783s 0.210s 3.7 0.766s 0.218s 3.5

Linked List (access) 2.67s 2.67s 1.0 2.06s 2.01s 1.0 1.56s 1.69s 0.92Linked List (total) 3.50s 2.91s 1.2 2.84s 2.22s 1.3 2.32s 1.91s 1.2Binary Tree (build) 3.88s 0.453s 8.6 3.84s 0.427s 9.0 3.87s 0.451s 8.6

Binary Tree (access) 18.0s 17.6s 1.0 10.1s 9.91s 1.0 6.14s 5.79s 1.1Binary Tree (total) 21.9s 18.1s 1.2 13.9s 10.3s 1.3 10.0s 6.24s 1.6

Hash Table 2.64s 1.78s 1.5 2.13s 1.21s 1.8 1.86s 0.929s 2.0

Figure 14. Benchmark results for the original STM Haskell implementation and our implementation of dynamic separation.

empty cache, it takes a “slow path”: it updates the TVar’s cache,then adds the TVar to the DRgn’s list of touched tvars. Once thecache is up-to-date, the access throws an exception if the protectionstate is inconsistent; else it completes normally. When a threadupdates a DRgn’s protection state, it invalidates the cache of eachTVar listed in the touched tvars field.

The key implementation detail is how to avoid having nontrans-actional accesses lock the TVar on each access. Naively check-ing the cached protection state before reading (writing) the TVar’svalue introduces a race condition: the protection state could changeafter we read the cached copy but before we read (write) the value.Therefore, we store the cached state and the value in a separate im-mutable data structure (CacheValue in Figure 13), reducing read-ing the cached state and the current value to a single pointer read.We use atomic compare-and-swap whenever we need to update thecached protection state or the TVar’s value. The disadvantage ofthis approach, a common optimistic-concurrency technique, is thatall updates require memory allocation.

5. EvaluationWe have evaluated our implementation on a suite of STM Haskellprograms collected by others [17]. Figure 14 gives the results of ourbenchmarks. For each benchmark, we ran two versions, one usingthe original STM Haskell (STM) and one using dynamic separation(DS). The machine we used was a 4-core 2.8GHz Intel Xeon with16GB of RAM, running Linux. The speedup for each benchmark isthe time for the STM version divided by the time for the dynamicseparation version. A speedup greater than 1 means the dynamicseparation version outperformed the STM version.

The first six lines in Figure 14 are realistic STM Haskell ap-plications (those marked CCHR are built on the same constrainthandling library called CCHR). For these programs, we changedevery TVar access to a DVar access using protected. This changedoes not exploit the strengths of dynamic separation; rather, we aremeasuring the added overhead of regions and protection states. Theresults show that the dynamic separation version often performs thesame as the STM Haskell version, occasionally doing worse butnever getting a speedup lower than 0.74.

The bottom half of Figure 14 is a set of microbenchmarks thatwe examined more closely. Shared Int is a pathological case of highcontention: all threads repeatedly try to update a single TVar/DVar.In this case, we perform significantly worse than STM Haskell,with speedups as low as 0.46 for 4 threads.

The next two benchmarks implement an ordered linked list anda binary search tree using TVars. For each benchmark, we changedthe TVars to DVars, and put the entire structure in a single region.We also divided the benchmark into two phases: build and access.In the access phase, the threads perform a total of 25,000 inserts,deletes, and lookups, at rates of 10%, 10%, and 80% respectively.The build phase is a single-threaded initialization, which inserts2500 elements into the structure. For the STM Haskell version, wechose to implement this initialization as one large transaction, butdoing each insert as a separate transaction had roughly the sameperformance. The dynamic separation version does the initializa-tion without using a transaction, then protects the structure’s re-gion before spawning threads for the access phase. We achievespeedups between 3.5 and 9.0 for the build phase, confirming thatunprotected operations can save significant time over transactions.For the access phase, our implementation is dead even with STMHaskell, giving a total speedup between 1.2 and 1.6.

The final line in Figure 14 is a new microbenchmark, adaptedfrom an existing hash table implementation [14], that is designed todemonstrate the strengths of dynamic separation. We implementeda hash table that stores key/value pairs in an array of TVars/DVars.The array dynamically resizes whenever an insert causes the num-ber of elements in the table to equal the number of slots in the array.The resizing requires all keys in the table to be rehashed. This re-hash operation touches all the TVars/DVars in the array, leading tohigh contention (in fact, we found that hash tables with more than10,000 or so elements fail to make progress). To solve this con-tention problem, we added a conceptual lock, implemented witha TVar, that signals whether a rehash is in progress. All hash ta-ble operations check the rehash lock before accessing the array,and block if a rehash is in progress. This implementation is moresophisticated than a simple mutual exclusion lock: when a rehashis not in progress, inserts, deletes and lookups operate in parallelusing STM. In the dynamic separation version, all DVars in thetable are allocated in the same region. The rehash operation ac-quires the rehash lock, then unprotects the table’s region and doesits work without using a transaction. When the rehash completes,it re-protects the table’s region and releases the rehash lock. Be-cause all other operations check the lock’s state before accessingthe table, the program as a whole obeys the dynamic separationdiscipline. The actual benchmarking code performs 100,000 totaloperations, divided evenly among inserts, deletes, and lookups. Dy-namic separation is very well-suited to this application, achievingspeedups of up to 2.0. This speedup is entirely due to doing re-hashes outside of a transaction.

11 2010/6/14

We have not run a full set of benchmarks on our implementationof checked dynamic separation, but experimentation indicates thatthe cost of checking is relatively low. For example, with the hashtable benchmark, we see speedups between 0.8 and 1.0 comparedto the unchecked implementation.

6. Related WorkClearly this work builds most closely on STM Haskell [11, 10]and dynamic separation [4, 1] by incorporating an extension of thelatter into the former. Prior work on dynamic separation has been inthe context of C# and Automatic Mutual Exclusion [13]. The keyformal results for dynamic separation and earlier work on staticseparation [2, 16] involve demonstrating that weakly-atomic andstrongly-atomic STMs are indistinguishable for programs obeyingthe separation discipline.

While separation puts additional burden on programmers (ei-ther to obey a static discipline or insert the right protection-stateoperations for a dynamic discipline), it avoids semantic problemswith weakly-atomic systems. As several prior papers have cata-logued [12, 15, 20, 21], naively bypassing STM mechanisms fornontransactional memory accesses leads to semantic anomalies thatcannot be explained in terms of data races. As a result, the initiallycoarse and imprecise term of “weak” atomicity [7] has been re-fined with more precise notions of what semantic guarantees a TMsystem provides [15, 8]. Our work has used dynamic separation toensure transactions always behave as though no other computationis interleaved. Using variants of separation to achieve weaker-but-well-defined semantics would be interesting future work.

Grouping data into regions so that the protection state of thegroup can change with a single operation is related to an enor-mous body of work in ownership type systems, region-based mem-ory management, etc. Of particular relevance is recent work onSharC [5, 6], a system that detects concurrency errors in lock-basedC programs. Like us, the authors use grouping to aggregate statechanges. Many of the protections states in SharC have analogues inour work. However, implementation details are essentially differentdue to the speculative (abort-and-retry) nature of STM.

Some strongly-atomic STM systems [3, 19, 20] have includedcompiler and/or run-time optimizations to reduce the cost of non-transactional memory accesses in some common cases. Techniquescorrespond to our protection states, i.e., optimizations exist forthread-local data, read-only data, or data not accessed in transac-tions. Automation can lead to conservatism. For example, thread-local can be approximated by determining a location is reachablefrom only one thread whereas programmers may know a locationis accessed by only one thread despite being reachable from many.Using these techniques to infer some protection state changes ispotential future work.

7. ConclusionDynamic separation is a flexible and explicit approach to managingtransactional and nontransactional access to shared memory. Wehave designed, implemented, and formalized a dynamic-separationextension to STM Haskell. Our most important innovation has beenusing regions to change the protection states for multiple objectswith one operation. More generally, Haskell has provided an idealsetting to validate and precisely define our work. We hope ourwork will not only enrich STM Haskell but also influence otherlanguages seeking well-defined meaning for efficient transactions.

References[1] M. Abadi, A. Birrell, T. Harris, J. Hsieh, and M. Isard. Implemen-

tation and use of transactional memory with dynamic separation. InInternational Conference on Compiler Construction, 2009.

[2] M. Abadi, A. Birrell, T. Harris, and M. Isard. Semantics oftransactional memory and automatic mutual exclusion. In ACMSIGPLAN-SIGACT Symposium on Principles of ProgrammingLanguages, 2008.

[3] M. Abadi, T. Harris, and M. Mehrara. Transactional memory withstrong atomicity using off-the-shelf memory protection hardware. InACM SIGPLAN Symposium on Principles and Practice of ParallelProgramming, 2009.

[4] M. Abadi, T. Harris, and K. F. Moore. A model of dynamicseparation for transactional memory. In International Conferenceon Concurrency Theory, 2008.

[5] Z. R. Anderson, D. Gay, R. Ennals, and E. Brewer. SharC: Checkingdata sharing strategies for multithreaded C. In ACM SIGPLANConference on Programming Language Design and Implementation,2008.

[6] Z. R. Anderson, D. Gay, and M. Naik. Lightweight annotations forcontrolling sharing in concurrent data structures. In ACM SIGPLANConference on Programming Language Design and Implementation,2009.

[7] C. Blundell, E. C. Lewis, and M. M. K. Martin. Subtletiesof transactional memory atomicity semantics. IEEE ComputerArchitecture Letters, 5(2), Nov. 2006.

[8] L. Dalessandro and M. L. Scott. Strong isolation is a weak idea. InACM SIGPLAN Workshop on Languages, Compilers, and HardwareSupport for Transactional Computing, 2009.

[9] L. Effinger-Dean and D. Grossman. http://wasp.cs.washington.edu/wasp_memmodel.html.

[10] T. Harris, S. Marlow, and S. Peyton Jones. Haskell on a shared-memory multiprocessor. In ACM SIGPLAN Haskell Workshop, 2005.

[11] T. Harris, S. Marlow, S. Peyton Jones, and M. Herlihy. Composablememory transactions. In ACM SIGPLAN Symposium on Principlesand Practice of Parallel Programming, 2005.

[12] R. Hudson, B. Saha, A.-R. Adl-Tabatabai, and B. Hertzberg. McRT-Malloc: A scalable transactional memory allocator. In InternationalSymposium on Memory Management, 2006.

[13] M. Isard and A. Birrell. Automatic mutual exclusion. In 11thWorkshop on Hot Topics in Operating Systems, 2007.

[14] E. Kmett. The Comonad.Reader. http://comonad.com/reader/.

[15] V. Menon, S. Balensiefer, T. Shpeisman, A.-R. Adl-Tabatabai, R. L.Hudson, B. Saha, and A. Welc. Practical weak-atomicity semanticsfor Java STM. In ACM Symposium on Parallellism in Algorithms andArchitectures, 2008.

[16] K. F. Moore and D. Grossman. High-level small-step operationalsemantics for transactions. In ACM SIGPLAN-SIGACT Symposiumon Principles of Programming Languages, 2008.

[17] C. Perfumo, N. Sonmez, S. Stipic, O. Unsal, A. Cristal, T. Harris,and M. Valero. The limits of software transactional memory (STM):Dissecting Haskell STM applications on a many-core environment.In ACM International Conference on Computing Frontiers, 2008.

[18] S. Peyton Jones, T. Nordin, and D. Oliva. C--: A portable assemblylanguage. In Workshop on Implementing Functional Languages,1997.

[19] F. T. Schneider, V. Menon, T. Shpeisman, and A.-R. Adl-Tabatabai.Dynamic optimization for efficient strong atomicity. In ACMConference on Object-Oriented Programming, Systems, Languages,and Applications, 2008.

[20] T. Shpeisman, V. Menon, A.-R. Adl-Tabatabai, S. Balensiefer,D. Grossman, R. Hudson, K. F. Moore, and B. Saha. Enforcingisolation and ordering in STM. In ACM SIGPLAN Conference onProgramming Language Design and Implementation, 2007.

[21] M. F. Spear, V. J. Marathe, L. Dalessandro, and M. L. Scott.Privatization techniques for software transactional memory. TechnicalReport 915, Computer Science Dept., Univ. of Rochester, 2007.

12 2010/6/14

region-based dynamic separation for stm haskell · stm haskell. dynamic separation is a recent...

Documents