
    CS-510

Transactional Memory: Architectural Support for Lock-Free Data Structures

By Maurice Herlihy and J. Eliot B. Moss, 1993

Presented by Steve Coward, PSU Fall 2011 CS-510

    11/02/2011

    Slide content heavily borrowed from Ashish Jha

PSU SP 2010 CS-510, 05/04/2010


    Agenda

    Lock-based synchronization

Non-blocking synchronization

TM (Transactional Memory) concept

    A HW-based TM implementation

    Core additions

    ISA additions

    Transactional cache

    Cache coherence protocol changes

    Test Methodology and results

    Blue Gene/Q

    Summary


Lock-based synchronization

Pessimistic synchronization approach

Uses mutual exclusion - blocking; only ONE process/thread can execute at a time

Generally easy to use, except not composable

Generally easy to reason about

Does not scale well due to lock arbitration/communication

[Diagram: three classic locking problems]

Priority inversion: a lo-priority process holds lock X; a hi-priority process pre-empts it but can't proceed without the lock, while a med-priority process can keep the lo-priority holder from running

Convoying: the process holding lock X is de-scheduled (e.g., quantum expiration, page fault, other interrupts); other processes wanting the lock can't proceed, causing high context-switch overhead

Deadlock: Proc A holds lock X and tries to get lock Y, while Proc B holds lock Y and tries to get lock X; neither can proceed


    Lock-Free Synchronization

    Non-Blocking - optimistic, does not use mutual exclusion

    Uses RMW operations such as CAS, LL&SC

Limited to operations on single words or double words

Avoids common problems seen with conventional techniques, such as priority inversion, convoying, and deadlock

Difficult programming logic

In the absence of the above problems, and as implemented in SW, lock-free doesn't perform as well as a lock-based approach
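To make the RMW style concrete, here is a minimal lock-free counter increment sketched with C11 atomics; the compare-and-swap loop stands in for the CAS or LL/SC instructions mentioned above (names and structure are my own illustration, not code from the paper).

#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned long counter = 0;

/* Optimistic, non-blocking increment: read the current value, then try
 * to install value+1.  If another thread got there first, the CAS fails,
 * cur is refreshed with the value just observed, and we simply retry. */
static void lockfree_increment(void)
{
    unsigned long cur = atomic_load(&counter);
    while (!atomic_compare_exchange_weak(&counter, &cur, cur + 1))
        ;   /* retry with the updated cur */
}

int main(void)
{
    for (int i = 0; i < 10; i++)
        lockfree_increment();
    printf("counter = %lu\n", atomic_load(&counter));
    return 0;
}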


    Non-Blocking Wish List

    Simple programming usage model

Avoids priority inversion, convoying and deadlock

    Equivalent or better performance than lock-based approach

    Less data copying

    No restrictions on data set size or contiguity

    Composable

    Wait-free

    Enter Transactional Memory (TM)


What is a transaction (tx)?

A tx is a finite sequence of revocable operations, executed by a process, that satisfies two properties:

Serializable: steps of one tx are not seen to be interleaved with the steps of another tx

    Txs appear to all processes to execute in the same order

    Atomic

Each tx either aborts or commits

Abort causes all tentative changes of the tx to be discarded

Commit causes all tentative changes of the tx to become, effectively instantaneously, globally visible

    This paper assumes that a process executes at most one tx at a time

Txs do not nest (though that seems like a nice feature); hence, these txs are not composable

    Txs do not overlap (seems less useful)


    What is Transactional Memory?

Transactional Memory (TM) is a lock-free, non-blocking concurrency control mechanism based on txs that allows a programmer to define customized read-modify-write operations that apply to multiple, independently chosen memory locations

Non-blocking: multiple txs optimistically execute their critical sections in parallel (on different CPUs)

If a conflict occurs, only one can succeed; others can retry
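As a sketch of what "multiple, independently chosen memory locations" buys you, the fragment below atomically moves a value from one shared cell into another using the paper's primitives; LTX, ST and COMMIT are declared as assumed extern stand-ins for the proposed instructions, so this is shape only and needs a TM implementation behind it.

/* Assumed C wrappers for the proposed tx instructions (not a real API). */
extern unsigned LTX(unsigned *addr);             /* tx read, intent to write */
extern void     ST(unsigned *addr, unsigned v);  /* tentative tx write       */
extern int      COMMIT(void);                    /* 1 = committed, 0 = abort */

static unsigned cellA, cellB;                    /* imagine shared memory    */

/* Atomically add cellA into cellB and clear cellA: either all four
 * tx operations become visible at once, or none of them do. */
static void atomic_move(void)
{
    while (1) {
        unsigned a = LTX(&cellA);
        unsigned b = LTX(&cellB);
        ST(&cellB, b + a);
        ST(&cellA, 0);
        if (COMMIT())
            return;
        /* conflict: nothing became visible; retry (ideally after backoff) */
    }
}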


    Basic Transaction Concept

Proc A

beginTx:
  [A] = 1;
  [B] = 2;
  [C] = 3;
  x = _VALIDATE();
  IF (x) _COMMIT();    // _COMMIT - instantaneously make all above changes visible to all Procs
  ELSE { _ABORT();     // _ABORT - discard all above changes, may try again
         GOTO beginTx; }

Proc B

beginTx:
  z = [A];
  y = [B];
  [C] = y;
  x = _VALIDATE();
  IF (x) _COMMIT();
  ELSE { _ABORT();
         GOTO beginTx; }

Atomicity: ALL or NOTHING

    True concurrency

    How is validity determined?

    Optimistic execution

    Serialization ensured if only one tx commits and others abort

    Linearization ensured by (combined) atomicity of validate, commit, and abort operations

    How to COMMIT?

    How to ABORT?

    Changes must be revocable!


    TM vs. SW Non-Blocking

Critical section expressed as a tx:

LT or LTX    // READ set of locations
VALIDATE     // CHECK consistency of READ values   (fail -> retry)
ST           // MODIFY set of locations             (the critical section)
COMMIT       // HW makes changes PERMANENT          (pass -> succeed, fail -> retry)

// TM
while (1) {
    curCnt = LTX(&cnt);
    if (Validate()) {
        int c = curCnt + 1;
        ST(&cnt, c);
        if (Commit())
            return;
    }
}

// Non-blocking
while (1) {
    curCnt = LL(&cnt);
    // Copy object
    if (Validate(&cnt)) {
        int c = curCnt + 1;
        if (SC(&cnt, c))
            return;
    }
    // Do work
}


HW or SW TM?

TM may be implemented in HW or SW

    Many SW TM library packages exist

C++, C#, Java, Haskell, Python, etc.

2-3 orders of magnitude slower than other synchronization approaches

    This paper focuses on HW implementation

HW offers significantly greater performance than SW for reasonable cost

Minor tweaks to CPUs*: core, ISA, caches, bus and cache coherency protocols

Leverages the cache-coherency protocol to maintain tx memory consistency

But a HW implementation without SW cache overflow handling is problematic

    * See Blue Gene/Q


    Core: TM Updates

Each CPU maintains two additional Status Register bits

TACTIVE

Flag indicating whether a transaction is in progress on this CPU

Implicitly set upon entering a transaction

TSTATUS

Flag indicating whether the active transaction has conflicted with another transaction

TACTIVE   TSTATUS      MEANING
False     don't care   No tx active
True      False        Orphan tx - executing, conflict detected, will abort
True      True         Active tx - executing, no conflict yet detected
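A tiny C model of how those two bits decode into the three states in the table (illustrative only; in the proposal they are processor status-register bits, not software).

#include <stdbool.h>
#include <stdio.h>

struct tx_status {
    bool tactive;   /* a tx is in progress on this CPU                 */
    bool tstatus;   /* the tx has not (yet) conflicted with another tx */
};

static const char *describe(struct tx_status s)
{
    if (!s.tactive)  return "no tx active";
    if (s.tstatus)   return "active tx: executing, no conflict yet detected";
    return "orphan tx: still executing, conflict detected, will abort";
}

int main(void)
{
    struct tx_status s = { .tactive = true, .tstatus = false };
    printf("%s\n", describe(s));
    return 0;
}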


ISA: TM Memory Operations

LT: load from shared memory to register

ST: tentative write of register to shared memory, which becomes visible upon successful COMMIT

LTX: LT + intent to write to the same location later

A performance optimization for early conflict detection

ISA: TM Verification Operations

VALIDATE: validate consistency of the read set

Avoids a misbehaved orphan

    ABORT: unconditionally discard all updates

    Used to handle extraordinary circumstances

    COMMIT: attempt to make tentative updates permanent
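Putting the six operations together, here is a hedged sketch of how a programmer might use them to claim a bounded resource; LT/LTX/ST/VALIDATE/ABORT/COMMIT are assumed extern wrappers for the proposed instructions, and limit and slots are hypothetical shared counters, so treat this as shape, not a working program.

/* Assumed wrappers for the six tx instructions above (not a real API). */
extern unsigned LT(unsigned *addr);     /* tx read                           */
extern unsigned LTX(unsigned *addr);    /* tx read with intent to write      */
extern void     ST(unsigned *addr, unsigned v);
extern int      VALIDATE(void);         /* is the read set still consistent? */
extern void     ABORT(void);            /* unconditionally discard updates   */
extern int      COMMIT(void);           /* 1 = success, 0 = must retry       */

static unsigned limit, slots;           /* hypothetical shared counters      */

/* Claim one slot unless the limit is reached.  VALIDATE keeps an orphan tx
 * from acting on a stale snapshot; ABORT abandons the attempt cleanly. */
static int claim_slot(void)
{
    while (1) {
        unsigned max  = LT(&limit);
        unsigned used = LTX(&slots);
        if (!VALIDATE())
            continue;                   /* reads were inconsistent: retry    */
        if (used >= max) {
            ABORT();                    /* nothing to publish                */
            return 0;
        }
        ST(&slots, used + 1);
        if (COMMIT())
            return 1;
        /* commit failed because of a conflict: retry */
    }
}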


    TM Conflict Detection

LOAD and STORE are supported but do not affect a tx's READ or WRITE set

    Why would a STORE be performed within a tx?

Left to the implementation

Interaction between tx and non-tx operations on the same address is generally a programming error

Consider LOAD/STORE as committed txs with conflict potential; otherwise the outcome is non-linearizable

LT   reg, [MEM]    // pure Tx READ
LTX  reg, [MEM]    // Tx READ with intent to WRITE later
ST   [MEM], reg    // Tx WRITE to local cache; value globally visible only after COMMIT

Tx dependencies definition: DATA SET = READ-SET + WRITE-SET

Tx abort condition diagram*:

DATA-SET updated by another tx?      Y -> ABORT   // discard changes to WRITE-SET
        N
WRITE-SET read by any other tx?      Y -> ABORT
        N
COMMIT                               // WRITE-SET visible to other processes

LOAD   reg, [MEM]    // non-Tx READ
STORE  [MEM], reg    // non-Tx WRITE - careful!

    *Subject to arbitration

    (as in Reader/Writer paradigm)
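Read as code, the abort-condition diagram boils down to two questions; the toy function below restates it (the flags are stand-ins for facts the hardware actually derives from coherence traffic).

#include <stdbool.h>

/* May this tx commit?  Only if no other tx has updated anything in its
 * DATA SET (READ-SET + WRITE-SET) and no other tx has read its WRITE-SET;
 * otherwise it must abort and discard the WRITE-SET. */
static bool may_commit(bool dataset_updated_by_other, bool writeset_read_by_other)
{
    return !dataset_updated_by_other && !writeset_read_by_other;
}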


TM Cache Architecture

Two primary, mutually exclusive caches

In the absence of TM, non-tx ops use the same caches, control logic, and coherency protocols as a non-Tx architecture

    To avoid impairing design/performance of regular cache

    To prevent regular cache set size from limiting max tx size

    Accessed sequentially based on instruction type!

    Tx Cache

    Fully-associative

    Otherwise how would tx address collisions be handled?

    Single-cycle COMMIT and ABORT

    Small - size is implementation dependent

Intel Core i5 has a first-level TLB with 32 entries!

Holds all tentative writes w/o propagating them to other caches or memory

May be extended to act as a victim cache

    Upon ABORT

    Modified lines set to INVALID state

    Upon COMMIT

    Lines can be snooped by other processors

    Lines WB to memory upon replacement

[Figure: cache hierarchy - the core has two 1st-level data caches accessed in 1 clk: L1D (direct-mapped, exclusive, 2048 lines x 8B) and the Tx cache (fully-associative, exclusive, 64 lines x 8B); the 2nd/3rd-level caches (L2D, L3D) and main memory sit behind them, with a 4-clk memory latency]


HW TM Leverages the Cache Protocol

M - cache line only in current cache and is modified

E - cache line only in current cache and is not modified

S - cache line in current cache and possibly other caches and is not modified

I - invalid cache line

Tx commit logic will need to detect the following events (akin to R/W locking)

    Local read, remote write

    S -> I

    E -> I

    Local write, remote write

    M -> I

    Local write, remote read

    M, snoop read, write back -> S

Works with bus-based (snoopy cache) or network-based (directory) architectures

    http://thanglongedu.org/showthread.php?2226-Nehalem/page2
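A compact C rendering of the snoop-side transitions listed above, using a generic MESI model (the paper's bus version actually builds on Goodman's protocol, so this is an approximation for intuition only).

enum mesi { MESI_M, MESI_E, MESI_S, MESI_I };

/* Next state of a locally cached line when a remote request is snooped.
 * remote_write != 0 models a remote write/RFO, 0 models a remote read. */
static enum mesi snoop_transition(enum mesi cur, int remote_write)
{
    if (cur == MESI_I)
        return MESI_I;
    if (remote_write)
        return MESI_I;       /* S -> I, E -> I, M -> I (after supplying data) */
    if (cur == MESI_M)
        return MESI_S;       /* write back the dirty line, then share it      */
    if (cur == MESI_E)
        return MESI_S;       /* silently downgrade to shared                  */
    return cur;              /* S stays S on a remote read                    */
}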


    Protocol: Cache States

    Every Tx Op allocates 2 CL entries (2 cycle allocation)

    Single CL entry also possible

    Effectively makes tx cache twice as big

    Two entry scheme has optimizations

    Allows roll back (and roll forward) w/o bus traffic

    LT allocates one entry (if LTX used properly)

    Second LTX cycle can be hidden on cache hit

Authors' decision appears to be somewhat arbitrary

    A dirty value originally read must either be

    WB to memory, or

    Allocated to XCOMMIT entry as its old entry

    avoids WBs to memory and improves performance

Tx Cache Line Entry, States and Replacement

Tx op allocation: each tx op allocates 2 lines - an XCOMMIT entry holding the old value and an XABORT entry holding the new value

On COMMIT: the XABORT (new value) entry becomes NORMAL and the XCOMMIT (old value) entry becomes EMPTY

On ABORT: the XCOMMIT (old value) entry becomes NORMAL and the XABORT (new value) entry becomes EMPTY

CL replacement: look for an EMPTY entry (no replacement needed); else replace a NORMAL entry; else replace an XCOMMIT entry (WB if DIRTY, then replace); if none is available, ABORT the tx / trap to SW

Cache line states:

Name      Access  Shared?  Modified?
INVALID   none    -        -
VALID     R       Yes      No
DIRTY     R, W    No       Yes
RESERVED  R, W    No       No

Tx tags:

Name     Meaning
EMPTY    contains no data
NORMAL   contains committed data
XCOMMIT  discard on commit
XABORT   discard on abort
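The tag machinery above fits in a few lines of C; the model below holds one XCOMMIT/XABORT pair for a single transactional write and shows that COMMIT and ABORT are just a pass over the tags (sizes and names mirror the slide; everything else is my own illustration).

#include <stdio.h>

#define TX_LINES 64                       /* 64 fully-associative lines, as above */

enum tx_tag { TAG_EMPTY, TAG_NORMAL, TAG_XCOMMIT, TAG_XABORT };

struct tx_line {
    enum tx_tag tag;
    unsigned    addr;
    unsigned    data;
};

static struct tx_line txcache[TX_LINES];

/* COMMIT: the new values (XABORT) become committed data, the saved old
 * values (XCOMMIT) are dropped. */
static void tx_commit(void)
{
    for (int i = 0; i < TX_LINES; i++) {
        if (txcache[i].tag == TAG_XABORT)       txcache[i].tag = TAG_NORMAL;
        else if (txcache[i].tag == TAG_XCOMMIT) txcache[i].tag = TAG_EMPTY;
    }
}

/* ABORT: the saved old values (XCOMMIT) become the committed data again,
 * the tentative new values (XABORT) are dropped. */
static void tx_abort(void)
{
    for (int i = 0; i < TX_LINES; i++) {
        if (txcache[i].tag == TAG_XCOMMIT)      txcache[i].tag = TAG_NORMAL;
        else if (txcache[i].tag == TAG_XABORT)  txcache[i].tag = TAG_EMPTY;
    }
}

int main(void)
{
    /* One tx write to address 0x40: old value 1 saved, new value 2 pending. */
    txcache[0] = (struct tx_line){ TAG_XCOMMIT, 0x40, 1 };
    txcache[1] = (struct tx_line){ TAG_XABORT,  0x40, 2 };
    tx_commit();                              /* or tx_abort() */
    printf("line0 tag=%d line1 tag=%d\n", txcache[0].tag, txcache[1].tag);
    return 0;
}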


    ISA: TM Verification Operations

Orphan = TACTIVE==TRUE && TSTATUS==FALSE

Tx continues to execute, but will fail at commit

Commit does not force write back to memory

Memory is written only when the CL is evicted or invalidated

Conditions for calling ABORT:

Interrupts

Tx cache overflow

VALIDATE [Mem]:
  if TSTATUS == TRUE:   return TRUE
  if TSTATUS == FALSE:  abort (for ALL entries: drop XABORT, set XCOMMIT to NORMAL);
                        TSTATUS = TRUE; TACTIVE = FALSE; return FALSE

ABORT:
  for ALL entries: drop XABORT, set XCOMMIT to NORMAL
  TSTATUS = TRUE; TACTIVE = FALSE

COMMIT:
  if TSTATUS == TRUE:   for ALL entries: drop XCOMMIT, set XABORT to NORMAL;
                        TSTATUS = TRUE; TACTIVE = FALSE; return TRUE
  if TSTATUS == FALSE:  ABORT; return FALSE


    TM Memory Access

LT reg, [Mem]    // search Tx cache

Is there an XABORT entry?   Y -> return its DATA
Is there a NORMAL entry?    Y -> retag it XABORT and allocate an XCOMMIT entry with the same DATA; return DATA
Tx cache miss -> issue a T_READ cycle
  OK   -> allocate XABORT DATA and XCOMMIT DATA entries; return DATA (CL state as in Goodman's protocol for LOAD: Valid)
  BUSY -> ABORT the tx (for ALL entries: drop XABORT, set XCOMMIT to NORMAL; TSTATUS=FALSE); return arbitrary DATA

LTX reg, [Mem]    // search Tx cache

Same flow as LT, except the miss issues a T_RFO cycle and the CL state follows Goodman's protocol for LOAD with intent to write (Reserved)

ST [Mem], reg    // ST writes to the Tx cache only!!!

Is there an XABORT entry?   Y -> write the NEW DATA into it
Is there a NORMAL entry?    Y -> retag it XCOMMIT (old DATA) and allocate an XABORT entry with the NEW DATA
Tx cache miss -> issue a T_RFO cycle
  OK   -> allocate XCOMMIT DATA (old value) and XABORT NEW DATA entries (CL state as in Goodman's protocol for STORE: Dirty)
  BUSY -> ABORT the tx (for ALL entries: drop XABORT, set XCOMMIT to NORMAL; TSTATUS=FALSE); subsequent reads may return arbitrary DATA

For reference - Tx op allocation: the XCOMMIT entry holds the old value, the XABORT entry holds the new value

Bus cycles:

Name    Kind     Meaning        New access
READ    regular  read value     shared
RFO     regular  read value     exclusive
WRITE   both     write back     exclusive
T_READ  Tx       read value     shared
T_RFO   Tx       read value     exclusive
BUSY    Tx       refuse access  unchanged

    Tx requests REFUSED by BUSY response

Tx aborts and retries (after exponential backoff?)

    Prevents deadlock or continual mutual aborts

Exponential backoff not implemented in HW

Performance is parameter sensitive

    Benchmarks appear not to be optimized
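The LT path in the first flowchart, rewritten as C-flavoured pseudocode; the helper functions are assumed cache-controller actions, not a real API, and LTX/ST differ only in the bus cycle (T_RFO) and which entry receives the new data.

enum tx_tag { TAG_EMPTY, TAG_NORMAL, TAG_XCOMMIT, TAG_XABORT };
struct tx_line { enum tx_tag tag; unsigned addr, data; };

/* Assumed helpers standing in for cache-controller behaviour. */
extern struct tx_line *tx_find(unsigned addr, enum tx_tag tag);
extern struct tx_line *tx_alloc(enum tx_tag tag, unsigned addr, unsigned data);
extern int  bus_t_read(unsigned addr, unsigned *data);   /* 1 = OK, 0 = BUSY */
extern void tx_abort_all(void);   /* drop XABORT, XCOMMIT -> NORMAL, TSTATUS = FALSE */

static unsigned lt(unsigned addr)
{
    struct tx_line *l;
    unsigned data;

    if ((l = tx_find(addr, TAG_XABORT)) != 0)    /* working copy already here   */
        return l->data;

    if ((l = tx_find(addr, TAG_NORMAL)) != 0) {  /* committed copy present      */
        l->tag = TAG_XABORT;                     /* it becomes the working copy */
        tx_alloc(TAG_XCOMMIT, addr, l->data);    /* keep the old value too      */
        return l->data;
    }

    if (bus_t_read(addr, &data)) {               /* miss: T_READ cycle, OK      */
        tx_alloc(TAG_XABORT,  addr, data);
        tx_alloc(TAG_XCOMMIT, addr, data);
        return data;
    }

    tx_abort_all();                              /* BUSY: this tx is now an orphan */
    return 0;                                    /* "arbitrary data"            */
}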


    TM Snoopy Cache Actions

    Both Regular and Tx Cache snoop on the bus

    Main memory responds to all L1 read misses

Main memory responds to cache line replacement WRITEs

    If TSTATUS==FALSE, Tx cache acts as Regular cache (for NORMAL entries)

Line Replacement Action - Regular or Tx $:

Bus request   Action
WRITE         write data on bus, written to memory

Response to SNOOP Actions - Regular $:

Bus request        Current state        Next state / Action
READ  (regular)    VALID/DIRTY/RESV.    VALID / return data
RFO   (regular)    VALID/DIRTY/RESV.    INVALID / return data
T_READ  (Tx)       VALID                VALID / return data
T_READ  (Tx)       DIRTY/RESV.          VALID / return data
T_RFO   (Tx)       VALID/DIRTY/RESV.    INVALID / return data

Response to SNOOP Actions - Tx $:

Bus request        Tx tag                   Current state        Next state / Action
READ  (regular)    NORMAL                   VALID/DIRTY/RESV.    VALID / return data
RFO   (regular)    NORMAL                   VALID/DIRTY/RESV.    INVALID / return data
T_READ  (Tx)       NORMAL/XCOMMIT/XABORT    VALID                VALID / return data
T_READ  (Tx)       NORMAL/XCOMMIT/XABORT    DIRTY/RESV.          - / BUSY signal
T_RFO   (Tx)       NORMAL/XCOMMIT/XABORT    DIRTY/RESV.          - / BUSY signal
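The BUSY rows in the last table are the actual conflict-detection mechanism; a minimal decision function for the Tx cache's snoop response might look like this (a model of the table only, with cases the table does not cover left out).

#include <stdbool.h>

enum cl_state { CL_INVALID, CL_VALID, CL_DIRTY, CL_RESERVED };
enum snoop_reply { REPLY_NONE, REPLY_DATA, REPLY_BUSY };

/* How the Tx cache answers a snooped transactional request (T_READ or
 * T_RFO) for a line it holds, per the table above. */
static enum snoop_reply tx_snoop(enum cl_state s, bool is_t_rfo)
{
    if (s == CL_DIRTY || s == CL_RESERVED)
        return REPLY_BUSY;       /* line belongs to a live tx: refuse, so the
                                    requesting tx aborts and retries          */
    if (s == CL_VALID && !is_t_rfo)
        return REPLY_DATA;       /* unmodified copy can safely be shared      */
    return REPLY_NONE;           /* not held, or a case the table omits       */
}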


    Test Methodology

TM implemented in Proteus - an execution-driven simulator from MIT

Two versions of the TM implementation:

Goodman's snoopy protocol for bus-based arch

Chaiken directory protocol for the (simulated) Alewife machine

32 processors

Memory latency of 4 clock cycles

1st-level cache latency of 1 clock cycle

2048x8B direct-mapped regular cache

64x8B fully-associative Tx cache

    Strong Memory Consistency Model

Compare TM to 4 different implementation techniques:

SW: TTS (test-and-test-and-set) spinlock with exponential backoff [TTS Lock] (sketched after this slide)

SW queuing [MCS Lock]

A process unable to get the lock puts itself in the queue, eliminating poll time

HW: LL/SC (LOAD_LINKED/STORE_COND) with exponential backoff [LL/SC Direct/Lock]

HW queuing [QOSB]

Queue maintenance incorporated into the cache-coherency protocol

Goodman's QOSB protocol - queue head in memory, elements in unused CLs

    Benchmarks

Counting - LL/SC directly used on the single-word counter variable

    Producer & Consumer

    Doubly-Linked List

    All benchmarks do fixed amount of work
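For reference, here is what the [TTS Lock] baseline looks like in C11 atomics - a test-and-test-and-set spinlock with randomized exponential backoff; this is my own sketch of the well-known technique, not the simulator code the authors ran.

#include <stdatomic.h>
#include <stdlib.h>

#define BACKOFF_MIN 4
#define BACKOFF_MAX 4096

static atomic_int tts_lock = 0;               /* 0 = free, 1 = held */

static void tts_acquire(void)
{
    unsigned backoff = BACKOFF_MIN;
    for (;;) {
        /* "test": spin on plain reads so contended waiting stays in-cache */
        while (atomic_load_explicit(&tts_lock, memory_order_relaxed))
            ;
        /* "test-and-set": one winner gets the lock */
        if (!atomic_exchange_explicit(&tts_lock, 1, memory_order_acquire))
            return;
        /* lost the race: wait a random, exponentially growing delay */
        unsigned wait = (unsigned)rand() % backoff;
        while (wait--)
            ;
        if (backoff < BACKOFF_MAX)
            backoff *= 2;
    }
}

static void tts_release(void)
{
    atomic_store_explicit(&tts_lock, 0, memory_order_release);
}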


    Counting Benchmark

    N processes increment shared counter 2^16/n times, n=1 to 32

    Short CS with 2 shared-mem accesses, high contention

In the absence of contention, TTS makes 5 references to memory for each increment:

RD + test-and-set to acquire the lock + RD and WR in the CS + release of the lock

TM requires only 3 memory accesses:

RD and WR to the counter, and then COMMIT (no bus traffic)

void process(int work)
{
    int success = 0, backoff = BACKOFF_MIN;
    unsigned wait;

    while (success < work) {
        ST(&counter, LTX(&counter) + 1);
        if (COMMIT()) {
            success++;
            backoff = BACKOFF_MIN;
        } else {
            wait = random() % (01 << backoff);   /* exponential backoff */
            while (wait--);
            if (backoff < BACKOFF_MAX) backoff++;
        }
    }
}


    Counting Results

LL/SC outperforms TM

LL/SC is applied directly to the counter variable; no explicit commit is required

For the other benchmarks this advantage is lost, as the shared object spans multiple words; then the only way to use LL/SC is as a spin lock

TM has higher throughput than all other mechanisms at most levels of concurrency

TM uses no explicit locks and so makes fewer accesses to memory (LL/SC - 2, TM - 3, TTS - 5)

[Figure (copied from paper): total cycles needed to complete the benchmark vs. number of concurrent processes, for the BUS and NW architectures; curves for TTS Lock, MCS Lock (SW queue), QOSB (HW queue), TM, and LL/SC Direct]


    Prod/Cons Benchmark

N processes share a bounded buffer, initially empty

Half produce items, half consume items

    Benchmark finishes when 2^16 operations have completed

typedef struct {
    Word deqs;
    Word enqs;
    Word items[QUEUE_SIZE];
} queue;

unsigned queue_deq(queue *q)
{
    unsigned head, tail, result, wait;
    unsigned backoff = BACKOFF_MIN;

    while (1) {
        result = QUEUE_EMPTY;
        tail = LTX(&q->enqs);
        head = LTX(&q->deqs);
        if (head != tail) {                          /* queue not empty? */
            result = LT(&q->items[head % QUEUE_SIZE]);
            ST(&q->deqs, head + 1);                  /* advance counter */
        }
        if (COMMIT()) break;
        wait = random() % (01 << backoff);           /* exponential backoff */
        while (wait--);
        if (backoff < BACKOFF_MAX) backoff++;
    }
    return result;
}

SOURCE: from paper



    Prod/Cons Results

In the bus arch, throughput is almost flat for all techniques; TM yields higher throughput, but not as dramatically as in the counting benchmark

In the NW arch, all throughputs suffer as contention increases; TM suffers the least and wins

[Figure (copied from paper): cycles needed to complete the benchmark vs. number of concurrent processes, for the BUS and NW architectures; curves for TTS Lock, MCS Lock (SW queue), QOSB (HW queue), TM, and LL/SC Direct]


    Doubly-Linked List Benchmark

N processes share a doubly-linked list anchored by Head and Tail pointers

A process dequeues an item by removing the item pointed to by Tail, and then enqueues it by threading it onto the list at Head

    Process that removes last item sets both Head & Tail to NULL

    Process that inserts item into an empty list sets both Head & Tail to point to the new item

    Benchmark finishes when 2^16 operations have completed

SOURCE: from paper

typedef struct list_elem {
    struct list_elem *next;   /* next to dequeue */
    struct list_elem *prev;   /* previously enqueued */
    int value;
} entry;

shared entry *Head, *Tail;

void list_enq(entry *new)
{
    entry *old_tail;
    unsigned backoff = BACKOFF_MIN;
    unsigned wait;

    new->next = new->prev = NULL;
    while (TRUE) {
        old_tail = (entry *)LTX(&Tail);
        if (VALIDATE()) {
            ST(&new->prev, old_tail);
            if (old_tail == NULL) ST(&Head, new);
            else ST(&old_tail->next, new);
            ST(&Tail, new);
            if (COMMIT()) return;
        }
        wait = random() % (01 << backoff);   /* exponential backoff */
        while (wait--);
        if (backoff < BACKOFF_MAX) backoff++;
    }
}


    Doubly-Linked List Results

Concurrency is difficult to exploit by conventional means

State-dependent concurrency is not simple to recognize using locks

An enqueuer doesn't know whether it must lock the tail pointer until after it has locked the head pointer, and vice versa for dequeuers

Queue non-empty: each tx modifies Head or Tail but not both, so enqueuers can (in principle) execute without interference from dequeuers, and vice versa

Queue empty: a tx must modify both pointers, so enqueuers and dequeuers conflict

The locking techniques use only a single lock

Lower throughput, as a single lock prohibits overlapping of enqueues and dequeues

    TM naturally permits this kind of parallelism

[Figure (copied from paper): cycles needed to complete the benchmark vs. number of concurrent processes, for the BUS and NW architectures; curves for TTS Lock, MCS Lock (SW queue), QOSB (HW queue), TM, and LL/SC Direct]


Blue Gene/Q Processor

First Tx memory HW implementation

Used in the Sequoia supercomputer built by IBM for Lawrence Livermore Labs

Due to be completed in 2012

Sequoia is a 20-petaflop machine

Blue Gene/Q will have 18 cores

One dedicated to OS tasks

    One held in reserve (fault tolerance?)

    4 way hyper-threaded, 64-bit PowerPC A2 based

    Sequoia may use up to 100k Blue Genes

    1.6 GHz, 205 Gflops, 55W, 1.47 B transistors, 19 mm sq


TM on Blue Gene/Q

Transactional memory only works intra-chip

Inter-chip conflicts are not detected

Uses a tag scheme on the L2 cache memory

Tags detect load/store data conflicts in a tx

    Cache data has version tag

    Cache can store multiple versions of same data

SW commences the tx, does its work, then tells HW to attempt commit

    If unsuccessful, SW must re-try

Appears to be similar to the approach on Sun's Rock processor

Ruud Haring on TM: "a lot of neat trickery, sheer genius"

Full implementation much more complex than the paper suggests?

Blue Gene/Q is the exception and does not mark wide-scale acceptance of HW TM


Pros & Cons Summary

Pros

    TM matches or outperforms atomic update locking techniques for simple benchmarks

    Uses no locks and thus has fewer memory accesses

    Avoids priority inversion, convoying and deadlock

Easy programming semantics

Complex NB scenarios such as the doubly-linked list are more realizable through TM

Allows true concurrency and hence is highly scalable (for smaller tx sizes)

    Cons

TM cannot perform operations that cannot be undone, including most I/O

Single-cycle commit and abort restrict the size of the 1st-level tx cache and hence the tx size

    Is it good for anything other than data containers?

    Portability is restricted by transactional cache size

    Still SW dependent

    Algorithm tuning benefits from SW based adaptive backoff

    Tx cache overflow handling

    Longer Tx increases the likelihood of being aborted by an interrupt or scheduling conflict

    Tx should be able to complete within one scheduling time slot

    Weaker consistency models require explicit barriers at start and end, impacting performance

    Other complications make it more difficult to implement in HW

    Multi-level caches

Nested transactions (required for composability)

Cache coherency complexity on many-core SMP and NUMA archs

    Theoretically subject to starvation

An adaptive backoff strategy is the suggested fix - the authors used exponential backoff

Otherwise a queuing mechanism is needed

    Poor debugger support



Summary

TM is a novel multi-processor architecture which allows easy lock-free multi-word synchronization in HW

    Leveraging concept of Database Transactions

Overcoming the single/double-word limitation

Exploiting cache-coherency mechanisms

Wish                                                Granted?  Comment
Simple programming model                            Yes       Very elegant
Avoids priority inversion, convoying and deadlock   Yes       Inherent in the NB approach
Equivalent or better performance than lock-based    Yes       For very small and short txs
No restrictions on data set size or contiguity      No        Limited by practical considerations
Composable                                          No        Possible but would add significant HW complexity
Wait-free                                           No        Possible but would add significant HW complexity


    References

M. P. Herlihy and J. E. B. Moss. Transactional Memory: Architectural Support for Lock-Free Data Structures. Technical Report 92/07, Digital Cambridge Research Lab, One Kendall Square, Cambridge MA 02139, December 1992.

    Appendix


Appendix - Linearizability: Herlihy's Correctness Condition

    Invocation of an object/function followed by a response

    History - sequence of invocations and responses made by a set of threads

    Sequential history - a history where each invocation is followed immediately by its response

Serializable - the history can be reordered to form a sequential history that is consistent with the sequential definition of the object

Linearizable - a serializable history in which each response that preceded an invocation in the original history also precedes it in the sequential reordering

    Object is linearizable if all of its usage histories may be linearized.

    A invokes lock | B invokes lock | A fails | B succeeds

    An example history

    Reordering 1 - A sequential history but not serializable reordering

    Reordering 2 - A linearizable reordering

    A invokes lock | A fails | B invokes lock | B succeeds

    B invokes lock | B succeeds | A invokes lock | A fails



Appendix - Goodman's Write-Once Protocol

D - line present only in one cache and differs from main memory. Cache must snoop for read requests.

R - line present only in one cache and matches main memory.

V - line may be present in other caches and matches main memory.

    I - Invalid.

Writes to V, I are write-through

Writes to D, R are write-back



Appendix - Tx Memory and dB Txs

    Differences with dB Txs

    Disk vs. memory - HW better for Txs than dB systems

    Txs do not need to be durable (post termination)

    dB is a closed system; Tx interacts with non-Tx operations

    Tx must be backward compatible with existing programming environment

Success in the database transaction field does not translate directly to transactional memory