TRANSCRIPT
8/13/2019 12 Example
CS-510
Transactional Memory: Architectural Support for Lock-Free Data Structures
By Maurice Herlihy and J. Eliot B. Moss, 1993
Presented by Steve Coward, PSU Fall 2011 CS-510
11/02/2011
Slide content heavily borrowed from Ashish Jha
PSU SP 2010 CS-510, 05/04/2010
Agenda
Lock-based synchronization
Non-blocking synchronization
TM (Transactional Memory) concept
A HW-based TM implementation
Core additions
ISA additions
Transactional cache
Cache coherence protocol changes
Test Methodology and results
Blue Gene/Q
Summary
Lock-based synchronization
Pessimistic synchronization approach
Uses mutual exclusion; blocking, only ONE process/thread can execute at a time
Generally easy to use, except not composable
Generally easy to reason about
Does not scale well due to lock arbitration/communication
[Diagram: three failure modes of locking.
Priority Inversion: lo-priority Proc A holds Lock X; hi-priority Proc B is pre-empted and can't proceed; med-priority Proc C runs instead.
Convoying: Proc A holds Lock X but is de-scheduled (e.g. quantum expiration, page fault, other interrupts); waiters can't proceed, causing high context-switch overhead.
Deadlock: Proc A holds Lock X and tries to get Lock Y; Proc B holds Lock Y and tries to get Lock X; neither can proceed.]
Lock-Free Synchronization
Non-blocking - optimistic, does not use mutual exclusion
Uses RMW operations such as CAS, LL/SC
Limited to operations on single words or double words
Avoids common problems seen with conventional techniques such as priority inversion, convoying and deadlock
Difficult programming logic
In the absence of the above problems, and as implemented in SW, lock-free doesn't perform as well as a lock-based approach
Non-Blocking Wish List
Simple programming usage model
Avoids priority inversion, convoying and deadlock
Equivalent or better performance than lock-based approach
Less data copying
No restrictions on data set size or contiguity
Composable
Wait-free
Enter Transactional Memory (TM)
What is a transaction (tx)?
A tx is a finite sequence of revocable operations executed by a process that satisfies two properties:
Serializable
Steps of one tx are not seen to be interleaved with the steps of another tx
Txs appear to all processes to execute in the same order
Atomic
Each tx either aborts or commits
Abort causes all tentative changes of the tx to be discarded
Commit causes all tentative changes of the tx to be made, effectively instantaneously, globally visible
This paper assumes that a process executes at most one tx at a time
Txs do not nest (but nesting seems like a nice feature); hence these txs are not composable
Txs do not overlap (overlapping seems less useful)
What is Transactional Memory?
Transactional Memory (TM) is a lock-free, non-blocking concurrency control mechanism based on txs that allows a programmer to define customized read-modify-write operations that apply to multiple, independently chosen memory locations
Non-blocking
Multiple txs optimistically execute the critical section in parallel (on different CPUs)
If a conflict occurs, only one can succeed; the others can retry
Basic Transaction Concept

Proc A:
beginTx:
  [A]=1; [B]=2; [C]=3;
  x = _VALIDATE();
  IF (x) _COMMIT();    // _COMMIT - instantaneously make all above changes visible to all Procs
  ELSE { _ABORT();     // _ABORT - discard all above changes, may try again
         GOTO beginTx; }

Proc B:
beginTx:
  z=[A]; y=[B]; [C]=y;
  x = _VALIDATE();
  IF (x) _COMMIT();
  ELSE { _ABORT(); GOTO beginTx; }

Atomicity: ALL or NOTHING
True concurrency, optimistic execution
How is validity determined?
Serialization ensured if only one tx commits and the others abort
Linearization ensured by (combined) atomicity of the validate, commit, and abort operations
How to COMMIT? How to ABORT? Changes must be revocable!
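The snapshot/validate/commit control flow above can be emulated in plain C with a global version counter. This is a toy single-word sketch of the pattern only, NOT the paper's HW mechanism; all names are invented, and the publish step after the CAS is not itself atomic, so it illustrates the control flow rather than real isolation:

```c
#include <stdatomic.h>

/* Toy emulation of the VALIDATE/COMMIT pattern: a tx snapshots a
 * global version, computes tentatively, and commits only if it can
 * atomically bump the exact version it snapshotted. */
static _Atomic unsigned version = 0;
static int shared_a, shared_b;

/* Returns 1 on commit, 0 on abort (caller retries, as in GOTO beginTx). */
static int tx_add(int da, int db) {
    unsigned v = atomic_load(&version);   /* beginTx: snapshot version  */
    int new_a = shared_a + da;            /* tentative (revocable) work */
    int new_b = shared_b + db;
    /* _VALIDATE + _COMMIT fused: bump the version iff nothing else
     * committed since our snapshot; otherwise this is the _ABORT path. */
    if (!atomic_compare_exchange_strong(&version, &v, v + 1))
        return 0;
    shared_a = new_a;                     /* publish the write set */
    shared_b = new_b;
    return 1;
}
```

A caller simply loops `while (!tx_add(1, 2));` - the same retry shape as the GOTO beginTx above.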
TM vs. SW Non-Blocking

Critical section pattern:
LT or LTX    // READ set of locations
VALIDATE     // CHECK consistency of READ values (fail -> retry)
ST           // MODIFY set of locations
COMMIT       // HW makes changes PERMANENT (fail -> retry, pass -> succeed)

// TM
while (1) {
    curCnt = LTX(&cnt);
    if (Validate()) {
        int c = curCnt + 1;   // Do work
        ST(&cnt, c);
        if (Commit())
            return;
    }
}

// Non-blocking (LL/SC)
while (1) {
    curCnt = LL(&cnt);
    // Copy object
    if (Validate(&cnt)) {
        int c = curCnt + 1;   // Do work
        if (SC(&cnt, c))
            return;
    }
}
HW or SW TM?
TM may be implemented in HW or SW
Many SW TM library packages exist
C++, C#, Java, Haskell, Python, etc.
2-3 orders of magnitude slower than other synchronization approaches
This paper focuses on a HW implementation
HW offers significantly greater performance than SW for reasonable cost
Minor tweaks to CPUs*: core ISA, caches, bus and cache coherency protocols
Leverages the cache-coherency protocol to maintain tx memory consistency
But a HW implementation w/o SW cache overflow handling is problematic
* See Blue Gene/Q
Core: TM Updates
Each CPU maintains two additional status register bits
TACTIVE
Flag indicating whether a transaction is in progress on this CPU
Implicitly set upon entering a transaction
TSTATUS
Flag indicating whether the active transaction has conflicted with another transaction

TACTIVE | TSTATUS | MEANING
False   | DC      | No tx active
True    | False   | Orphan tx - executing, conflict detected, will abort
True    | True    | Active tx - executing, no conflict yet detected
ISA: TM Memory Operations
LT: load from shared memory to register
ST: tentative write of register to shared memory, which becomes visible upon successful COMMIT
LTX: LT + intent to write to the same location later
A performance optimization for early conflict detection
ISA: TM Verification Operations
VALIDATE: validate consistency of the read set
Avoids a misbehaved orphan
ABORT: unconditionally discard all updates
Used to handle extraordinary circumstances
COMMIT: attempt to make tentative updates permanent
TM Conflict Detection
LOAD and STORE are supported but do not affect a tx's READ or WRITE set
Why would a STORE be performed within a tx? Left to the implementation
Interaction between tx and non-tx operations to the same address is generally a programming error
Consider LOAD/STORE as committed txs with conflict potential; otherwise non-linearizable outcomes are possible

LT    reg, [MEM]   // pure Tx READ
LTX   reg, [MEM]   // Tx READ with intent to WRITE later
ST    [MEM], reg   // Tx WRITE to local cache, value globally visible only after COMMIT
LOAD  reg, [MEM]   // non-Tx READ
STORE [MEM], reg   // non-Tx WRITE - careful!

Tx dependencies definition: READ-SET + WRITE-SET = DATA-SET
Tx abort condition diagram*:
DATA-SET updated by another tx? If yes, ABORT (discard changes to WRITE-SET).
WRITE-SET read by any other tx? If yes, ABORT; if no, COMMIT (WRITE-SET becomes visible to other processes).
* Subject to arbitration (as in the Reader/Writer paradigm)
TM Cache Architecture
Two primary, mutually exclusive caches
In the absence of TM, non-tx ops use the same caches, control logic and coherency protocols as a non-tx architecture
To avoid impairing design/performance of the regular cache
To prevent the regular cache set size from limiting max tx size
Accessed sequentially based on instruction type!
Tx Cache
Fully-associative
Otherwise how would tx address collisions be handled?
Single-cycle COMMIT and ABORT
Small - size is implementation dependent
(Intel Core i5 has a first-level TLB with 32 entries!)
Holds all tentative writes w/o propagating them to other caches or memory
May be extended to act as a victim cache
Upon ABORT: modified lines set to INVALID state
Upon COMMIT: lines can be snooped by other processors; lines written back to memory upon replacement

[Diagram: Core with a 1st-level L1D cache (direct-mapped, exclusive, 2048 lines x 8B, 1 clk) alongside the Tx cache (fully-associative, exclusive, 64 lines x 8B), backed by 2nd-level L2D (4 clk), 3rd-level L3D, and main memory.]
HW TM Leverages Cache Protocol
M - cache line only in current cache and is modified
E - cache line only in current cache and is not modified
S - cache line in current cache and possibly other caches, and is not modified
I - invalid cache line
Tx commit logic will need to detect the following events (akin to R/W locking):
Local read, remote write: S -> I, E -> I
Local write, remote write: M -> I
Local write, remote read: M, snoop read, write back -> S
Works with bus-based (snoopy cache) or network-based (directory) architectures
http://thanglongedu.org/showthread.php?2226-Nehalem/page2
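The conflict-relevant transitions listed above can be written down as a small state function. The encoding below is my own illustration of just those edges, not the full MESI protocol and not code from the paper:

```c
/* Illustrative encoding (mine, not the paper's) of the MESI
 * transitions the tx commit logic must observe. */
typedef enum { MESI_M, MESI_E, MESI_S, MESI_I } mesi_t;

/* Local line state after a REMOTE processor WRITES the line:
 * any valid copy is invalidated (S->I, E->I, M->I). */
static mesi_t on_remote_write(mesi_t s) {
    (void)s;
    return MESI_I;
}

/* Local line state after a REMOTE processor READS the line:
 * an exclusive copy (M or E) drops to shared; M also writes back. */
static mesi_t on_remote_read(mesi_t s) {
    if (s == MESI_M || s == MESI_E)
        return MESI_S;
    return s;   /* S stays S, I stays I */
}
```

Spotting these transitions is exactly how the HW detects that another processor has read or written a line in the tx's data set.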
Protocol: Cache States
Every tx op allocates 2 CL entries (2-cycle allocation)
A single CL entry is also possible
Effectively makes the tx cache twice as big
The two-entry scheme has optimizations
Allows roll back (and roll forward) w/o bus traffic
LT allocates one entry (if LTX is used properly)
The second LTX cycle can be hidden on a cache hit
The authors' decision appears to be somewhat arbitrary
A dirty value originally read must either be
Written back to memory, or
Allocated to an XCOMMIT entry as its old entry
Avoids write backs to memory and improves performance

Tx op allocation (2 lines): XCOMMIT holds the old value, XABORT the new value.
On COMMIT: the XABORT entry becomes NORMAL (new value kept), the XCOMMIT entry becomes EMPTY (old value dropped).
On ABORT: the XCOMMIT entry becomes NORMAL (old value kept), the XABORT entry becomes EMPTY (new value dropped).

Cache line states
Name     | Access | Shared? | Modified?
INVALID  | none   | -       | -
VALID    | R      | Yes     | No
DIRTY    | R, W   | No      | Yes
RESERVED | R, W   | No      | No

Tx tags
Name    | Meaning
EMPTY   | contains no data
NORMAL  | contains committed data
XCOMMIT | discard on commit
XABORT  | discard on abort

CL replacement: search for an EMPTY entry (replace if found); else a NORMAL entry (replace); else an XCOMMIT entry (write back if DIRTY, then replace); if none, ABORT the tx / trap to SW.
ISA: TM Verification Operations
Orphan = TACTIVE==TRUE && TSTATUS==FALSE
The tx continues to execute, but will fail at commit
Commit does not force a write back to memory
Memory is written only when a CL is evicted or invalidated
Conditions for calling ABORT: interrupts, tx cache overflow

VALIDATE [Mem]:
If TSTATUS is TRUE, return TRUE.
If FALSE: set TSTATUS=TRUE, TACTIVE=FALSE; for ALL entries drop XABORT and set XCOMMIT to NORMAL; return FALSE.

ABORT:
Set TSTATUS=TRUE, TACTIVE=FALSE; for ALL entries drop XABORT and set XCOMMIT to NORMAL.

COMMIT:
If TSTATUS is FALSE: ABORT and return FALSE.
If TRUE: set TSTATUS=TRUE, TACTIVE=FALSE; for ALL entries drop XCOMMIT and set XABORT to NORMAL; return TRUE.
TM Memory Access

LT reg, [Mem]: search the Tx cache.
If an XABORT entry holds the data, return it.
Else if a NORMAL entry holds the data, rename it: XABORT NORMAL DATA, allocate XCOMMIT DATA; return the data.
Else (Tx cache miss) issue a T_READ cycle, allocating XABORT DATA and XCOMMIT DATA.
On OK, return the data (CL state as in Goodman's protocol for LOAD: Valid).
On BUSY, abort the tx: for ALL entries drop XABORT and set XCOMMIT to NORMAL, set TSTATUS=FALSE, and return arbitrary data.

LTX reg, [Mem]: as LT, but a Tx cache miss issues a T_RFO cycle, and the CL state is as in Goodman's protocol for LOAD: Reserved.

ST [Mem], reg: writes to the Tx cache only!!!
As LTX, but the XABORT entry receives the NEW data (the XCOMMIT entry keeps the old), a Tx cache miss issues a T_RFO cycle, and the CL state is as in Goodman's protocol for STORE: Dirty.

Bus cycles
Name   | Kind    | Meaning       | New Access
READ   | regular | read value    | shared
RFO    | regular | read value    | exclusive
WRITE  | both    | write back    | exclusive
T_READ | Tx      | read value    | shared
T_RFO  | Tx      | read value    | exclusive
BUSY   | Tx      | refuse access | unchanged

For reference, tx op allocation: XCOMMIT holds the old value, XABORT the new value.

Tx requests REFUSED by a BUSY response
The tx aborts and retries (after exponential backoff?)
Prevents deadlock or continual mutual aborts
Exponential backoff not implemented in HW
Performance is parameter sensitive
Benchmarks appear not to be optimized
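The SW retry policy paired with BUSY-induced aborts can be sketched as capped exponential backoff with random jitter. The constants and names below are illustrative, not the authors' tuned parameters:

```c
#include <stdlib.h>

/* Sketch of exponential backoff with jitter (illustrative values). */
#define BACKOFF_MIN 4
#define BACKOFF_MAX 1024

/* Double the window until the cap is reached. */
static unsigned next_backoff(unsigned window) {
    return window < BACKOFF_MAX ? window * 2 : BACKOFF_MAX;
}

/* Spin for a random fraction of the current window.  The jitter is
 * what breaks the symmetry between retriers that would otherwise
 * keep aborting each other. */
static void backoff_wait(unsigned window) {
    unsigned wait = (unsigned)rand() % window;
    while (wait--)
        ;   /* idle spin */
}
```

After a successful commit the caller would reset its window to BACKOFF_MIN; after each abort it calls backoff_wait and widens the window with next_backoff.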
TM Snoopy Cache Actions
Both the regular and Tx cache snoop on the bus
Main memory responds to all L1 read misses
Main memory responds to cache line replacement WRITEs
If TSTATUS==FALSE, the Tx cache acts as a regular cache (for NORMAL entries)

Line replacement action - regular or Tx cache
Bus Request | Action
WRITE       | write data on bus, written to memory

Response to snoop actions - regular cache
Bus Request    | Current State     | Next State/Action
Read (regular) | VALID/DIRTY/RESV. | VALID/Return Data
RFO (regular)  | VALID/DIRTY/RESV. | INVALID/Return Data
T_READ (Tx)    | VALID             | VALID/Return Data
T_READ (Tx)    | DIRTY/RESV.       | VALID/Return Data
T_RFO (Tx)     | VALID/DIRTY/RESV. | INVALID/Return Data

Response to snoop actions - Tx cache
Bus Request    | Tx Tag                | Current State     | Next State/Action
Read (regular) | Normal                | VALID/DIRTY/RESV. | VALID/Return Data
RFO (regular)  | Normal                | VALID/DIRTY/RESV. | INVALID/Return Data
T_READ (Tx)    | Normal/Xcommit/Xabort | VALID             | VALID/Return Data
T_READ (Tx)    | Normal/Xcommit/Xabort | DIRTY/RESV.       | NA/BUSY SIGNAL
T_RFO (Tx)     | Normal/Xcommit/Xabort | DIRTY/RESV.       | NA/BUSY SIGNAL
Test Methodology
TM implemented in Proteus - an execution-driven simulator from MIT
Two versions of the TM implementation
Goodman's snoopy protocol for a bus-based arch
Chaiken directory protocol for the (simulated) Alewife machine
32 processors
Memory latency of 4 clock cycles
1st-level cache latency of 1 clock cycle
2048x8B direct-mapped regular cache
64x8B fully-associative Tx cache
Strong memory consistency model
Compare TM to 4 different implementation techniques
SW
TTS (test-and-test-and-set) spinlock with exponential backoff [TTS Lock]
SW queuing [MCS Lock]: a process unable to lock puts itself in the queue, eliminating poll time
HW
LL/SC (LOAD_LINKED/STORE_COND) with exponential backoff [LL/SC Direct/Lock]
HW queuing [QOSB]: queue maintenance incorporated into the cache-coherency protocol
Goodman's QOSB protocol - head in memory, elements in unused CLs
Benchmarks
Counting: LL/SC directly used on the single-word counter variable
Producer & Consumer
Doubly-Linked List
All benchmarks do a fixed amount of work
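For reference, the TTS baseline above can be sketched in a few lines with C11 atomics. This is my own illustration of the technique, not the benchmarked implementation: spin on a plain read ("test") so the line stays shared in the local cache, and only attempt the bus-traffic-generating atomic exchange ("test-and-set") once the lock looks free.

```c
#include <stdatomic.h>

/* Test-and-test-and-set (TTS) spinlock sketch. */
static _Atomic int lock_word = 0;

static void tts_lock(void) {
    for (;;) {
        while (atomic_load(&lock_word))          /* test: read-only spin */
            ;
        if (atomic_exchange(&lock_word, 1) == 0) /* test-and-set */
            return;                              /* acquired */
    }
}

static void tts_unlock(void) {
    atomic_store(&lock_word, 0);
}
```

In the benchmarks this is additionally combined with exponential backoff between failed test-and-set attempts.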
Counting Benchmark
N processes increment a shared counter 2^16/n times, n=1 to 32
Short critical section with 2 shared-memory accesses, high contention
In the absence of contention, TTS makes 5 references to memory for each increment
RD + test-and-set to acquire the lock + RD and WR in the CS + release the lock
TM requires only 3 memory accesses
RD & WR to the counter and then COMMIT (no bus traffic)

void process(int work)
{
    int success = 0, backoff = BACKOFF_MIN;
    unsigned wait;
    while (success < work) {
        ST(&counter, LTX(&counter) + 1);
        if (COMMIT()) {
            success++;
            backoff = BACKOFF_MIN;
        } else {
            wait = random() % (1 << backoff);   /* exponential backoff */
            while (wait--);
            if (backoff < BACKOFF_MAX) backoff++;
        }
    }
}
Counting Results
LL/SC outperforms TM
LL/SC applied directly to the counter variable, no explicit commit required
For the other benchmarks this advantage is lost: a shared object spanning multiple words means the only way to use LL/SC is as a spin lock
TM has higher throughput than all other mechanisms at most levels of concurrency
TM uses no explicit locks and so makes fewer accesses to memory (LL/SC - 2, TM - 3, TTS - 5)
[Figure (copied from paper): total cycles needed to complete the benchmark vs. concurrent processes, for BUS and NW architectures; curves for TTS Lock, MCS Lock - SWQ, QOSB - HWQ, TM, LL/SC Direct.]
Prod/Cons Benchmark
N processes share a bounded buffer, initially empty
Half produce items, half consume items
Benchmark finishes when 2^16 operations have completed

typedef struct {
    Word deqs;
    Word enqs;
    Word items[QUEUE_SIZE];
} queue;

unsigned queue_deq(queue *q) {
    unsigned head, tail, result, wait;
    unsigned backoff = BACKOFF_MIN;
    while (1) {
        result = QUEUE_EMPTY;
        tail = LTX(&q->enqs);
        head = LTX(&q->deqs);
        if (head != tail) {                   /* queue not empty? */
            result = LT(&q->items[head % QUEUE_SIZE]);
            ST(&q->deqs, head + 1);           /* advance counter */
        }
        if (COMMIT()) break;
        wait = random() % (1 << backoff);     /* exponential backoff */
        while (wait--);
        if (backoff < BACKOFF_MAX) backoff++;
    }
    return result;
}
(Source: code from paper)
Prod/Cons Results
In the bus arch, almost flat throughput for all; TM yields higher throughput, but not as dramatically as in the counting benchmark
In the NW arch, all throughputs suffer as contention increases; TM suffers the least and wins
[Figure (copied from paper): cycles needed to complete the benchmark vs. concurrent processes, for BUS and NW architectures; curves for TTS Lock, MCS Lock - SWQ, QOSB - HWQ, TM, LL/SC Direct.]
Doubly-Linked List Benchmark
N processes share a doubly-linked list anchored by Head & Tail pointers
A process dequeues an item by removing the item pointed to by Tail, then enqueues it by threading it onto the list at Head
A process that removes the last item sets both Head & Tail to NULL
A process that inserts an item into an empty list sets both Head & Tail to point to the new item
Benchmark finishes when 2^16 operations have completed

typedef struct list_elem {
    struct list_elem *next;   /* next to dequeue */
    struct list_elem *prev;   /* previously enqueued */
    int value;
} entry;

shared entry *Head, *Tail;

void list_enq(entry *new) {
    entry *old_tail;
    unsigned backoff = BACKOFF_MIN;
    unsigned wait;
    new->next = new->prev = NULL;
    while (TRUE) {
        old_tail = (entry *)LTX(&Tail);
        if (VALIDATE()) {
            ST(&new->prev, old_tail);
            if (old_tail == NULL) ST(&Head, new);
            else ST(&old_tail->next, new);
            ST(&Tail, new);
            if (COMMIT()) return;
        }
        wait = random() % (1 << backoff);   /* exponential backoff */
        while (wait--);
        if (backoff < BACKOFF_MAX) backoff++;
    }
}
(Source: code from paper)
Doubly-Linked List Results
Concurrency is difficult to exploit by conventional means
State-dependent concurrency is not simple to recognize using locks
An enqueuer doesn't know whether it must lock the tail-ptr until after it has locked the head-ptr, & vice-versa for dequeuers
Queue non-empty: each tx modifies head or tail but not both, so enqueuers can (in principle) execute without interference from dequeuers, and vice-versa
Queue empty: a tx must modify both pointers, so enqueuers and dequeuers conflict
Locking techniques use only a single lock
Lower throughput, as the single lock prohibits overlapping of enqueues and dequeues
TM naturally permits this kind of parallelism
[Figure (copied from paper): cycles needed to complete the benchmark vs. concurrent processes, for BUS and NW architectures; curves for TTS Lock, MCS Lock - SWQ, QOSB - HWQ, TM, LL/SC Direct.]
Blue Gene/Q Processor
First TM HW implementation
Used in the Sequoia supercomputer built by IBM for Lawrence Livermore Labs
Due to be completed in 2012
Sequoia is a 20 petaflop machine
Blue Gene/Q will have 18 cores
One dedicated to OS tasks
One held in reserve (fault tolerance?)
4-way hyper-threaded, 64-bit PowerPC A2 based
Sequoia may use up to 100k Blue Genes
1.6 GHz, 205 Gflops, 55W, 1.47 B transistors, 19 mm sq
TM on Blue Gene/Q
Transactional memory only works intra-chip
Inter-chip conflicts are not detected
Uses a tag scheme on the L2 cache memory
Tags detect load/store data conflicts in a tx
Cache data has a version tag
The cache can store multiple versions of the same data
SW commences the tx, does its work, then tells HW to attempt commit
If unsuccessful, SW must re-try
Appears to be similar to the approach on Sun's Rock processor
Ruud Haring on TM: "a lot of neat trickery, sheer genius"
Full implementation much more complex than the paper suggests?
Blue Gene/Q is the exception and does not mark wide-scale acceptance of HW TM
Pros & Cons Summary
Pros
TM matches or outperforms atomic update locking techniques for simple benchmarks
Uses no locks and thus makes fewer memory accesses
Avoids priority inversion, convoying and deadlock
Easy programming semantics
Complex non-blocking scenarios such as the doubly-linked list are more realizable through TM
Allows true concurrency and hence is highly scalable (for smaller tx sizes)
Cons
TM cannot perform non-undoable (irrevocable) operations, including most I/O
Single-cycle commit and abort restrict the size of the 1st-level cache and hence tx size
Is it good for anything other than data containers?
Portability is restricted by the transactional cache size
Still SW dependent
Algorithm tuning benefits from SW-based adaptive backoff
Tx cache overflow handling
A longer tx increases the likelihood of being aborted by an interrupt or scheduling conflict
A tx should be able to complete within one scheduling time slot
Weaker consistency models require explicit barriers at start and end, impacting performance
Other complications make it more difficult to implement in HW
Multi-level caches
Nested transactions (required for composability)
Cache coherency complexity on many-core SMP and NUMA archs
Theoretically subject to starvation
Adaptive backoff strategy is the suggested fix - the authors used exponential backoff
Else a queuing mechanism is needed
Poor debugger support
Summary
TM is a novel multi-processor architecture which allows easy lock-free multi-word synchronization in HW
Leveraging the concept of database transactions
Overcoming the single/double-word limitation
Exploiting cache-coherency mechanisms

Wish | Granted? | Comment
Simple programming model | Yes | Very elegant
Avoids priority inversion, convoying and deadlock | Yes | Inherent in NB approach
Equivalent or better performance than lock-based approach | Yes | For very small and short txs
No restrictions on data set size or contiguity | No | Limited by practical considerations
Composable | No | Possible but would add significant HW complexity
Wait-free | No | Possible but would add significant HW complexity
References
M.P. Herlihy and J.E.B. Moss. Transactional Memory: Architectural support for lock-free data structures. Technical Report 92/07, Digital Cambridge Research Lab, One Kendall Square, Cambridge MA 02139, December 1992.
Appendix
Appendix: Linearizability - Herlihy's Correctness Condition
Invocation of an object/function followed by a response
History - a sequence of invocations and responses made by a set of threads
Sequential history - a history where each invocation is followed immediately by its response
Serializable - a history that can be reordered to form a sequential history which is consistent with the sequential definition of the object
Linearizable - a serializable history in which each response that preceded an invocation in the history also precedes it in the sequential reordering
An object is linearizable if all of its usage histories may be linearized.

An example history:
A invokes lock | B invokes lock | A fails | B succeeds
Reordering 1 - a sequential history but not a serializable reordering:
A invokes lock | A fails | B invokes lock | B succeeds
Reordering 2 - a linearizable reordering:
B invokes lock | B succeeds | A invokes lock | A fails
Appendix: Goodman's Write-Once Protocol
D (Dirty) - line present only in one cache and differs from main memory. Cache must snoop for read requests.
R (Reserved) - line present only in one cache and matches main memory.
V (Valid) - line may be present in other caches and matches main memory.
I - Invalid.
Writes to V, I are write-through
Writes to D, R are write-back
Appendix: Tx Memory and dB Txs
Differences with dB txs:
Disk vs. memory - HW is better suited to txs than dB systems
Txs do not need to be durable (post termination)
A dB is a closed system; a tx interacts with non-tx operations
A tx must be backward compatible with the existing programming environment
Success in the database transaction field does not translate directly to transactional memory