Post on 19-Dec-2015
Transactional Memory
Yujia Jin
Lock and Problems
• Locks are commonly used to protect shared data
• Priority Inversion
– A lower-priority process holds a lock needed by a higher-priority process
• Convoy Effect
– When the lock holder is interrupted, the other processes are forced to wait
• Deadlock
– Circular dependence between processes acquiring locks, so every process just waits for a lock
Lock-free
• A shared data structure is lock-free if its operations do not require mutual exclusion
– Does not prevent multiple processes from operating on the same object
+ Avoids the lock problems
- Existing lock-free techniques are implemented in software and do not perform well against their lock-based counterparts
Transactional Memory
• Uses transaction-style operations to operate on lock-free data
• Allows the user to define customized read-modify-write operations on multiple, independent words
• Easy to support in hardware: straightforward extensions to a conventional multiprocessor cache
Transaction Style
• A finite sequence of machine instructions with
– A sequence of reads,
– Computation,
– A sequence of writes, and
– Commit
• Formal properties
– Atomicity, Serializability (~ACID)
Access Instructions
• Load-transactional (LT)
– Reads from shared memory into a private register
• Load-transactional-exclusive (LTX)
– LT plus a hint that a write is coming
• Store-transactional (ST)
– Tentatively writes from a private register to shared memory; the new value is not visible to other processors until commit
State Instructions
• Commit
– Tries to make the tentative writes permanent
– Succeeds if no other processor has written to its read set or write set, or read its write set
– On failure, discards all updates to the write set
– Returns whether it succeeded
• Abort
– Discards all updates to the write set
• Validate
– Returns the current transaction status
– If the current status is false, discards all updates to the write set
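The commit condition can be sketched as a predicate over address sets. This is only a toy model: the function name `can_commit` and the representation of read and write sets as Python sets are assumptions made for illustration, and the rule assumed here is that a commit fails when another processor has updated any location in the transaction's read or write set, or has read a location in its write set.

```python
def can_commit(read_set, write_set, others_reads, others_writes):
    """Toy model of the COMMIT success rule (all names are illustrative).

    read_set / write_set: addresses this transaction loaded / tentatively stored.
    others_reads / others_writes: addresses other processors touched meanwhile.
    """
    data_set = read_set | write_set
    # Conflict 1: another processor updated a location this transaction used.
    updated_under_us = data_set & others_writes
    # Conflict 2: another processor read a location we are about to overwrite.
    read_our_writes = write_set & others_reads
    return not updated_under_us and not read_our_writes
```

Under this model, two transactions that only read the same location do not conflict; a conflict needs at least one writer.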
Typical Transaction
/* keep trying */
while ( true ) {
    /* read variables */
    v1 = LT ( V1 ); …; vn = LT ( Vn );
    /* check consistency */
    if ( ! VALIDATE () ) continue;
    /* compute new values */
    compute ( v1, … , vn );
    /* write tentative values */
    ST ( v1, V1 ); …; ST ( vn, Vn );
    /* try to commit */
    if ( COMMIT () ) return result;
    else backoff ();
}
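The retry loop above can be exercised against a small software stand-in. The sketch below is an assumption-laden model, not the hardware scheme: `ToyTM`, its per-address version counters, and the internal lock that makes the commit step atomic are all hypothetical devices used only to imitate the visible behavior of LT, ST, VALIDATE and COMMIT.

```python
import threading

class ToyTM:
    """Software stand-in for LT/ST/VALIDATE/COMMIT (hypothetical, not the hardware)."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.version = {addr: 0 for addr in memory}
        self._commit_guard = threading.Lock()  # only models commit atomicity

    def begin(self):
        return {"reads": {}, "writes": {}}

    def lt(self, tx, addr):
        if addr in tx["writes"]:
            return tx["writes"][addr]
        # Remember the version seen at first read, then read the value.
        tx["reads"].setdefault(addr, self.version[addr])
        return self.memory[addr]

    def st(self, tx, addr, value):
        tx["writes"][addr] = value  # tentative, invisible until commit

    def validate(self, tx):
        return all(self.version[a] == v for a, v in tx["reads"].items())

    def commit(self, tx):
        with self._commit_guard:
            if not self.validate(tx):
                return False  # tentative writes are simply discarded
            for addr, value in tx["writes"].items():
                self.memory[addr] = value
                self.version[addr] += 1
            return True

def increment(tm, addr):
    """The typical transaction: read, validate, compute, write, commit, retry."""
    while True:
        tx = tm.begin()
        v = tm.lt(tx, addr)
        if not tm.validate(tx):
            continue
        tm.st(tx, addr, v + 1)
        if tm.commit(tx):
            return v + 1
        # a real implementation would back off here before retrying
```

When two transactions read the same word and both try to update it, the first commit succeeds and the second fails and must retry, which is the behavior the loop relies on.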
Warning…
• Not intended for database use
• Transactions are short in time
• Transactions are small in dataset
Idea Behind Implementation
• Existing cache protocol detects accessibility conflicts
• Accessibility conflicts ~ transaction conflicts
• Can be built as an extension to cache coherence protocols
– Includes snoopy bus and directory protocols
Bus Snoopy Example
[Figure: a processor with two caches attached to the bus: a regular cache (2048 8-byte lines, direct mapped) and a transaction cache (64 8-byte lines, fully associative)]
• Caches are exclusive
• Transaction cache contains tentative writes without propagating them to other processors
Transaction Cache
• Each cache line carries a separate transactional tag in addition to the coherence protocol tag
– Transactional tag states: EMPTY, NORMAL, XCOMMIT, XABORT
• Two entries per transactionally accessed line
– Modifications are written to the XABORT entry, which is set to EMPTY on abort
– The XCOMMIT entry holds the original value and is set to EMPTY on commit
• Allocation policy, in decreasing order of preference
– EMPTY entries, NORMAL entries, XCOMMIT entries
• Must guarantee a minimum transaction size
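The allocation order can be sketched as a simple victim search. The names `Tag`, `PREFERENCE` and `pick_victim` are hypothetical, and returning `None` when only XABORT entries remain is an assumption about how a too-large transaction would be signaled, not something the slides specify.

```python
from enum import Enum

class Tag(Enum):
    EMPTY = 0
    NORMAL = 1
    XCOMMIT = 2
    XABORT = 3

# Replacement preference from the slide: EMPTY first, then NORMAL, then XCOMMIT.
PREFERENCE = (Tag.EMPTY, Tag.NORMAL, Tag.XCOMMIT)

def pick_victim(tags):
    """Return the index of the entry to allocate, or None if only XABORT entries remain."""
    for wanted in PREFERENCE:
        for index, tag in enumerate(tags):
            if tag is wanted:
                return index
    return None  # assumption: the transaction no longer fits in the cache
```

XABORT entries are never chosen because they hold the transaction's tentative values.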
Bus Actions
• T_READ and T_RFO (read for ownership) are added for transactional requests
• A transactional request can be refused with a BUSY response
• When a BUSY response is received, the transaction is aborted
– This prevents deadlock and continual mutual aborts
– Can be subject to starvation
Processor Actions
• Transaction active (TACTIVE) flag indicates whether a transaction is in progress; it is set on the first transactional operation
• Transaction status (TSTATUS) flag indicates whether the transaction is still active (true) or has been aborted (false)
LT Actions
• Check for an XABORT entry and return its value if there is one
• If there is none, check for a NORMAL entry
– Switch NORMAL to XABORT and allocate XCOMMIT
• If there is none, issue T_READ on the bus, then allocate XABORT and XCOMMIT entries
• If T_READ receives BUSY, abort
– Set TSTATUS to false
– Drop all XABORT entries
– Set all XCOMMIT entries to NORMAL
– Return arbitrary data
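The LT steps can be traced in a toy model of the transactional cache. Everything here is an illustrative assumption: the `TxCache` class, the dictionary layout of an entry, the `BUSY` sentinel, and the `t_read` callable that stands in for a T_READ bus cycle. Capacity limits and the regular cache are left out.

```python
BUSY = object()  # sentinel for a refused bus request (hypothetical encoding)

class TxCache:
    def __init__(self, t_read):
        self.entries = []      # each entry: {"addr", "tag", "value"}
        self.t_read = t_read   # models a T_READ bus cycle
        self.tstatus = True    # True = still active, False = aborted

    def _find(self, addr, tag):
        for e in self.entries:
            if e["addr"] == addr and e["tag"] == tag:
                return e
        return None

    def _abort(self):
        self.tstatus = False
        for e in self.entries:
            if e["tag"] == "XABORT":
                e["tag"] = "EMPTY"     # drop tentative values
            elif e["tag"] == "XCOMMIT":
                e["tag"] = "NORMAL"    # keep the original values

    def lt(self, addr):
        hit = self._find(addr, "XABORT")
        if hit is not None:
            return hit["value"]
        normal = self._find(addr, "NORMAL")
        if normal is not None:
            normal["tag"] = "XABORT"
            self.entries.append({"addr": addr, "tag": "XCOMMIT", "value": normal["value"]})
            return normal["value"]
        value = self.t_read(addr)
        if value is BUSY:
            self._abort()
            return None  # the returned data is arbitrary after an abort
        self.entries.append({"addr": addr, "tag": "XABORT", "value": value})
        self.entries.append({"addr": addr, "tag": "XCOMMIT", "value": value})
        return value
```

A second LT to the same address hits the XABORT entry and never reaches the bus, so later changes elsewhere are not observed until a conflict aborts the transaction.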
LTX and ST Actions
• Same as LT, except
– Use T_RFO on a miss rather than T_READ
– For ST, the XABORT entry is updated
More Exciting Actions
• VALIDATE
– Returns the TSTATUS flag
– If false, sets TSTATUS to true and TACTIVE to false
• ABORT
– Discards the tentative cache entries, sets TSTATUS to true and TACTIVE to false
• COMMIT
– Returns TSTATUS, sets TSTATUS to true and TACTIVE to false
– Drops all XCOMMIT entries and changes all XABORT entries to NORMAL
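The flag handling of these three instructions can be sketched as a small state object. `TxState` and the two tag-transition tables are hypothetical names; the model tracks only the flags and the transactional tags, and it represents a dropped entry by setting its tag to EMPTY.

```python
# Tag transitions; tags not listed stay unchanged.
ON_ABORT = {"XABORT": "EMPTY", "XCOMMIT": "NORMAL"}
ON_COMMIT = {"XCOMMIT": "EMPTY", "XABORT": "NORMAL"}

class TxState:
    def __init__(self):
        self.tactive = False  # is a transaction in progress?
        self.tstatus = True   # True = active, False = aborted
        self.tags = []        # transactional tags of the cache entries

    def _reset_flags(self):
        self.tstatus = True
        self.tactive = False

    def validate(self):
        status = self.tstatus
        if not status:
            self._reset_flags()  # an aborted transaction is wound up here
        return status

    def abort(self):
        self.tags = [ON_ABORT.get(t, t) for t in self.tags]
        self._reset_flags()

    def commit(self):
        status = self.tstatus
        self._reset_flags()
        self.tags = [ON_COMMIT.get(t, t) for t in self.tags]
        return status
```

Commit and abort are mirror images: one keeps the XABORT copies and discards the XCOMMIT copies, the other does the reverse.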
Snoopy Cache Actions
• Regular cache acts like a MESI invalidate protocol; it treats T_READ the same as READ and T_RFO the same as RFO
• Transactional cache
– Non-transactional cycle: acts like the regular cache, with NORMAL entries only
– T_READ: if the entry is valid (shared), returns the value
– All other cycles: responds BUSY
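The transactional cache's snoop response can be summarized as a decision function. This is a rough sketch under several assumptions: the function name `snoop`, the entry layout, the `"REGULAR"` marker that stands for "handle as the regular cache would", and the choice to stay silent (`None`) when the cache does not hold the line are all illustrative.

```python
BUSY = "BUSY"
TRANSACTIONAL_CYCLES = {"T_READ", "T_RFO"}

def snoop(cycle, entry):
    """Return the transactional cache's response to a snooped bus cycle.

    entry: dict with "tag", "state" and "value", or None if the line is absent.
    """
    if entry is None:
        return None  # assumption: a cache that lacks the line stays silent
    if cycle not in TRANSACTIONAL_CYCLES:
        # Behave like the regular cache, but only NORMAL entries are visible.
        return "REGULAR" if entry["tag"] == "NORMAL" else None
    if cycle == "T_READ" and entry["state"] == "SHARED":
        return entry["value"]
    return BUSY  # every other transactional cycle is refused
```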
Simulation
• Proteus simulator
• 32 processors
• Regular cache
– Direct mapped, 2048 8-byte lines
• Transactional cache
– Fully associative, 64 8-byte lines
• Single-cycle cache access
• 4-cycle memory access
• Both snoopy bus and directory protocols are simulated
• 2-stage network with a switch delay of 1 cycle each
Benchmarks
• Counter
– n processors, each incrementing a shared counter (2^16)/n times
• Producer/Consumer buffer
– n/2 processors produce and n/2 processors consume through a shared FIFO
– Ends when 2^16 items are consumed
• Doubly-linked list
– n processors try to rotate the contents from tail to head
– Ends when 2^16 items are moved
– Which variables are shared depends on the state of the list
– Traditional locking methods can introduce deadlock
Comparisons
• Competitors
– Transactional memory
– Load-locked/store-conditional (Alpha)
– Spin lock with backoff
– Software queue
– Hardware queue
Counter Result
Producer/Consumer Result
Doubly Linked List Result
Conclusion
• Avoids the extra lock variable and the lock problems
• Trades deadlock for possible livelock/starvation
• Comparable performance to lock techniques when the shared data structure is small
• Relatively easy to implement