Thread-Level Speculation as a Memory Consistency Protocol
for Software DSM?
Marcelo Cintra
University of Edinburgh
http://www.dcs.ed.ac.uk/home/mc
Dagstuhl Seminar - October 2003 2
Thread-Level Speculation (TLS)
Speculatively run whole “threads” and backtrack if necessary
Track data accesses to detect cross-thread “conflicting” memory accesses
Buffer state of speculative threads and commit when appropriate
Enforce some expected correct execution behavior
Example 1: Speculative Parallelization
Original code: sequential with non-decidable dependences
Squash on data flow dependences
for(i=0; i<100; i++) {
  … = A[L[i]]+…
  A[K[i]] = …
}
Iteration J:    … = A[4]+…    A[5] = ...
Iteration J+1:  … = A[2]+…    A[2] = ...
Iteration J+2:  … = A[5]+…    A[5] = ...
RAW: Iteration J writes A[5], which Iteration J+2 reads
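A minimal Python sketch of the squash-on-RAW behaviour in this example. The slide elides the actual computation, so the loop body `A[K[i]] = A[L[i]] + 1`, the batch size `window`, and the function names are assumptions made purely for illustration.

```python
def run_sequential(A, L, K):
    """Reference semantics of the original loop (assumed body: A[K[i]] = A[L[i]] + 1)."""
    A = list(A)
    for i in range(len(L)):
        A[K[i]] = A[L[i]] + 1
    return A


def run_speculative(A, L, K, window=3):
    """Execute `window` iterations against a stale snapshot, then commit them
    in iteration order, squashing any iteration that hit a RAW dependence."""
    A = list(A)
    squashed = []
    n = len(L)
    for start in range(0, n, window):
        batch = range(start, min(start + window, n))
        snapshot = list(A)
        # Speculative phase: each iteration reads the snapshot and buffers its store.
        buffered = {i: snapshot[L[i]] + 1 for i in batch}
        written = set()  # locations stored by earlier iterations of this batch
        # Commit phase, in original iteration order.
        for i in batch:
            if L[i] in written:
                # RAW violation: iteration i read a stale value.
                # Squash it and re-execute against the committed state.
                squashed.append(i)
                buffered[i] = A[L[i]] + 1
            A[K[i]] = buffered[i]
            written.add(K[i])
    return A, squashed
```

With the three iterations drawn above (`L = [4, 2, 5]`, `K = [5, 2, 5]`), only the third iteration is squashed, and the final array matches the sequential run.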
Example 2: Speculative Synchronization [Martinez and Torrellas, ASPLOS02]
Original code: parallel with locks and barriers
Squash on conflicting accesses
Thread A:  acquire    … = A[4]+…    A[5] = …    release
Thread B:  acquire    … = A[2]+…    A[2] = …    release
Thread C:  acquire    … = A[5]+…    A[5] = …    release
RAW, WAW: conflicts on A[5] between Thread A and Thread C
Example 2: Speculative Synchronization
Non-conflicting memory operations can perform out-of-order
Conflicting memory operations eventually complete in-order after rollback
– Relaxes the order of non-conflicting memory operations while still providing RC abstraction
At release/commit all pending stores must complete
TLS used to enforce RC in a more “relaxed” way by means of speculation and rollback
Outline
Background and motivation
A TLS-based protocol for software DSM
Summary
Related work
Conclusions
LRC Consistency Protocol
Block on acquires and wait for lock
Obtain lock along with invalidations
On load page fault allocate local page and get diff update
On store page fault generate twin copy
On release compare twin and private copy to generate diff; send invalidations and lock to next thread in line
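The twin and diff steps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a page is modelled as a list of words, a diff as a dictionary from word index to new value, and the function names are hypothetical. Page faults, invalidations, and lock transfer are not modelled.

```python
def make_twin(page):
    """On a store page fault: keep an unmodified copy (the twin) of the page."""
    return list(page)


def make_diff(twin, page):
    """On release: compare the twin with the modified private copy, word by word."""
    return {i: new for i, (old, new) in enumerate(zip(twin, page)) if old != new}


def apply_diff(page, diff):
    """On a later load page fault in another thread: update its stale copy."""
    for i, value in diff.items():
        page[i] = value
    return page
```

For example, a thread that stores only to word 5 of a page produces a diff with a single entry, and that entry is all another thread needs to bring its copy up to date.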
Example LRC Operation
Thread A:  acquire    … = A[4]+…    … A[5] = …    release
Thread B:  acquire    … = A[2]+…    … A[2] = …    release
Thread C:  acquire    … = A[5]+…    … A[5] = …    release
Generate diff
Obtain diff from Thread A
TLS-based Consistency Protocol
On load or write miss allocate local page and twin copy
Expand loads and stores to keep a record of the accesses to individual fields of shared objects
On commit
– Wait for “diff” from non-speculative thread
– Check for violations
– Merge “diff”s and pass to next speculative thread in line
If violation detected
– Incorporate received “diff” into twin copy and discard local copy
– Discard own “diff”
– Discard some private data (may require extra buffering)
– Re-execute
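The violation path can be sketched as follows. This is a hedged illustration, not the protocol's implementation: pages are lists, a “diff” is a dictionary from index to value, `body` stands for the speculative section as a Python function, and `rollback_and_reexecute` is a hypothetical name. Discarding other private data is not modelled.

```python
def rollback_and_reexecute(twin, received_diff, body):
    """Handle a detected violation for one speculative thread (illustrative only)."""
    # Incorporate the received diff into the twin copy.
    for index, value in received_diff.items():
        twin[index] = value
    # Discard the local copy and the thread's own diff by starting again from the twin.
    local = list(twin)
    # Re-execute the section on the fresh local copy.
    body(local)
    # Recompute the thread's own diff against the updated twin.
    own_diff = {i: v for i, (t, v) in enumerate(zip(twin, local)) if t != v}
    return local, own_diff
```

For instance, if the section is `A[5] = A[5] + 1` and the received diff sets `A[5]` to 41, re-execution yields 42 regardless of what the stale speculative run had computed.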
TLS “diff” and Violations
3 possible states for each field of shared object:
– NotAccessed: thread did not touch this field
– Loaded: thread loaded this field but did not store to it
– Modified: thread stored to this field and possibly loaded it
Violation and merging of “diff”s
(rows: state in the non-speculative thread; columns: state in the speculative thread)
Non-spec \ Speculative   NotAccessed   Loaded        Modified
NotAccessed              NotAccessed   NotAccessed   Modified
Loaded                   NotAccessed   NotAccessed   Modified
Modified                 Modified      Violation     Violation
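A small Python sketch of the violation check and “diff” merge. It is one reading of the table, not a confirmed rule: a violation is flagged when the non-speculative thread modified a field that the speculative thread loaded or modified (Modified may hide a load of stale data), and the merged “diff” carries only Modified fields. The `State` enum and the function name are hypothetical.

```python
from enum import Enum


class State(Enum):
    NOT_ACCESSED = 0
    LOADED = 1
    MODIFIED = 2


def check_and_merge(non_spec, spec):
    """Compare two per-field state lists; return (violation, merged_states)."""
    merged = []
    for ns, sp in zip(non_spec, spec):
        if ns is State.MODIFIED and sp is not State.NOT_ACCESSED:
            # The speculative thread may have consumed a stale value.
            return True, None
        # Only modifications need to be passed on to the next thread in line.
        if State.MODIFIED in (ns, sp):
            merged.append(State.MODIFIED)
        else:
            merged.append(State.NOT_ACCESSED)
    return False, merged
```

The function stops at the first violating field, since a single violation is enough to force re-execution of the speculative thread.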
Example TLS DSM Operation
Thread A:  TLS_start    … = A[4]+… (TLS_load)    … A[5] = … (TLS_store)    TLS_end
Thread B:  TLS_start    … = A[2]+… (TLS_load)    … A[2] = … (TLS_store)    TLS_end
Thread C:  TLS_start    … = A[5]+… (TLS_load)    … A[5] = … (TLS_store)    TLS_end
Annotations:
– Update “diff” to have A[5] as Modified
– Update “diff” to have A[2] as Loaded
– Get page with stale data
– No need to update “diff”
– Wait for non-spec (A) to finish. Obtain “diff” from A. Compare “diff” with own “diff”. No violations, so become non-spec. Merge “diff”s.
– Wait for non-spec (B) to finish. Obtain “diff” from B. Compare “diff” with own “diff”. Violation detected.
Example Implementation
TLS_load:
  if (SA[i]==NotAccessed) SA[i]=Loaded
TLS_store:
  SA[i]=Modified
TLS_start:
– Try to acquire lock with a non-blocking operation
– If successful then become non-speculative
– Otherwise get a place in line for the lock, and execute speculatively
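The augmented loads and stores can be sketched in Python as below. The state array keeps the slide's name `SA`; the wrapper class, its method names, and the string constants are assumptions for illustration, and a real system would insert these checks through source code or compiler instrumentation.

```python
NOT_ACCESSED, LOADED, MODIFIED = "NotAccessed", "Loaded", "Modified"


class SpeculativeArray:
    """A shared array with one access-state entry per field (hypothetical wrapper)."""

    def __init__(self, data):
        self.data = list(data)
        self.SA = [NOT_ACCESSED] * len(self.data)

    def tls_load(self, i):
        # Record the load only if the field has not been touched yet,
        # so that an earlier store is never downgraded to Loaded.
        if self.SA[i] == NOT_ACCESSED:
            self.SA[i] = LOADED
        return self.data[i]

    def tls_store(self, i, value):
        # A store always leaves the field Modified.
        self.SA[i] = MODIFIED
        self.data[i] = value
```

A load followed by a store to the same field, as in all three threads of the example, therefore ends in the Modified state.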
Example Implementation
TLS_end:
– If non-speculative then “pass” lock to next thread in line; next thread becomes non-speculative
– Else, if next thread waiting for lock then
    Wait for non-speculative to finish
    Get “diff” from non-speculative thread
    Check for violations
    Merge “diff”s
    “Pass” lock to next thread in line
– Else, wait for lock
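A toy, single-process Python model of the lock hand-off in TLS_start and TLS_end. It covers only the non-blocking acquire, the line of speculative threads, and the passing of the lock. The class and method names are hypothetical, and the “diff” exchange, violation check, and any real inter-node communication are left out.

```python
from collections import deque


class SpeculativeLock:
    """Toy model of the lock used by TLS_start and TLS_end (illustrative only)."""

    def __init__(self):
        self.holder = None   # the current non-speculative thread, if any
        self.line = deque()  # speculative threads, in arrival order

    def tls_start(self, thread):
        """Non-blocking acquire; return True if `thread` runs non-speculatively."""
        if self.holder is None:
            self.holder = thread
            return True
        # Lock is taken: get a place in line and execute speculatively.
        self.line.append(thread)
        return False

    def tls_end(self, thread):
        """Non-speculative case only: pass the lock to the next thread in line."""
        if thread != self.holder:
            raise RuntimeError("a speculative thread must first wait for the lock")
        self.holder = self.line.popleft() if self.line else None
        return self.holder  # this thread is now non-speculative
```

With threads A, B, and C arriving in that order, A runs non-speculatively, and the lock then passes to B and C in turn.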
Will It Work?
Overheads
– Augmented loads and stores
    Both speculative parallelization and optimistic concurrency control in software have been done successfully
    Compiler instrumentation for write trapping in DSM is not so bad [Adve et al., HPCA96]
– Serialization of commits
Implementation
– Hopefully not much more complex than a software DSM
– Use source code augmentation and user help
Applications
– Irregular applications with little overlap of modifications in critical sections
– Easy to switch back to normal DSM operation
Related Work
Speculative Synchronization:
– Martinez and Torrellas (ASPLOS 2002); Rajwar and Goodman (MICRO 2001)
    Hardware-based
Optimistic Concurrency Control and Software Transactional Memory
– Herlihy (ACM TODS 1990); Kung and Robinson (ACM TODS 1981)
    Source-code level speculation for transaction processing
– Shavit and Touitou (PODC 1995); Herlihy et al. (PODC 2003)
    Run-time system speculation on top of hardware coherent systems
Related Work
Speculation and consistency models:
– Gniady, Falsafi, and Vijaykumar (ISCA 1999)
    SC plus speculation in hardware
    Speculation only within instruction window and ld/st queue
Related Work
Software Speculative Parallelization:
– Dang, Yu, and Rauchwerger (IPDPS 2002); Rundberg and Stenström (WSSMM 2000); Cintra and Llanos (PPoPP 2003)
    Speculative parallelization at source-code level
– Papadimitriou and Mowry (CMU-CS-01-145)
    Speculative parallelization on software DSM protocol
Related Work
Software DSM systems:
– TreadMarks: Amza et al. (IEEE Computer 1996)
    Lazy RC (LRC)
– Midway: Bershad, Zekauskas, and Sawdon (CompCon 1993)
    Entry Consistency (EC)
– Adve et al. (HPCA 1996)
    Compared LRC versus EC
    Compared twinning versus compiler instrumentation for write trapping
Conclusions and Future Work
TLS can provide RC with more relaxed synchronization
Hardware speculative synchronization and software speculative parallelization have been successful
Must find applications
Must perform detailed performance evaluation