Fence Complexity in Concurrent Algorithms
Petr Kuznetsov, TU Berlin/DT-Labs
STM is about ease-of-programming and efficiency
What is “efficient” in a concurrent system?
Cost metrics
Space: used memory
Cheap
Advanced garbage collection
Time: the number of reads and writes (per operation)
the number of stalls
Relaxed memory models
Memory is much slower than CPU
Read: check the cache -> read the memory
Write: invalidate the caches -> update the memory
To overcome “stalled writes” – reorder operations
Reordering may result in inconsistency
What is inconsistency?
Process P:
Write(X,1)
Read(Y)
Process Q:
Write(Y,1)
Read(X)
[Diagram: timelines of P (W(X,1), R(Y)) and Q (W(Y,1), R(X))]
Possible outcomes
P: reads before Q writes, or reads after Q writes
Q: reads before P writes, or reads after P writes
Out-of-order: P reads before Q writes and Q reads before P writes
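The outcomes above can be checked by brute force. The following is a minimal sketch, not part of the talk: it enumerates every interleaving of P and Q in which each process keeps its program order, then repeats the enumeration with P's read hoisted above its write. The step encoding and the names `interleavings`, `outcomes`, `in_order` and `p_reordered` are assumptions made for illustration.

```python
from itertools import combinations

# Each process is a list of (kind, variable) steps in program order.
P = [("w", "X"), ("r", "Y")]
Q = [("w", "Y"), ("r", "X")]

def interleavings(p, q):
    """Yield every merge of p and q that preserves each process's own order."""
    n = len(p) + len(q)
    for p_slots in combinations(range(n), len(p)):
        p_iter, q_iter = iter(p), iter(q)
        yield [("P", next(p_iter)) if i in p_slots else ("Q", next(q_iter))
               for i in range(n)]

def outcomes(p, q):
    """Return the set of (value read by P, value read by Q) over all merges."""
    results = set()
    for schedule in interleavings(p, q):
        memory = {"X": 0, "Y": 0}
        read = {}
        for proc, (kind, var) in schedule:
            if kind == "w":
                memory[var] = 1          # every write in this example stores 1
            else:
                read[proc] = memory[var]
        results.add((read["P"], read["Q"]))
    return results

# Program order respected: at least one process sees the other's write.
in_order = outcomes(P, Q)

# Hypothetical reordering: P's read is performed before P's write.
p_reordered = outcomes(list(reversed(P)), Q)

print(sorted(in_order))      # (0, 0) is absent
print(sorted(p_reordered))   # (0, 0) is present
```

With program order respected, the pair (0, 0) never appears; once P's read overtakes its write, it does. That pair is the out-of-order outcome.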
Fixing out-of-order
Memory fences: read-after-write (RAW)
write(X,1)
fence() // enforce the order
read(Y)
[Diagram: timelines of P (W(X,1), R(Y)) and Q (W(Y,1), R(X))]
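To see why the fence helps, one can model the "stalled writes" with a per-process store buffer. The sketch below assumes a simple TSO-like machine (an assumption for illustration, not a model given in the talk): a write enters the writer's buffer and reaches memory later, a read looks in the reader's own buffer and then in memory, and a fence can only execute once the buffer is empty. The function `explore` and the tuple encoding of instructions are hypothetical.

```python
def explore(programs):
    """Collect the values read, over all schedules of a store-buffer machine."""
    results = set()

    def run(pcs, buffers, memory, reads):
        moved = False
        for i, program in enumerate(programs):
            if buffers[i]:
                # Drain the oldest buffered write of process i into memory.
                var, val = buffers[i][0]
                drained = buffers[:i] + (buffers[i][1:],) + buffers[i + 1:]
                run(pcs, drained, {**memory, var: val}, reads)
                moved = True
            if pcs[i] == len(program):
                continue  # process i has finished its program
            op, var = program[pcs[i]]
            if op == "fence" and buffers[i]:
                continue  # a fence waits until the process's own buffer is empty
            next_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
            if op == "write":
                queued = buffers[:i] + (buffers[i] + ((var, 1),),) + buffers[i + 1:]
                run(next_pcs, queued, memory, reads)
            elif op == "read":
                # A process sees its own buffered writes first, then memory.
                own = [v for (x, v) in buffers[i] if x == var]
                value = own[-1] if own else memory[var]
                run(next_pcs, buffers, memory, {**reads, i: value})
            else:
                # Fence with an empty buffer: nothing left to wait for.
                run(next_pcs, buffers, memory, reads)
            moved = True
        if not moved:
            # All programs finished and all buffers drained: record the outcome.
            results.add(tuple(reads[i] for i in range(len(programs))))

    start = {"X": 0, "Y": 0}
    run((0,) * len(programs), ((),) * len(programs), start, {})
    return results

without_fence = explore([
    [("write", "X"), ("read", "Y")],
    [("write", "Y"), ("read", "X")],
])
with_fence = explore([
    [("write", "X"), ("fence", None), ("read", "Y")],
    [("write", "Y"), ("fence", None), ("read", "X")],
])
```

In this model, `without_fence` contains (0, 0) because both reads can run while both writes are still buffered. In `with_fence` each write reaches memory before the following read, so (0, 0) disappears.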
Fixing out-of-order
Atomic operations: atomic-write-after-read (AWAR)
atomic {
read(Y)
…
write(X,1)
}
E.g., CAS, TAS, Fetch&Add, …
RAW/AWAR fences take ~60 RMRs
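The AWAR pattern can be sketched with a compare-and-swap cell. Python has no hardware CAS, so the class below is a hypothetical stand-in: its internal lock only models the atomicity of the read-then-write pair and is not how CAS is implemented in hardware. The names `AtomicCell`, `fetch_and_add` and `run_workers` are assumptions made for illustration.

```python
import threading

class AtomicCell:
    """Hypothetical stand-in for a memory word that supports CAS."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()  # models atomicity only

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:                 # atomic {
            if self._value != expected:   #   read
                return False
            self._value = new             #   write, after the read
            return True                   # }

def fetch_and_add(cell, delta):
    """Fetch&Add built from a CAS retry loop; returns the previous value."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + delta):
            return old

def run_workers(n_threads=4, n_increments=1000):
    """Increment one shared cell from several threads and return the total."""
    cell = AtomicCell(0)

    def work():
        for _ in range(n_increments):
            fetch_and_add(cell, 1)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.load()
```

Because the read and the write inside `compare_and_swap` happen as one step, no increment is lost: `run_workers(4, 1000)` returns 4000 regardless of how the threads are scheduled.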
Our result
Any concurrent program in a certain class must use RAW/AWARs
What programs?
Concurrent data types: queues, counters, hash tables, trees, …
Non-commutative operations
Linearizable solo-terminating implementations
Mutual exclusion
Non-commutative operations
Operation A is non-commutative if there exists operation B where (applied to some state):
A influences B, and
B influences A
Example: Queue
enq(v) – adds v to the end of the queue
deq() – dequeues the item at the head of the queue
Q=1;2
Q.deq():1;Q.deq():2 vs. Q.deq():2;Q.deq():1
Two deq() operations influence each other
Q.enq(3):ok;Q.deq():1 vs. Q.deq():1;Q.enq(3):ok
enq() is commutative
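The queue example can be replayed against a sequential specification. This is a minimal sketch under one reading of "influences": A influences B on a state if running A first changes the response B returns. The helpers `apply`, `influences` and `non_commutative`, and the tuple encoding of operations, are hypothetical names introduced here.

```python
from collections import deque

def apply(op, state):
    """Apply op to a copy of state; return (response, new_state)."""
    q = deque(state)
    if op[0] == "enq":
        q.append(op[1])
        return "ok", tuple(q)
    if not q:                      # deq on an empty queue
        return "empty", tuple(q)
    return q.popleft(), tuple(q)   # deq returns the head

def influences(a, b, state):
    """True if running a first changes b's response on this state."""
    alone, _ = apply(b, state)
    _, after_a = apply(a, state)
    following, _ = apply(b, after_a)
    return alone != following

def non_commutative(a, b, state):
    """True if a and b influence each other on this state."""
    return influences(a, b, state) and influences(b, a, state)

DEQ = ("deq",)
ENQ3 = ("enq", 3)

print(non_commutative(DEQ, DEQ, (1, 2)))   # True: each deq changes the other's response
print(non_commutative(ENQ3, DEQ, (1, 2)))  # False: enq always returns ok
print(influences(ENQ3, DEQ, ()))           # True, but only in one direction
```

On the empty queue enq(3) does change what deq() returns, but deq() never changes the response of enq(3), so the pair is still not mutually influencing.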
Proof sketch
A non-commutative operation must write
Suppose not: on Q=1;2, a deq() that does not write returns 1, and a following deq() also returns 1
There must be a write!
Proof sketch
Let w be the first write
Suppose there are no AWARs
[Diagram: deq():1 on Q=1;2, with the write w marked]
A(w) – the longest atomic construct containing w
w must be the first base-object event in A(w)!
Proof sketch
Suppose there are no RAWs
[Diagram: on Q=1;2, a deq():1 containing A(w), and another deq():1]
No RAW – no difference for deq()!
Mutual exclusion
Lock() – acquire the lock
Unlock() – release the lock
(Mutex) No two processes hold the lock at the same time
(Deadlock-freedom) If at least one process executes Lock() and no active process fails, at least one process acquires the lock
Two Lock() operations influence each other!
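A toy experiment shows why the order "write, then read" matters for Lock(). The sketch below is an assumption for illustration, not an algorithm from the talk: each of two processes makes a single acquisition attempt by announcing itself in a flag and checking the other's flag. It is not a deadlock-free lock, and it runs on a sequentially consistent model, so it only checks the Mutex property. The name `both_can_enter` is hypothetical.

```python
from itertools import combinations

def both_can_enter(order):
    """order is 'write-read' (announce, then check) or 'read-write'.

    Return True if some interleaving of the two attempts lets both
    processes conclude that the lock is free.
    """
    steps = ["w", "r"] if order == "write-read" else ["r", "w"]
    # Choose which 2 of the 4 time slots belong to process 0.
    for p_slots in combinations(range(4), 2):
        flag = [0, 0]
        saw_free = [False, False]
        position = [0, 0]
        for slot in range(4):
            me = 0 if slot in p_slots else 1
            step = steps[position[me]]
            position[me] += 1
            if step == "w":
                flag[me] = 1                          # announce the attempt
            else:
                saw_free[me] = (flag[1 - me] == 0)    # check the other process
        if all(saw_free):
            return True
    return False

print(both_can_enter("write-read"))  # False: at most one process enters
print(both_can_enter("read-write"))  # True: both may enter, Mutex is violated
```

When the write precedes the read, at most one process sees the other's flag at 0. On hardware that reorders operations, keeping that order is what the RAW fence is for.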
Our result
In any implementation of mutual exclusion or a concurrent data type with a non-commutative operation op, a complete execution of op or lock() contains a RAW or AWAR
Every successful lock acquire incurs a RAW/AWAR fence
Why do we care?
Hardware design: what primitives must be optimized?
API design: returned values matter
Set with add returning fail vs. returning ok
Verification – catch obviously incorrect algorithms early
What’s next?
Weaker primitives?
Idempotent Work Stealing [Michael et al., PPoPP’09]
Tight lower bounds?
How many RAW/AWAR fences are incurred?
Other patterns
Read-after-read
Write-after-write
Multi-RAW:
write(Xi,1)
collect(X1,..,Xn)
References
H. Attiya, R. Guerraoui, D. Hendler, P. Kuznetsov, M. Michael, M. Vechev. Laws of Order: Expensive Synchronization in Concurrent Algorithms Cannot be Eliminated. In POPL 2011.
Srivatsan’s talk on STM fence complexity, TR on the way
QUESTIONS?