DECS: A Dynamic Elimination-Combining Stack Algorithm
Gal Bar-Nissan, Danny Hendler, Adi Suissa. OPODIS 2011.
1
DECS: A Dynamic Elimination-Combining Stack Algorithm
Gal Bar-Nissan, Danny Hendler, Adi Suissa
OPODIS 2011
2
Stack data-structure
We focus on the stack data-structure, which supports two operations:
◦ push(v) – adds a new element (with value v) to the top of the stack
◦ pop – removes the top element from the stack and returns it
3
Previous work – IBM/Treiber algorithm [1986]
Linked-list based
Shared top pointer
[Figure: push operation and pop operation on the linked list; labels: top, next, old, new]
Non-blocking algorithm
Poor scalability (essentially sequential)
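The retry loop behind the IBM/Treiber stack can be sketched as follows. This is a minimal illustration, not the authors' code: Python has no hardware compare-and-swap, so the hypothetical `AtomicRef` class below emulates one with a small internal lock. It shows the structure of the algorithm (read top, prepare the new top, CAS, retry on failure) rather than its non-blocking progress guarantee.

```python
import threading

class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

class AtomicRef:
    """Hypothetical stand-in for a CAS word; the lock only makes compare_and_set atomic."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

class TreiberStack:
    def __init__(self):
        self.top = AtomicRef(None)   # the shared top pointer

    def push(self, value):
        while True:
            old_top = self.top.get()
            node = Node(value, old_top)          # new node points at the old top
            if self.top.compare_and_set(old_top, node):
                return                           # CAS succeeded; otherwise retry

    def pop(self):
        while True:
            old_top = self.top.get()
            if old_top is None:
                return None                      # empty stack
            if self.top.compare_and_set(old_top, old_top.next):
                return old_top.value
```

Every operation contends on the single `top` word, which is why the slide calls the algorithm essentially sequential.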
4
Previous work – Flat-combining [Hendler, Incze, Shavit, Tzafrir, 2010]
A list of operations to be performed
Each thread adds its operation to the list
One of the threads acquires a global lock and performs the combined operation
Other threads spin and wait for their operation to be performed
[Figure: list of pending push and pop operations]
Minimizes synchronization
Blocking algorithm
Limited scalability (essentially sequential)
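The flat-combining idea above can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the published algorithm: the names `Request`, `FlatCombiningStack` and `apply` are hypothetical, and the list of pending operations is protected by a second small lock (`pending_lock`) instead of the per-thread publication records used in the original work.

```python
import threading

class Request:
    def __init__(self, op, value=None):
        self.op = op            # "push" or "pop"
        self.value = value
        self.result = None
        self.done = False

class FlatCombiningStack:
    def __init__(self):
        self.items = []                         # sequential stack, touched only by the combiner
        self.pending = []                       # published operations
        self.pending_lock = threading.Lock()    # simplification: guards the list itself
        self.combiner_lock = threading.Lock()   # the global lock

    def _combine(self):
        # Take the whole batch, then serve it while holding the global lock.
        with self.pending_lock:
            batch, self.pending = self.pending, []
        for req in batch:
            if req.op == "push":
                self.items.append(req.value)
            else:
                req.result = self.items.pop() if self.items else None
            req.done = True

    def apply(self, op, value=None):
        req = Request(op, value)
        with self.pending_lock:
            self.pending.append(req)            # add the operation to the list
        while not req.done:                     # spin until someone served it
            if self.combiner_lock.acquire(blocking=False):
                try:
                    self._combine()             # this thread became the combiner
                finally:
                    self.combiner_lock.release()
        return req.result
```

Only the lock holder touches `items`, so synchronization is minimal, but a stalled combiner blocks everyone, which matches the "blocking" label on the slide.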
5
Previous work – Elimination Backoff (HSY) [Hendler, Shavit, Yerushalmi, 2004]
Eliminates operations with reverse semantics (a push and a pop cancel each other)
A thread attempts its operation:
1. On the central stack (IBM/Treiber algorithm)
2. Elimination Backoff – Eliminate with another thread
[Figure: threads T1, T2 and T3 with operations pop, push( ) and pop, and the central stack]
Non-blocking algorithm
Provides parallelism – if workloads are symmetric
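To illustrate what "eliminating" a pair of operations means, here is a small sequential model. It is only a sketch: the names `Op` and `try_eliminate` are hypothetical, the collision location is modelled as a one-element list, and no atomicity is provided (a real implementation would update the slot with CAS).

```python
class Op:
    def __init__(self, kind, value=None):
        self.kind = kind        # "push" or "pop"
        self.value = value
        self.result = None
        self.done = False

def try_eliminate(slot, op):
    """Return True if op was eliminated against a waiting operation of the opposite kind."""
    waiting = slot[0]
    if waiting is None:
        slot[0] = op            # nobody here: park and wait for a partner
        return False
    if waiting.kind == op.kind:
        return False            # same kind: plain elimination cannot help
    slot[0] = None
    pusher, popper = (op, waiting) if op.kind == "push" else (waiting, op)
    popper.result = pusher.value    # the pop returns the pushed value directly
    pusher.done = popper.done = True
    return True
```

The pair completes without touching the central stack, which is where the parallelism comes from. When most operations are of the same kind, the "same kind" branch dominates and the benefit disappears.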
6
Our contributions
DECS – A Dynamic Elimination-Combining Stack algorithm
Dynamically employs either of two techniques:
1. Elimination
2. Combining
A non-blocking version (NB-DECS)
7
DECS – Dynamic Elimination-Combining Stack
Employs IBM/Treiber's algorithm as a central stack
A thread attempts its operation:
1. On the central stack
2. Elimination-Combining Backoff – Eliminate or Combine with another thread
8
Elimination-Combining layer
[Figure: T1 attempts op1 on the central stack, then appears in the publication array]
1. A thread attempts its operation on the central stack
2. If that fails, it registers itself in a publication array
3. It then chooses a random index from the publication array, and looks for another thread
If no other thread is found, the thread waits
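Steps 2 and 3 above can be sketched as a single helper. This is an illustrative model only: the function name `register_and_probe`, the use of a plain Python list as the publication array, and the `rng` parameter (added so the random choice can be controlled) are all assumptions, and the accesses are not atomic.

```python
import random

def register_and_probe(publication, my_id, my_op, rng=random):
    """Publish my_op, then probe one random slot for another thread's operation."""
    publication[my_id] = my_op                 # step 2: register in the array
    idx = rng.randrange(len(publication))      # step 3: choose a random index
    other = publication[idx]
    if idx == my_id or other is None:
        return None                            # no other thread found: the caller waits
    return idx, other
```

A caller that gets `None` back stays registered, so a thread arriving later can still find it.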
9
Elimination-Combining layer (cont'd)
[Figure: T2 attempts op2 on the central stack; T1 and T2 both appear in the publication array]
4. Another thread that fails operating on the central stack also registers in the array and tries to find another thread
5. If it finds another thread with a reverse semantics operation (op1 != op2), the operations are eliminated
10
Elimination-Combining layer (cont'd)
[Figure: T2 attempts op2 on the central stack; T1 and T2 appear in the publication array; one of them is marked as the delegate thread]
6. If both threads have identical operation semantics (op1 == op2), one thread delegates its operation to the other thread
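The choice between the two outcomes of a collision (eliminate when op1 != op2, combine when op1 == op2) can be modelled as below. This is a sketch under assumptions: `MultiOp` and its fields `next` and `length` are hypothetical names for a record that chains delegated operations, and the code is sequential, with none of the synchronization a real collision needs.

```python
class MultiOp:
    def __init__(self, kind, value=None):
        self.kind = kind      # "push" or "pop"
        self.value = value
        self.next = None      # operations delegated to this one
        self.length = 1       # how many operations this record represents

def collide(mine, other):
    """Decide what a collision between two published operations does."""
    if mine.kind != other.kind:
        return "eliminate"            # reverse semantics: the two cancel out
    # Identical semantics: `other` delegates to `mine`, which becomes the delegate thread.
    tail = mine
    while tail.next is not None:
        tail = tail.next
    tail.next = other                 # append other's whole chain
    mine.length += other.length
    return "combine"
```

After a "combine", the owner of `other` waits while the delegate carries the whole chain, either to the central stack or into a later collision.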
11
Multi-Push
[Figure: T1 performs a push on the central stack for the combined operations of T1, Ta and Tb]
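One way to picture a multi-push is that the delegate links all combined values into a chain and installs the chain with a single CAS on top. The sketch below assumes exactly that. `CentralStack`, `cas_top` and `multi_push` are hypothetical names, `cas_top` is a non-atomic stand-in for a real compare-and-swap, and the order in which the combined values end up on the stack is an assumption.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

class CentralStack:
    def __init__(self):
        self.top = None

    def cas_top(self, expected, new):
        # Stand-in for an atomic compare-and-swap on top (not atomic here).
        if self.top is expected:
            self.top = new
            return True
        return False

def multi_push(stack, values):
    """Push all values (non-empty list) with one successful CAS; values[0] ends on top."""
    head = tail = Node(values[0])
    for v in values[1:]:
        tail.next = Node(v)
        tail = tail.next
    while True:
        old_top = stack.top
        tail.next = old_top               # chain sits on top of the old stack
        if stack.cas_top(old_top, head):
            return
```

For the slide's example, `multi_push(stack, ["t1", "ta", "tb"])` would serve T1, Ta and Tb with a single update of the shared top pointer.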
12
Multi-Pop
[Figure: T1 performs a pop on the central stack for T1, Ta, Tb and Tc, removing M elements]
M = min{stack_size, multi_op_size}
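The formula M = min{stack_size, multi_op_size} can be made concrete with a short sketch: the delegate walks at most multi_op_size nodes from the top and then swings top past them with one CAS. The names `CentralStack`, `cas_top` and `multi_pop` are hypothetical, `cas_top` is a non-atomic stand-in, and returning `None` to the pops that find the stack empty is an assumption.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

class CentralStack:
    def __init__(self):
        self.top = None

    def cas_top(self, expected, new):
        # Stand-in for an atomic compare-and-swap on top (not atomic here).
        if self.top is expected:
            self.top = new
            return True
        return False

def multi_pop(stack, multi_op_size):
    """Pop M = min(stack size, multi_op_size) elements with one successful CAS."""
    while True:
        old_top = stack.top
        node, values = old_top, []
        while node is not None and len(values) < multi_op_size:
            values.append(node.value)
            node = node.next
        if stack.cas_top(old_top, node):
            # Pops beyond the M available elements see an empty stack.
            return values + [None] * (multi_op_size - len(values))
```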
13
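Multi-Eliminate
[Figure: T1 holds a push for T1, Ta and Tb; T2 holds a pop for T2, Tc, Td and Te; the two lists are eliminated against each other and the leftover is marked "Retry!"]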
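The arithmetic of a multi-eliminate can be sketched as pairing pushes with pops until the shorter list runs out. The function below is a hypothetical sequential model. Reading the "Retry!" label as "the unmatched operations of the longer list must try again" is an assumption, as is the pairing order.

```python
def multi_eliminate(push_values, pop_count):
    """Pair combined pushes with combined pops.

    Returns (values handed to the matched pops,
             pushes left unmatched,
             number of pops left unmatched).
    """
    k = min(len(push_values), pop_count)   # number of eliminated pairs
    return push_values[:k], push_values[k:], pop_count - k
```

With three pushes (T1, Ta, Tb) against four pops (T2, Tc, Td, Te), three pairs are eliminated and one pop is left to retry.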
14
Data-structures
15
Push & Pop operations
16
MultiPop function
17
Collide function
18
ActiveCollide, Combine functions
19
MultiEliminate function
20
PassiveCollide
21
Experimental Evaluation
Evaluated on an UltraSPARC T2+: an 8-core CPU (each core with 8 hardware threads), 64 hardware threads in total
Compared DECS with:
◦ Treiber (with exponential backoff)
◦ HSY (elimination backoff) algorithm
◦ Flat-Combining (FC) stack
22
Symmetric workload
50% push – 50% pop
[Plot: throughput vs. number of threads]
23
Moderately Asymmetric
75% push – 25% pop
[Plot: throughput vs. number of threads]
24
Fully Asymmetric
100% push – 0% pop
[Plot: throughput vs. number of threads]
25
DECS summary
Scalable
Provides parallelism even for asymmetric workloads
Blocking
26
Non-blocking DECS
A non-blocking algorithm is more robust to thread failures
Similar to DECS, but threads that delegate an operation do not wait indefinitely
A thread stops waiting by signaling its delegate thread
27
NB-DECS - example
A thread may stop waiting after some timeout
[Figure: T1 performs a push on the central stack for T1, Ta and Tb; two X marks on the list]
28
NB-DECS - overhead
1. Test-and-set validation of each popped element from the central stack
2. Elements must be popped from the central stack one-by-one
3. Test-and-set validation on eliminated operations
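One possible reading of the test-and-set validation is sketched below: each pushed element carries a flag, a pusher that gives up waiting sets the flag to cancel its element, and a popper sets the same flag to claim it, so exactly one of the two wins. This is an interpretation of the slides, not the authors' code. `TestAndSetFlag`, `Cell`, `withdraw` and `validated_pop` are hypothetical names, the flag uses a small lock as a stand-in for an atomic test-and-set instruction, and the central stack is modelled as a plain Python list whose last item is the top.

```python
import threading

class TestAndSetFlag:
    def __init__(self):
        self._set = False
        self._lock = threading.Lock()   # stand-in for an atomic instruction

    def test_and_set(self):
        """Set the flag and return its previous value."""
        with self._lock:
            previous = self._set
            self._set = True
            return previous

class Cell:
    def __init__(self, value):
        self.value = value
        self.flag = TestAndSetFlag()

def withdraw(cell):
    """Called by a pusher that stopped waiting; True means its push was cancelled."""
    return not cell.flag.test_and_set()

def validated_pop(stack):
    """Pop one element at a time, skipping cells whose pusher already withdrew."""
    while stack:
        cell = stack.pop()
        if not cell.flag.test_and_set():   # won the race: the element is valid
            return cell.value
    return None
```

Because every popped cell has to be validated individually, elements come off the central stack one by one, which corresponds to the second overhead item listed above.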
29
Symmetric workload
50% push – 50% pop
[Plot: throughput vs. number of threads]
30
Moderately Asymmetric
75% push – 25% pop
[Plot: throughput vs. number of threads]
31
Moderately Asymmetric
25% push – 75% pop
[Plot: throughput vs. number of threads]