A Dynamic Elimination-Combining Stack Algorithm

Post on 23-Feb-2016

DESCRIPTION

A Dynamic Elimination-Combining Stack Algorithm. Gal Bar-Nissan, Danny Hendler and Adi Suissa, Department of Computer Science, BGU, January 2011. Presented by: Ilya Mirsky, 28.03.2011. Outline: Concurrent programming terms, Motivation, Introduction, DECS: The Algorithm.

TRANSCRIPT

A Dynamic Elimination-Combining Stack Algorithm
Gal Bar-Nissan, Danny Hendler and Adi Suissa
Department of Computer Science, BGU, January 2011

Presented by: Ilya Mirsky, 28.03.2011

Outline
Concurrent programming terms
Motivation
Introduction
DECS: The Algorithm
DECS Performance evaluation
NB-DECS
Summary

Concurrent programming terms
Locks (coarse- and fine-grained)
Non-blocking algorithms: wait-freedom, lock-freedom, obstruction-freedom
Linearizability
Memory contention
Latency

Motivation
Concurrent stacks are widely used in parallel applications and operating systems.
A simple implementation that uses a coarse-grained lock creates a “hot spot” at the central stack object and forms a sequential bottleneck.
There is a need for a scalable concurrent stack that performs well under low, medium and high workloads, regardless of the ratio of push to pop operations.

Introduction
Two key synchronization paradigms for constructing scalable concurrent data structures are software combining and elimination.
The most scalable concurrent stack algorithm previously known is the lock-free elimination-backoff stack (Hendler, Shavit, Yerushalmi), referred to here as the HSY stack.
The HSY stack is highly efficient under low contention, and also under high contention when the workload is symmetric.
Unfortunately, when workloads are asymmetric, the performance of HSY deteriorates to that of a sequential stack.
Flat combining (by Hendler et al.) significantly outperforms HSY at low and medium contention, but it does not scale and even deteriorates at high contention levels.

Introduction - DECS
DECS employs both combining and elimination mechanisms.
It scales well for all workload types and outperforms other stack implementations.
It maintains the simplicity and low overhead of the HSY stack.
It uses a contention-reduction layer, the elimination-combining layer, as a backoff scheme for a central stack.
A non-blocking implementation, NB-DECS, is also presented: a lock-free variant of DECS in which threads that have waited too long may cancel their “combining contract” and retry their operation on the central stack.

Introduction - DECS
[Animated figure over slides 9-16. Labels: CentralStack, Elimination-combining layer. Waiting threads are marked “zzz…” and are later woken with “Wake up!”.]

DECS - The Algorithm
The data structures

Elimination-combining layer: Collision Array (example entries: 1, 6, 4) and Locations Array
CentralStack: shown as a chain of Cell objects

MultiOp fields: int id; int op; int length; int cStatus; Cell cell; MultiOp next; MultiOp last;
Cell fields: Data data; Cell next;
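The field lists on this slide can be written out as C++ types. This is a minimal sketch, not the authors' code: the `Data` alias, the `kEmpty` marker, the constructor, and the choice to start `last` pointing at the record itself are assumptions made for illustration. Only the field names and the INIT/FINISHED status values come from the slides.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

using Data = int;                       // hypothetical payload type

struct Cell {                           // node of the central stack
    Data  data;
    Cell* next;
};

enum Op      { PUSH, POP };
enum CStatus { INIT, FINISHED };        // only the values visible on the slides

struct MultiOp {
    int      id;                        // id of the thread that owns this record
    int      op;                        // PUSH or POP
    int      length;                    // operations represented by this record
    std::atomic<int> cStatus;           // INIT until the operation completes
    Cell*    cell;                      // cell carrying (or receiving) the data
    MultiOp* next;                      // next record in a combined list
    MultiOp* last;                      // tail of the combined list

    MultiOp(int id_, int op_, Cell* cell_)
        : id(id_), op(op_), length(1), cStatus(INIT),
          cell(cell_), next(nullptr), last(this) {   // last = this is an assumption
        assert(cell_ != nullptr);
    }
};

// The elimination-combining layer: one slot per thread plus a collision array.
struct EliminationCombiningLayer {
    static constexpr int kEmpty = -1;              // hypothetical "no thread" marker
    std::vector<std::atomic<MultiOp*>> location;   // indexed by thread id
    std::vector<std::atomic<int>>      collision;  // holds thread ids

    EliminationCombiningLayer(int threads, int slots)
        : location(threads), collision(slots) {
        for (auto& l : location)  l.store(nullptr);
        for (auto& c : collision) c.store(kEmpty);
    }
};
```

The status field is atomic here because a waiting thread reads it while another thread writes it; the slides declare it as a plain `int`.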

DECS - The Algorithm
[Figure: the operations push(data1), push(data2) and pop() are directed at the CentralStack. Two thought bubbles read: “I wish there was someone in a similar situation…”]

DECS - The Algorithm

multiOp tInfo = initMultiOp();

multiOp tInfo = initMultiOp(data);

DECS - The Algorithm
[Figure: the Collision Array and the Locations Array, with threads T. 6 and T. 2. A collision slot changes from EMPTY to 6.]
T. 6 (passive collider): MultiOp { id = 6; op = PUSH; length = 1; cStatus = INIT; cell (data1); next = NULL; last }
“I’ll wait, maybe someone will arrive…”
T. 2 (active collider): MultiOp { id = 2; op = POP; length = 1; cStatus = INIT; cell (EMPTY); next = NULL; last }
“Yay, I can collide with thread 6!”
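The passive/active roles on this slide can be illustrated with a single atomic swap on a collision slot. The sketch below is written under assumptions: `kEmpty`, `kSlots`, `announce` and the `Announcement` type are hypothetical names, and the use of `exchange` is an illustrative choice rather than something the slides specify. The thread that finds the slot empty becomes the passive collider. The thread that finds another id there becomes the active collider.

```cpp
#include <atomic>
#include <cassert>

constexpr int kEmpty = -1;          // hypothetical marker for an unused slot
constexpr int kSlots = 4;           // hypothetical collision-array size

struct CollisionArray {
    std::atomic<int> slot[kSlots];
    CollisionArray() { for (auto& s : slot) s.store(kEmpty); }
};

enum class Role { Passive, Active };

struct Announcement {
    Role role;
    int  partner;                   // id found in the slot, or kEmpty
};

// Writes myId into the chosen slot and reports what was there before.
Announcement announce(CollisionArray& arr, int myId, int pos) {
    assert(pos >= 0 && pos < kSlots && myId != kEmpty);
    int previous = arr.slot[pos].exchange(myId);   // atomic swap
    if (previous == kEmpty) return {Role::Passive, kEmpty};
    return {Role::Active, previous};
}
```

With T. 6 arriving first at a slot and T. 2 second, T. 6 is reported as passive and T. 2 as active with partner 6, matching the figure. A full implementation would next have the active collider look up the partner's MultiOp in the locations array, which this sketch omits.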

DECS - The Algorithm
Central Stack Functions
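The code for this slide is not part of the transcript. As a stand-in, here is a generic sketch of a CAS-based central stack in the Treiber style, written under assumptions: `CentralStack`, `tryPush` and `tryPop` are hypothetical names and this is not the authors' listing. Each call makes one attempt and reports failure on contention, so a caller could fall back to the elimination-combining layer as its backoff. `tryPush` accepts a pre-linked chain of cells so that several pushes can be applied with one CAS.

```cpp
#include <atomic>
#include <cassert>

struct Cell {
    int   data;
    Cell* next;
};

class CentralStack {
    std::atomic<Cell*> top_{nullptr};

public:
    // One attempt to push the chain first..last (first becomes the new top).
    // Returns false if another thread changed the top in the meantime.
    bool tryPush(Cell* first, Cell* last) {
        assert(first != nullptr && last != nullptr);
        Cell* old = top_.load();
        last->next = old;                           // link chain above the old top
        return top_.compare_exchange_strong(old, first);
    }

    // One attempt to pop a single cell.
    // Returns false on contention; on success `out` is the popped cell,
    // or nullptr if the stack was empty.
    bool tryPop(Cell*& out) {
        Cell* old = top_.load();
        if (old == nullptr) { out = nullptr; return true; }
        if (!top_.compare_exchange_strong(old, old->next)) return false;
        out = old;
        return true;
    }
};
```

The sketch never frees or reuses cells, which sidesteps the ABA problem. A production version would need a memory-reclamation scheme.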

DECS - The Algorithm
[Figure: T. 6 sleeps (“zzz…”) while T. 2 inspects the Collision Array and the Locations Array.]
T. 6: MultiOp { id = 6; op = PUSH; length = 1; cStatus = INIT; cell (data1); next = NULL; last }
T. 2: MultiOp { id = 2; op = POP; length = 1; cStatus = INIT; cell (EMPTY); next = NULL; last }
T. 2: “I see that T. 6 got PUSH, and I got POP: we can eliminate!”

DECS - The Algorithm
Elimination-Combining Layer Functions

DECS - The Algorithm
[Figure over slides 27-28: T. 6 sleeps (“zzz…”) while T. 2 is “Working…” and then “Done!”.]
Before:
T. 6: MultiOp { id = 6; op = PUSH; length = 1; cStatus = INIT; cell (data1); next = NULL; last }
T. 2: MultiOp { id = 2; op = POP; length = 1; cStatus = INIT; cell (EMPTY); next = NULL; last }
After:
T. 6: MultiOp { id = 6; op = PUSH; length = 0; cStatus = FINISHED; cell; next = NULL; last }
T. 2: MultiOp { id = 2; op = POP; length = 0; cStatus = FINISHED; cell; next = NULL; last }
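The before/after states above can be reproduced with a small sketch of the elimination step for two single-operation records of opposite type. It is written under assumptions: `eliminatePair` is a hypothetical name, the records are reduced to the fields needed here, and the active collider is assumed to have exclusive access to both records. The combining case (two operations of the same type) and records with `length` greater than 1 are not handled.

```cpp
#include <atomic>
#include <cassert>

struct Cell { int data; };

enum Op      { PUSH, POP };
enum CStatus { INIT, FINISHED };

struct MultiOp {
    int   id;
    int   op;
    int   length;
    std::atomic<int> cStatus;
    Cell* cell;

    MultiOp(int id_, int op_, Cell* c)
        : id(id_), op(op_), length(1), cStatus(INIT), cell(c) {}
};

// Run by the active collider. Returns false if the two operations have
// the same type and therefore cannot eliminate each other.
bool eliminatePair(MultiOp& active, MultiOp& passive) {
    if (active.op == passive.op) return false;
    assert(active.length == 1 && passive.length == 1);

    MultiOp& pusher = (active.op == PUSH) ? active : passive;
    MultiOp& popper = (active.op == PUSH) ? passive : active;

    popper.cell->data = pusher.cell->data;   // the pop receives the pushed value
    active.length  = 0;
    passive.length = 0;
    active.cStatus.store(FINISHED);
    // Written last so that a waiting passive thread which observes FINISHED
    // also observes the updated cell and length.
    passive.cStatus.store(FINISHED, std::memory_order_release);
    return true;
}
```

Neither operation touches the central stack in this path, which is the point of elimination: a push and a pop that meet in the layer cancel each other out.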

DECS - The Algorithm
[Figure: T. 2 wakes the sleeping T. 6.]
T. 2: “Wake up man, I’ve done your job!”
T. 6: “Thank you T. 2, let’s go have a beer; I’m buying!”

DECS Performance Evaluation
Hardware
128-way UltraSPARC T2 Plus (T5140) server: a 2-chip system in which each chip contains 8 cores and each core multiplexes 8 hardware threads (2 × 8 × 8 = 128).
Running the Solaris 10 OS.
The cores in each CPU share the same L2 cache.
C++ code compiled with GCC with the -O3 flag.

Compared against:
Treiber stack
The HSY elimination-backoff stack
Flat-combining stack

DECS Performance Evaluation
Course of experiments
Threads repeatedly apply operations to the stack for a fixed duration of 1 second, and the resulting throughput is measured while the level of concurrency varies from 1 to 128 threads.
Throughput is measured on both symmetric and asymmetric workloads.
Stacks are pre-populated with enough cells so that pop operations do not operate on an empty stack.
Each data point is the average of 3 runs.
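The measurement procedure described on this slide can be sketched as a small harness. Everything here is an assumption made for illustration: the `LockedStack` stand-in (a coarse-grained locked stack, not DECS), the `measureThroughput` name, and the `pushPercent` knob, where 50 would model a symmetric workload and 100 a push-only one. Throughput is computed from the nominal duration, so it slightly overestimates when threads finish late.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <stack>
#include <thread>
#include <vector>

// Stand-in stack under test: one lock around a sequential stack.
class LockedStack {
    std::mutex m_;
    std::stack<int> s_;
public:
    void push(int v) { std::lock_guard<std::mutex> g(m_); s_.push(v); }
    bool pop(int& out) {
        std::lock_guard<std::mutex> g(m_);
        if (s_.empty()) return false;
        out = s_.top();
        s_.pop();
        return true;
    }
};

// Runs `threads` workers for `duration` and returns operations per second.
double measureThroughput(int threads, int pushPercent,
                         std::chrono::milliseconds duration, int prefill) {
    assert(threads > 0 && pushPercent >= 0 && pushPercent <= 100);
    LockedStack stack;
    for (int i = 0; i < prefill; ++i) stack.push(i);   // pre-populate the stack

    std::atomic<bool> stop{false};
    std::atomic<std::uint64_t> totalOps{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            std::uint64_t ops = 0;
            unsigned x = 12345u + static_cast<unsigned>(t);   // per-thread LCG state
            while (!stop.load(std::memory_order_relaxed)) {
                x = x * 1664525u + 1013904223u;
                int v = 0;
                if (static_cast<int>((x >> 16) % 100) < pushPercent)
                    stack.push(static_cast<int>(ops));
                else
                    stack.pop(v);
                ++ops;
            }
            totalOps.fetch_add(ops);
        });
    }
    std::this_thread::sleep_for(duration);
    stop.store(true);
    for (auto& w : workers) w.join();
    return totalOps.load() / std::chrono::duration<double>(duration).count();
}
```

To mirror the slide, one would call this with a 1-second duration for each thread count from 1 to 128 and average three calls per data point.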

DECS Performance Evaluation
[Three charts, one per slide: symmetric workload, moderately-asymmetric workload, fully-asymmetric workload. X-axis: number of threads.]

NB-DECS
DECS is blocking.
For some applications a non-blocking implementation may be preferable because it is more robust to thread failures.
NB-DECS is a lock-free variant of DECS in which threads that delegated their operations to another thread and have waited too long can cancel their “combining contracts” and retry their operations.

Summary
DECS comprises an elimination-combining layer, so it benefits from collisions between operations with reverse semantics as well as between operations with identical semantics.
Empirical evaluation showed that DECS outperforms the best previously known stack algorithms for all workloads.
NB-DECS
The idea of an elimination-combining layer could be used to efficiently implement other concurrent data structures.
