non-blocking data structures for high- performance computing håkan sundell, phd

52
Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

Upload: stuart-bond

Post on 04-Jan-2016

222 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

Non-blocking Data Structures for High-Performance Computing

Håkan Sundell, PhD

Page 2: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 20052

Outline

Shared Memory Synchronization Methods Memory Management Shared Data Structures

Dictionary Performance Conclusions

Page 3: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 20053

Shared Memory

CPU CPU CPU

CPU CPU CPU CPU CPU CPU

Cache Cache Cache

Cache bus Cache bus Cache bus

Memory

Memory Memory Memory

...

. . .

... .... . .

- Uniform Memory Access (UMA)

- Non-Uniform Memory Access (NUMA)

Page 4: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 20054

Synchronization

Shared data structures needs synchronization!

Accesses and updates must be coordinated to establish consistency.

P1P2

P3

Page 5: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 20055

Hardware Synchronization Primitives Consensus 1

Atomic Read/Write

Consensus 2 Atomic Test-And-Set (TAS), Fetch-And-Add

(FAA), Swap

Consensus Infinite Atomic Compare-And-Swap (CAS) Atomic Load-Linked/Store-Conditionally

ReadWrite

Read

M=f(M,…)

Page 6: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 20056

Mutual Exclusion

Access to shared data will be atomic because of lock

Reduced Parallelism by definitionBlocking, Danger of priority inversion

and deadlocks.• Solutions exists, but with high overhead,

especially for multi-processor systems

P1P2

P3

Page 7: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 20057

Non-blocking Synchronization

Perform operation/changes using atomic primitives

Lock-Free Synchronization Optimistic approach

• Retries until succeeding

Guarantees progress of at least one operation

Wait-Free Synchronization Always finishes in a finite number of its own

steps• Coordination with all participants

Page 8: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 20058

Memory Management

Dynamic data structures need dynamic memory management Concurrent D.S. need concurrent M.M.!

Page 9: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 20059

Concurrent Memory Management Concurrent Memory Allocation

i.e. malloc/free functionality Concurrent Garbage Collection

Questions (among many):• When to re-use memory?• How to de-reference pointers safely?

P2 P1 P3

Page 10: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200510

Lock-Free Memory Management Memory Allocation

Valois 1995: fixed block-size, fixed purpose Michael 2004: Gidenstam et al. 2004, any

size, any purpose Garbage Collection

Valois 1995, Detlefs et al. 2001: reference counting

Michael 2002, Herlihy et al. 2002: hazard pointers

Gidenstam, Papatriantafilou, Sundell and Tsigas 2005: hazard pointer + reference counting

Page 11: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200511

Lock-Free Reference Counting De-referencing links

1. Read the link contents, i.e. a pointer. 2. Increment (FAA) the reference count on

the corresponding object. What if the link is changed between step 1

and 2? Solution by Detlefs et al:

• Use DCAS on step 2 that operates on two arbitrary memory words. Retries if link is changed after step 2.

Solution by Valois et al:• The reference count field is present indefinitely.

Decrement reference count and retries if link is changed after step 2.

Page 12: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200512

Lock-Free Hazard Pointers (Michael 2002) De-referencing links

1. Read the link contents, i.e. a pointer. 2. Set a hazard pointer to the read pointer

value. 3. Read the link contents again; if not same

as in step 1 then restart from step 1. Deletion

After deleted from data structure, put node on a local list.

When the local list reaches a certain size; scan all hazard pointers globally, reclaim memory of all nodes which address does not match the scan.

Page 13: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200513

Lock-Free Memory Allocation

Solution (lock-free), IBM freelists:Create a linked-list of the free nodes,

allocate/reclaim using CAS

Needs some mechanism to avoid the ABA problem.

Head Mem 1 Mem 2 Mem n…

Used 1Reclaim

Allocate

Page 14: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200514

Shared Data Structure:Dictionaries (Sets) Fundamental data structure Works on a set of <key,value> pairs Three basic operations:

Insert(k,v): Adds a new item

v=FindKey(k): Finds the item <k,v>v=DeleteKey(k): Finds and removes

the item <k,v>

Page 15: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200515

Randomized Algorithm: Skip Lists

William Pugh: ”Skip Lists: A Probabilistic Alternative to Balanced Trees”, 1990 Layers of ordered lists with different

densities, achieves a tree-like behavior

Time complexity: O(log2N) – probabilistic!

1 2 3 4 5 6 7

Head Tail

50%25%…

Page 16: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200516

New Lock-Free Concurrent Skip List

Define node state to depend on the insertion status at lowest level as well as a deletion flag

Insert from lowest level going upwards

Set deletion flag. Delete from highest level going downwards

1 2 3 4 5 6 7D D D D D D D

123

p

123

p D

Page 17: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200517

Overlapping operations on shared data Example: Insert operation

- which of 2 or 3 gets inserted? Solution: Compare-And-Swap

atomic primitive:

CAS(p:pointer to word, old:word, new:word):booleanatomic do

if *p = old then *p := new; return true;

else return false;

1

2

3

4

Insert 3

Insert 2

Page 18: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200518

Concurrent Insert vs. Delete operations

Problem:

- both nodes are deleted!

Solution (Harris et al): Use bit 0 of pointer to mark deletion status

1

3

42Delete

Insert

a)b)

1

3

42 * a)b)

c)

Page 19: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200519

Helping Scheme

Threads need to traverse safely

Need to remove marked-to-be-deleted nodes while traversing – Help!

Finds previous node, finish deletion and continues traversing from previous node

1 42 *1 42 * or

? ?

1 42 *

Page 20: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200520

Lock-Free Skip List - Techniques Summary The Skip List is treated as layers of ordered

lists Uses CAS atomic primitive Lock-Free memory management

IBM Freelists Reference counting (Valois+Michael&Scott)

Helping scheme Back-Off strategy All together proved to be linearizable

Page 21: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200521

Lock-Free Skip List publications First publications in literature:

H. Sundell and P. Tsigas, ”Fast and Lock-Free Concurrent Priority Queues for Multi-thread Systems”, IPDPS 2003

H. Sundell and P. Tsigas, ”Scalable and Lock-Free Concurrent Dictionaries”, SAC 2004

Later publications: M. Fomitchev and E. Ruppert, “Lock-free

linked lists and skip lists”, PODC 2004 K. Fraser, “Practical lock-freedom”, PhD

thesis, 2004

Page 22: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200522

New Lock-Free Skip List !

The thread that fulfils the deletion of a node removes the next pointer when finished.

Allows other threads to traverse through even marked next pointers.

If not possible to traverse forward, go back to the remembered position on previous (upper) levels.

Helps deletions-in-progress only when absolutely necessary.

Works with a modified version of Michael’s Hazard Pointer memory management!

Page 23: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200523

Correctness

Linearizability (Herlihy 1991)In order for an implementation to be

linearizable, for every concurrent execution, there should exist an equal sequential execution that respects the partial order of the operations in the concurrent execution

Page 24: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200524

Correctness

Define precise sequential semantics Define abstract state and its interpretation

Show that state is atomically updated Define linearizability points

Show that operations take effect atomically at these points with respect to sequential semantics

Creates a total order using the linearizability points that respects the partial order The algorithm is linearizable

Page 25: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200525

Memory Consistency and Out-Of-Order execution Models on actual multiprocessor

architectures: Relaxed Memory Order etc.

Must insert special machine instructions (memory barriers) to enforce stronger memory consistency models!

t

W(x,1)Ti

Tj

Tk

W(y,0)

R(y)=0W(x,0)

R(x)=1

W(y,1)

R(y)=1

R(x)=1

R(x)=0

R(y)=1

Page 26: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200526

Experiments

Experiment with 1-32 threads performed on Sun Fire 15K with 48 cpu’s. Each thread performs 20000 operations,

whereof the first total 50-10000 operations are Insert’s, remaining are equally randomly distributed over Insert, FindKey and DeleteKey’s.

Fixed Skiplist maximum level of 10. Compare with implementations of other skip

list-based dictionaries and a singly linked list by Michael, using same scenarios.

Averaged execution time of 10 experiments.

Page 27: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200527

Page 28: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200528

Page 29: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200529

Page 30: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200530

Page 31: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200531

Page 32: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200532

Page 33: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200533

Page 34: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200534

Page 35: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200535

Multi-Word Compare-And-Swap Operations:

bool CASN(int *p1, int o1, int n1, …); int Read(int *p);

Standard algoritmic approach: 1. Try to acquire a lock on all positions of interest. 2. If already taken, help corresponding operation 3. If all taken and all match change status of operation 4. Remove locks and possibly write new values

My approach: Wait-free memory management (IPDPS 2005) Lock stealing and lock hand-over Allow un-sorted pointers

Page 36: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200536

Page 37: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200537

Page 38: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200538

Page 39: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200539

Page 40: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200540

Page 41: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200541

Page 42: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200542

Page 43: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200543

Page 44: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200544

Page 45: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200545

Page 46: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200546

Page 47: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200547

Page 48: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200548

Lock-Free Deque

Practical algorithms in literature: Michael 2003, ”CAS-based lock-free

algorithm for shared deques”, Euro-Par 2003 Sundell and Tsigas, ”Lock-Free and Practical

Doubly Linked List-Based Deques using Single-Word Compare-And-Swap”, OPODIS 2004

Approach Apply new memory management on lock-

free deque

Page 49: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200549

Page 50: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200550

Page 51: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200551

Conclusions

Work performed at EPCC Improved algorithm of lock-free skip list

• Improved Michael’s hazard pointer algorithm Experiments comparing with other recent dictionary

algorithms New implementation of CASN. Experiments comparing with other recent CASN

algorithms. Experiments comparing a lock-free deque

algorithm using different memory management techniques.

Future work Implement new lock-free/ wait-free dynamic data

structures. More experiments.

Page 52: Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD

5 August 2005EPCC 200552

Questions?

Contact Information: Address:

Håkan Sundell Computing ScienceChalmers University of

Technology Email:

[email protected] Web:

http://www.cs.chalmers.se/~phs