log-free concurrent data structures - usenix · throughput vs. transactional redo-log-based 13...

16
Log-free concurrent data structures Tudor David (IBM Research, Zurich) Aleksandar Dragojevic (Microsoft Research, Cambridge) Rachid Guerraoui (EPFL) Igor Zablotchi (EPFL)

Upload: others

Post on 27-Mar-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

Log-free concurrent data structures

Tudor David (IBM Research, Zurich)

Aleksandar Dragojevic (Microsoft Research, Cambridge)

Rachid Guerraoui (EPFL)

Igor Zablotchi (EPFL)

Page 2: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

In-memory data structures – at the core of many systems

1Data structure performance – essential to system behavior

Lists

Trees

Hash tables

Skip lists

Interface:• insert(k,v);• remove(k);• find(k);

Page 3: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

NV-RAM – expected to become ubiquitous

2

An ideal data structure:• fast• scalable• durable

An ideal data structure:• fast• scalable

How can we ensure performance + consistent durable state?

Page 4: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

Upcoming technology: Non-volatile RAM

L1 cache L2 cache L3 cache DRAM FeRAM PCM MRAM STT-RAM

Read 1 ns 3 ns 10 ns 60 ns 100 ns 70 ns 62 ns 60 ns

Write 1 ns 3 ns 10 ns 60 ns 125 ns 150 ns 62 ns 60 ns

Latencies:

NV-RAM:• Durable• Byte-addressable• Latencies – comparable with DRAM• Programming model - map to area of virtual memory

Source:“nv_malloc:MemoryAllocationforNVRAM”,Schwalb etal.,2015

3

Page 5: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

4

Persistent, fast and correct software – not trivial

Write-back cache:1: mark allocation2: initialize mem3: change link 14: change link 25: done = 1

1: mark memory as allocated2: initialize memory3: change link of node 14: change link of node 25: done = 1

1: mark memory as allocated2: persist allocation3: initialize memory4: persist memory content5: change link of node 16: persist new link7: change link of node 28: persist modified link9: done = 1

NV memory:

3: change link 1

5: done = 1

crash

Upon restart: incorrect state

CLWBACLWBBCLWBC

Batchingwrite-backs:beneficialforperformance

time

cache line write-back

store fence

Page 6: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

5

Write-back cache:1: mark allocation2: initialize mem3: change link 14: change link 25: done = 1

Frequentwaitingfordatatobepersisted

1: mark memory as allocated2: persist allocation3: initialize memory4: persist memory content5: change link of node 16: persist new link7: change link of node 28: persist modified link9: done = 1

1: log[0] = starting transaction X2: persist log[0]3: log[1] = allocating a node at address A4: persist log[1]5: mark memory as allocated6: persist allocation7: initialize memory8: persist memory content9: log[2] = previous value of link10: persist log[2]11: change link 112: persist modified link13: log[3] = previous value of link14: persist log[3]15: change link 216: persist modified link17: done = 118: persist done19: mark transaction X as finished

NV memory:1: mark allocation2: initialize mem3: change link 1crash

Upon restart: undo incomplete operations

Persistent, fast and correct software – not trivial

Page 7: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

6

Persist latency >> cache latency

For performance – minimize time waiting for data to be persisted;Logging – problematic:•We need to issue write-back to the log before each update;• The log entry must be persisted before we perform the

update;

Ourwork:minimizeloggingandwritebacksindatastructures

Page 8: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

7

Our techniques for fast durable data structures1. Link-and-persist:

Use lock-free algorithms: they never leave the structure in an inconsistent state

⇒ no logging in the data structure algorithmCorrectness: link-and-persist

3. NV-epochs: coarse-grain memory management:Track active memory areas, don’t log individual allocations;Avoid work at run-time if it can be delayed for recovery time;

2. Link cache:Buffer updates;When one must be persisted, batch write-backs;

Log-free concurrent data structures

Page 9: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

8

Correct durable implementations?

Use lock-free algorithms: they never leave the structure in an inconsistent state

⇒ no logging in the data structure algorithm

Correctness condition: durable linearizabilityAfter a restart, the structure reflects:• all operations completed (linearized) before the crash;• (potentially) some operations that were ongoing when the crash occurred;

persist

1. Persistently allocate and initialize node2. Add link to new node3. Persist link to new node

If crash between steps 2 and 3,

violation of durable linearizability

Page 10: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

Link-and-persist: achieving correctness

9Link-and-persist: atomic “modify” and “persist” link

1. Persistently allocate and initialize node2. Add marked link to new node3. Persist link to new node4. Remove mark

persist

Other threads - persist marked link if needed

Page 11: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

Going further: defer persisting links

10The link cache: batch write-backs of links

key 1 link addr1

key z link addr z

key y link addr y

Insert(X)X link addr X

Read(X)

write-back all links

A link only needs to be persisted when an operation depends on it

Store all un-persisted links in a fast concurrent cache

When an operation directly depends on a link in the cache:batch write-backs of all links in the cache (and empty the cache)

Page 12: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

Memory allocation and reclamation

11

1: log alloc_and_link2: allocate and initialize new node;3: link into data structure;

1: log unlink_and_free2: unlink node;…3: free the node;

Insertion Deletion

Instead of logging each allocation, keep track of recently used memory areas

alloc 0xa010alloc 0xa110alloc 0xa210

Locality of allocations/deallocations

At recovery, traverse nodes in active areas at the moment of the crash/restart

Page 13: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

Experimental evaluation

12

•No fast byte-addressable NV-RAM for now;• Inject latencies of flushes and write-backs to NV-RAM based on

published values;• Shown experiments:125 ns;

• Comparison with redo-log-based implementations• Also provide durable linearizability;

Page 14: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

Throughput vs. transactional redo-log-based

13

Consistently faster than log-based data structuresParticularly suited to small and medium sized data structures

3.03

3.03

2.27

1.32

1.92 2.04

1.56

1.18

0

0.5

1

1.5

2

2.5

3

3.5

128 4K 64K 4MUpd

ate

thro

ughp

ut r

elat

ive

to

log-

base

d im

plem

enta

tion

# of elements in structure

Hash table

1 Thread

8 Threads

Linked lists, BSTs, skip lists in the paper

Page 15: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

NV-Memcached

14Similar throughput, much lower recovery time

1. Lock-free Memcached: Memcached-clht

2. Replace hash table & slab allocator with durable versions

0

1

2

3

4

5

Thr

ough

put

(100

Kop/

s) 10⁶ keys

memcached

memcached-clht

nv-memcached

0.1

1

10

100

1000

Rec

over

y/w

arm

-up

time

(ms) 10⁶ keys

memcached

memcached-clht

nv-memcached

1:4 set to get ratio

Page 16: Log-free concurrent data structures - USENIX · Throughput vs. transactional redo-log-based 13 Consistently faster than log-based data structures Particularly suited to small and

15

Log-free concurrent data structures1. Link-and-persist:

Use lock-free algorithms: they never leave the structure in an inconsistent state

⇒ no logging in the data structure algorithm

Correctness: link-and-persist

3. NV-epochs: coarse-grain memory management:Track active memory areas, don’t log individual allocations;Avoid work at run-time if it can be delayed for recovery time;

2.Link cache:Buffer updates;When one must be persisted, batch write-backs;

Thank you!Up to several times throughput improvement