A Non-Blocking, Contention-Friendly Skip List
School of Information Technologies
Dr Vincent Gramoli | Lecturer
Joint work with Tyler Crain (IRISA) and Michel Raynal (U. Rennes 1, IUF)
Context
› 2011, concurrency in production (tablets, phones)
› All apps must be concurrent!
Multi-cores (dual-core, core-duo, quad-core…) are everywhere
Context
› Pollack’s law [Borkar, CACM 2011]
Many-cores (the only affordable way of getting higher performance)
[Figure: energy consumption and performance plotted against core complexity (area of logic), both axes from 0 to 10]
Motivations
› Data structures are the bottleneck
- B-trees are good for batching disk accesses, but not for main memory
- Hash tables do not support single-access range queries
- Binary trees: the expected per-key load is not uniform
› Skip lists are good candidates for in-memory databases
- Single-access range queries are supported
- Per-key load distribution is expected to be uniform
- Production databases use them (e.g., MemSQL, ArangoDB)
Why Skip List?
Skip List
› Sequential skip list [Bill Pugh, CACM 1990]
- similar to a linked list with shortcuts (plus next pointers)
- provides logarithmic insert/delete/contains in expectation
- a taller tower is more likely to be accessed
- all shortcuts must be updated (in addition to next pointers)
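The bullets above can be illustrated with a minimal sequential sketch in Java. It is not the code from the talk: the class name, the fixed MAX_LEVEL of 16, the coin-flip tower height and the use of null as the +∞ end are assumptions made for illustration only.

```java
import java.util.Random;

// Minimal sequential skip list sketch (single-threaded, integer keys).
// All names and constants here are illustrative assumptions.
final class SequentialSkipList {
    private static final int MAX_LEVEL = 16;

    private static final class Node {
        final int key;
        final Node[] next; // next[i] is the successor at level i (the "shortcuts" for i > 0)

        Node(int key, int height) {
            this.key = key;
            this.next = new Node[height];
        }
    }

    // The head tower plays the role of the -infinity sentinel; its key is never compared.
    private final Node head = new Node(Integer.MIN_VALUE, MAX_LEVEL);
    private final Random random;

    SequentialSkipList(long seed) {
        this.random = new Random(seed);
    }

    boolean contains(int key) {
        Node candidate = findPredecessors(key)[0].next[0];
        return candidate != null && candidate.key == key;
    }

    boolean insert(int key) {
        Node[] preds = findPredecessors(key);
        Node candidate = preds[0].next[0];
        if (candidate != null && candidate.key == key) {
            return false; // key already present
        }
        Node node = new Node(key, randomHeight());
        // Every level of the new tower must be linked, not only the bottom one.
        for (int level = 0; level < node.next.length; level++) {
            node.next[level] = preds[level].next[level];
            preds[level].next[level] = node;
        }
        return true;
    }

    boolean remove(int key) {
        Node[] preds = findPredecessors(key);
        Node victim = preds[0].next[0];
        if (victim == null || victim.key != key) {
            return false; // key absent
        }
        // Unlink the tower at every level it occupies.
        for (int level = 0; level < victim.next.length; level++) {
            preds[level].next[level] = victim.next[level];
        }
        return true;
    }

    // Walks from top to bottom and left to right, recording at each level
    // the last node whose key is strictly smaller than the target.
    private Node[] findPredecessors(int key) {
        Node[] preds = new Node[MAX_LEVEL];
        Node curr = head;
        for (int level = MAX_LEVEL - 1; level >= 0; level--) {
            while (curr.next[level] != null && curr.next[level].key < key) {
                curr = curr.next[level];
            }
            preds[level] = curr;
        }
        return preds;
    }

    // Height h is chosen with probability 2^-h, capped at MAX_LEVEL.
    private int randomHeight() {
        int height = 1;
        while (height < MAX_LEVEL && random.nextBoolean()) {
            height++;
        }
        return height;
    }
}
```

Note that insert and remove touch every level of a tower in the same operation, which is the cost the rest of the talk aims to take off the critical path.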
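[Figure: a three-level skip list bounded by -∞ and +∞ sentinel towers, with towers of different heights for keys including 12, 23, 36 and 62]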
[Figure: the same skip list, traversed to answer contains(23)]
[Figure: the same skip list during remove(62)]
Related work
› Non-blocking skip lists [Fomitchev, Ruppert, PODC’04], [Sundell, Tsigas, SAC’04], [Lea, JSR166].
- always provide logarithmic complexity in expectation, even under contention
- are not contention-friendly
- Typical hot spots are at the top levels, which are frequently accessed
[Figure: contention scale over the skip list, with the hot spots at the top]
Contention-friendly, non-blocking skip list
› Key ideas:
- Ensuring O(log n) complexity in the absence of contention
- Reducing contention during contention bursts by relaxing the O(log n) guarantee
› By means of update decoupling:
- Eager abstract modification:
  - Update: returns after updating the bottom level
- Lazy and selective adaptation:
  - Update: postpones the adaptation at higher levels
  - Remove: chooses the towers least likely to be contended
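A rough Java sketch of the decoupling idea follows. It only covers insertion, and it swaps the linked towers of a real skip list for a plain array snapshot of shortcut nodes that a single adaptation thread would rebuild. The class name, the adapt() method and the "every second node" rule are assumptions for illustration, not the implementation presented in the talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of update decoupling (insert-only, integer keys).
// Hypothetical simplification: the upper levels are one sorted array of shortcuts.
final class DecoupledInsertSketch {
    private static final class Node {
        final int key;
        final AtomicReference<Node> next;

        Node(int key, Node next) {
            this.key = key;
            this.next = new AtomicReference<>(next);
        }
    }

    // Dummy head of the bottom-level list; its key is never compared.
    private final Node head = new Node(Integer.MIN_VALUE, null);
    // Snapshot of shortcut nodes, sorted by key; replaced wholesale by adapt().
    private volatile Node[] shortcuts = new Node[0];

    // Eager abstract modification: returns as soon as the bottom level is updated.
    boolean insert(int key) {
        while (true) {
            Node pred = start(key);
            Node curr = pred.next.get();
            while (curr != null && curr.key < key) {
                pred = curr;
                curr = curr.next.get();
            }
            if (curr != null && curr.key == key) {
                return false; // key already present
            }
            if (pred.next.compareAndSet(curr, new Node(key, curr))) {
                return true; // the shortcuts are left untouched on purpose
            }
            // CAS failed because of a concurrent insert: retry.
        }
    }

    boolean contains(int key) {
        Node curr = start(key).next.get();
        while (curr != null && curr.key < key) {
            curr = curr.next.get();
        }
        return curr != null && curr.key == key;
    }

    // Lazy adaptation: meant to be called by one background thread.
    // Keeps every second bottom-level node as a shortcut and returns how many were kept.
    int adapt() {
        List<Node> picked = new ArrayList<>();
        int position = 0;
        for (Node n = head.next.get(); n != null; n = n.next.get()) {
            if (position++ % 2 == 1) {
                picked.add(n);
            }
        }
        shortcuts = picked.toArray(new Node[0]);
        return picked.size();
    }

    // Binary search for the last shortcut whose key is strictly smaller than the
    // target; falls back to the head when there is none.
    private Node start(int key) {
        Node[] snapshot = shortcuts;
        Node best = head;
        int lo = 0;
        int hi = snapshot.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (snapshot[mid].key < key) {
                best = snapshot[mid];
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best;
    }
}
```

With stale shortcuts, operations stay correct but walk more bottom-level nodes, which is the sense in which the O(log n) bound is relaxed until adaptation catches up.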
Contention-friendly, non-blocking skip list
› Eager abstract modification at bottom level
› Operation is done, client gets response
› Lazy update of the higher shortcuts
Example: inserting 12
[Figure: the skip list during insert(12); 12 first appears at the bottom level only, and its tower is raised later]
Contention-friendly, non-blocking skip list
› Eager abstract modification marks the tower
› Operation is done, client gets response
› Selective adaptation may decide not to remove it
Example: removing 36
[Figure: the skip list during remove(36); the tower of 36 is marked and may remain physically in place]
Contention-friendly, non-blocking skip list
› Fault-tolerance:
- If one core crashes, the others can still progress
- If the adaptation thread crashes, then performance might degrade but safety and progress are preserved
› Heterogeneous architecture:
- one slow core does not necessarily affect the execution of other cores
Non-blocking: the system as a whole always makes progress
Actual Java Implementation
› Head: dummy node, tail: null node
› Leaves are nodes, internal items are IndexItems; bottom pointers allow fetching the value in O(1)
› Logical deletion mark: node.value = ⊥ (null)
› Backward pointer to backtrack when deleted nodes are found
› Only CAS is used for synchronization (never blocks)
› Doubly linked list at the bottom to allow backtracking in case a deleted node is found
Actual Java Implementation
› Traversing the data structure (insert/delete/contains) proceeds
- From left to right
- From top to bottom
- If a removed item is encountered, then backtrack to the last non-removed item
- If a logically deleted node is found, then help-remove it
› Help-removal:
- If the node is already marked (concurrent help-removal), then give up the help-removal
- Else mark it (CAS) and give it a dummy marked successor (as in j.u.c.ConcurrentSkipListMap)
› Insertion:
- Logical: if a logically deleted tower is found at the right position, then reset its value (1 CAS)
- Physical: if no such tower is found, insert as usual (1 CAS), but not in between two marked nodes, to avoid lost-update scenarios
› Deletion: only done logically (1 CAS); a further traversal may physically remove the node
Implementation of insert/delete/contains
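The logical deletion and logical insertion steps above each come down to one CAS on the value field. The following Java sketch shows only that part, for a single node. The class and method names are hypothetical, and marking, help-removal and backtracking are deliberately left out.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a bottom-level node whose null value acts as the logical deletion mark.
// Names are illustrative; physical removal is not modeled here.
final class LogicalDeletionNode<V> {
    final int key;
    private final AtomicReference<V> value;

    LogicalDeletionNode(int key, V value) {
        this.key = key;
        this.value = new AtomicReference<>(Objects.requireNonNull(value));
    }

    // The key is in the abstract set only while the value is non-null.
    boolean isPresent() {
        return value.get() != null;
    }

    V get() {
        return value.get();
    }

    // Logical deletion: one successful CAS from the current value to null.
    boolean remove() {
        while (true) {
            V current = value.get();
            if (current == null) {
                return false; // already logically deleted
            }
            if (value.compareAndSet(current, null)) {
                return true;
            }
            // The value changed concurrently: re-read and retry.
        }
    }

    // Logical insertion: revive a logically deleted node with one CAS from null.
    // Fails if the key is (or has concurrently become) present.
    boolean insert(V newValue) {
        Objects.requireNonNull(newValue);
        return value.compareAndSet(null, newValue);
    }
}
```

Reusing the node this way avoids unlinking and relinking a tower when a key is removed and then inserted again shortly after.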
Actual Java Implementation
› A single thread repeatedly traverses the structure
› It physically unlinks logically deleted nodes that have a height of 1
› Deterministic level adjustment:
- If 3 consecutive towers have the same height, raise the one in the middle by 1
- When too many tall towers are logically deleted, as all towers of height 1 are removed, then remove the bottommost level of the index items (setting the down pointer to ⊥ (null))
Implementation of adaptation
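The raising rule can be tried out on a plain array of tower heights. The Java sketch below is a simplification under stated assumptions: towers are array entries rather than linked IndexItems, "consecutive" is read as consecutive among the towers present at a given level, and one call processes all levels bottom-up. The class and method names are hypothetical.

```java
// Sketch of deterministic level adjustment on an array of tower heights.
// heights[i] is the height of the i-th tower in key order (bottom level = 1).
final class LevelAdjustmentSketch {

    // Applies the "raise the middle of 3 equal-height towers" rule level by level
    // and returns the number of towers raised.
    static int adjust(int[] heights) {
        int raised = 0;
        for (int level = 1; countAtLeast(heights, level) >= 3; level++) {
            int prev2 = -1; // second-to-last tower seen at this level
            int prev1 = -1; // last tower seen at this level
            for (int i = 0; i < heights.length; i++) {
                if (heights[i] < level) {
                    continue; // this tower does not reach the current level
                }
                if (prev2 >= 0
                        && heights[prev2] == level
                        && heights[prev1] == level
                        && heights[i] == level) {
                    heights[prev1]++; // raise the one in the middle by 1
                    raised++;
                }
                prev2 = prev1;
                prev1 = i;
            }
        }
        return raised;
    }

    private static int countAtLeast(int[] heights, int level) {
        int count = 0;
        for (int h : heights) {
            if (h >= level) {
                count++;
            }
        }
        return count;
    }
}
```

Starting from seven towers of height 1, one call yields the heights 1, 2, 1, 3, 1, 2, 1, so the shape is rebuilt without any randomness.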
Actual Java Implementation
› Our skip list
- provides logarithmic complexity (without contention)
- is contention-friendly (sequential at the top, almost not contended at the bottom)
[Figure: contention scale over the skip list: no hot spots, almost no contention]
Experiments
› Two 12-core AMD processors for a total of 24 hardware threads
› Metric: number of executed operations per millisecond, averaged over 5 runs of 5 seconds each
› Size: 5000 elements, 10000 keys; 50% of updates succeed, so the expected size is fixed
› Java
- Java SE 1.6.0_12-ea in server mode
- HotSpot JVM 11.2-b01
› java.util.concurrent.ConcurrentSkipListMap vs. Contention-friendly non-blocking skip list
› SPECjbb: in-memory database Java server benchmark
- Emulation of a three-tier client/server system
- Key-value store (with Map interfaced collections)
- We replace the thread-local collections (orders and order histories) with a shared one [Carlstrom et al. PPoPP’07]
- In addition to these accesses:
  - Java BigDecimal
  - XML processing
  - Thread-local collections accesses
Settings
Experiments
› Gain at 0% updates: half of the nodes have multiple levels in the contention-friendly skip list, versus only 25% in j.u.c.ConcurrentSkipListMap
› Gain at >0% updates: due to the contention-friendliness
› Up to 2.5x speedup
Microbenchmark, 24 threads. Attempted updates: 0-100% (effective updates: 10-50%)
Experiments
› Up to 1.8x speedup
Microbenchmark (cont’d). Attempted updates: 20% (effective updates: 10%)
Experiments
› High scalability in both cases:
- 23 threads: 12.3x speedup over sequential for j.u.c.CSLM
- 23 threads: 14.6x speedup over sequential for CF-SkipList
› Performance:
- 23 threads: 747730 business ops/sec for j.u.c.CSLM
- 23 threads: 777662 business ops/sec for CF-SkipList
› j.u.c.CSLM has some advantage at 1 core: 7000 bops more (due to memory optimization?)
› 24 threads: the adapting thread competes for processor time
SPECjbb 2005 (modified)
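As a quick reading aid for the 23-thread figures above, the ratio of the two reported throughputs can be worked out directly. It uses only the numbers on the slide.

```latex
\frac{777662\ \text{bops}}{747730\ \text{bops}} \approx 1.04
```

So on this modified SPECjbb 2005 run, CF-SkipList delivers roughly 4% more business operations per second than j.u.c.CSLM at 23 threads.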
Conclusion
› Data structures are the bottleneck
› Skip lists are appealing for in-memory databases
› Non-blocking-ness gives fault tolerance and heterogeneity support
› Contention-friendliness is most effective
- At high level of concurrency
- Under high contention
› Next steps: optimizations
› Open questions:
- What else is used in in-memory DBs? What are the observed workloads?
- Can we think of other data structures benefiting from the same decoupling?
  - Hash table?
Contention-Friendly Non-Blocking Skip List