code specialization for memory efficient hash tries · code specialization for memory efficient...
TRANSCRIPT
Code Specialization for Memory Efficient
Hash Tries
Michael Steindorfer, Jurgen Vinju Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
0%
25%
50%
75%
100%
MapGeneric Specialized
45%
100%
2
0%
25%
50%
75%
100%
SetGeneric Specialized
22%
100%
3
• memory usage vs runtime
• size of source code or binary
• platform specifics
Hash Tries
Hash TriesFast Immutable Data Structures on the JVM
Hash Tries(Wide) Hash-Prefix Trees with Array Nodes
{32, 2, 4098, 34}
8
2
320 1 2 3 4 5 6 7 8
...... 31
034
1 2 3 4 5 6 7 8...
... 31
20 1 2 3
40984 5 6 7 8
...... 31
(a) 33 and 2
320 1 2 3 4 5 6
...... 31
034
1 2 3 4 5 6...
... 31
20 1 2 3
40984 5 6
...... 31
(b) 33 and 2
Figure 1. Inserting a sequence of numbers in a hash array mapped trie. The small index numbers refer to theposition in a sparse array.
insertion calculates the number’s hash code⇤:
hash(32) = . . . 000 00001 000002 = 0 0 0 0 0 1 032
hash(2) = . . . 000 00000 000102 = 0 0 0 0 0 0 232
hash(4098) = . . . 100 00000 000102 = 0 0 0 0 4 0 232
hash(34) = . . . 000 00001 000102 = 0 0 0 0 0 1 232
We first separate the hash codes in chunks of 5-bit to notate chunks as decimal values with rangesfrom 0 to 31. Then insertion places the values in a 32-nary tree (where each is encoded as a sparsearray), based on the hash code prefixes. The tree structure gets expanded until every prefix can beunambiguously stored.
To continue our example: 32 is inserted at the root node; 2 as well (because they do not sharea common prefix). 4098 shares the prefix path !0!2 with value 2, consequently it is placedunambiguously on level 3. 32 shares the prefix path !2 with 2 and 4098, but can be differentiatedon level 2 from both.
Note, that a chunk size of 5-bit for 32-bit hash codes results in trees with a maximal depth ofdlog32(232)e = 7.
In contrast Figure 3 illustrates an array-based hash-table. By comparing the visualizations of bothdata structures we can identify the following list of disadvantages of HAMTs over array-based hashtables (Section ?? provides evidence for our claims):
• Lookup has to follow a tree path of length between 1–7 nodes answer containment queries.Memory indirections and indexing into sparse arrays is less efficient than performing a singleindex-based lookup in continuous array.
⇤We assume a hash function for integers returns the argument, i.e. identity.
Copyright c� 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000)Prepared using speauth.cls DOI: 10.1002/spe
2
320 1 2 3 4 5 6 7 8
...... 31
034
1 2 3 4 5 6 7 8...
... 31
20 1 2 3
40984 5 6 7 8
...... 31
(a) 33 and 2
320 1 2 3 4 5 6
...... 31
034
1 2 3 4 5 6...
... 31
20 1 2 3
40984 5 6
...... 31
(b) 33 and 2
Figure 1. Inserting a sequence of numbers in a hash array mapped trie. The small index numbers refer to theposition in a sparse array.
insertion calculates the number’s hash code⇤:
hash(32) = . . . 000 00001 000002 = 0 0 0 0 0 1 032
hash(2) = . . . 000 00000 000102 = 0 0 0 0 0 0 232
hash(4098) = . . . 100 00000 000102 = 0 0 0 0 4 0 232
hash(34) = . . . 000 00001 000102 = 0 0 0 0 0 1 232
We first separate the hash codes in chunks of 5-bit to notate chunks as decimal values with rangesfrom 0 to 31. Then insertion places the values in a 32-nary tree (where each is encoded as a sparsearray), based on the hash code prefixes. The tree structure gets expanded until every prefix can beunambiguously stored.
To continue our example: 32 is inserted at the root node; 2 as well (because they do not sharea common prefix). 4098 shares the prefix path !0!2 with value 2, consequently it is placedunambiguously on level 3. 32 shares the prefix path !2 with 2 and 4098, but can be differentiatedon level 2 from both.
Note, that a chunk size of 5-bit for 32-bit hash codes results in trees with a maximal depth ofdlog32(232)e = 7.
In contrast Figure 3 illustrates an array-based hash-table. By comparing the visualizations of bothdata structures we can identify the following list of disadvantages of HAMTs over array-based hashtables (Section ?? provides evidence for our claims):
• Lookup has to follow a tree path of length between 1–7 nodes answer containment queries.Memory indirections and indexing into sparse arrays is less efficient than performing a singleindex-based lookup in continuous array.
⇤We assume a hash function for integers returns the argument, i.e. identity.
Copyright c� 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000)Prepared using speauth.cls DOI: 10.1002/spe
9
abstract class TrieSet implements java.util.Set {
TrieNode root; int size;
class TrieNode { int bitmap; Object[] contentAndSubTries; } }
4
0 2
320 1
0 434
2 4098
(a) Scala
320 2
034
1
20
40984
(b) Clojure
232
0
034
1
20
40984
(c) Clojure
Figure 3. Conceptual difference in tree layout between Clojure’s and Scala’s HAMT implementations.
Figure 4. Footprints of HAMT sets and HAMT maps in 32-bit and 64-bit environments. Defaulting to 5-bitprefix chunks.
subnode pointers. Michael IDon’t talk explicitly about Clojure/Scala, rather about mixed/separatedHAMT node designs.I
Division between internal nodes and leaf nodes. Scala’s HAMT implementations divide the treestructure into internal nodes and leaf nodes. The internal nodes amount for the hash-based prefixtree structure. The leaf nodes encapsulate the data tuple (i.e, a key in case of a set, a key/value pair
Copyright c� 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000)Prepared using speauth.cls DOI: 10.1002/spe
10
abstract class TrieSet implements java.util.Set {
TrieNode root; int size;
class TrieNode { int bitmap; Object[] contentAndSubTries; } }
4
0 2
320 1
0 434
2 4098
(a) Scala
320 2
034
1
20
40984
(b) Clojure
232
0
034
1
20
40984
(c) Clojure
Figure 3. Conceptual difference in tree layout between Clojure’s and Scala’s HAMT implementations.
Figure 4. Footprints of HAMT sets and HAMT maps in 32-bit and 64-bit environments. Defaulting to 5-bitprefix chunks.
subnode pointers. Michael IDon’t talk explicitly about Clojure/Scala, rather about mixed/separatedHAMT node designs.I
Division between internal nodes and leaf nodes. Scala’s HAMT implementations divide the treestructure into internal nodes and leaf nodes. The internal nodes amount for the hash-based prefixtree structure. The leaf nodes encapsulate the data tuple (i.e, a key in case of a set, a key/value pair
Copyright c� 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000)Prepared using speauth.cls DOI: 10.1002/spe
11
abstract class TrieSet implements java.util.Set {
TrieNode root; int size;
class TrieNode { int bitmap; Object[] contentAndSubTries; } }
4
0 2
320 1
0 434
2 4098
(a) Scala
320 2
034
1
20
40984
(b) Clojure
232
0
034
1
20
40984
(c) Clojure
Figure 3. Conceptual difference in tree layout between Clojure’s and Scala’s HAMT implementations.
Figure 4. Footprints of HAMT sets and HAMT maps in 32-bit and 64-bit environments. Defaulting to 5-bitprefix chunks.
subnode pointers. Michael IDon’t talk explicitly about Clojure/Scala, rather about mixed/separatedHAMT node designs.I
Division between internal nodes and leaf nodes. Scala’s HAMT implementations divide the treestructure into internal nodes and leaf nodes. The internal nodes amount for the hash-based prefixtree structure. The leaf nodes encapsulate the data tuple (i.e, a key in case of a set, a key/value pair
Copyright c� 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000)Prepared using speauth.cls DOI: 10.1002/spe
12
13
... class NodeNode extends TrieNode { int bitmap; TrieNode nodeAtIndex0; TrieNode nodeAtIndex1; } class ElementNode extends TrieNode { int bitmap; Object keyAtIndex0; TrieNode nodeAtIndex1; } class NodeElement extends TrieNode { int bitmap; TrieNode nodeAtIndex0; Object keyAtIndex1; } ...
class TrieNode { int bitmap; Object[] contentAndSubTries; }
class TrieNode { int bitmap; Object[] contentAndSubTries; }
14
... class NodeNode extends TrieNode { int bitmap; TrieNode nodeAtIndex0; TrieNode nodeAtIndex1; } class ElementNode extends TrieNode { int bitmap; Object keyAtIndex0; TrieNode nodeAtIndex1; } class NodeElement extends TrieNode { int bitmap; TrieNode nodeAtIndex0; Object keyAtIndex1; } ...
ExponentialNumber of Specializations
15
Memory Overhead per Pointer (Set, 32-bit)
0 Bytes
8 Bytes
16 Bytes
24 Bytes
32 Bytes
40 Bytes
48 Bytes
1-ary 2-ary 3-ary 4-ary 5-ary 6-ary 7-ary 8-ary 9-ary 10-ary 11-ary 12-ary
7,38,08,08,99,010,310,712,814,0
18,7
24,0
48,0
16
Frequency by Node Arity
0%
10%
20%
30%
40%
50%
60%
70%
0-ary 1-ary 2-ary 3-ary 4-ary 5-ary 6-ary 7-ary 8-ary 9-ary 10-ary 11-ary 12-ary
1%1%1%1%1%1%1%1%3%
14%
63%
1%0%
17
Arities % of Nodes
≤4 82%
≤8 86%
≤12 90%
18
Arities Specializations
≤4 31
≤8 511
≤12 8191
19
Avoiding Permutations
20
Arities Specializations
≤4 15 (31)
≤8 45 (511)
≤12 91 (8191)
21
abstract class TrieSet implements java.util.Set { TrieNode root; int size;
interface TrieNode { ... } ... class NodeNode extends TrieNode { int bitmap; TrieNode nodeAtIndex0; TrieNode nodeAtIndex1; } class ElementNode extends TrieNode { int bitmap; Object keyAtIndex0; TrieNode nodeAtIndex1; } class NodeElement extends TrieNode { int bitmap; TrieNode nodeAtIndex0; Object keyAtIndex1; } class ElementElement extends TrieNode { int bitmap; Object keyAtIndex0; Object keyAtIndex1; } ... }
4
0 2
320 1
0 434
2 4098
(a) Scala
320 2
034
1
20
40984
(b) Clojure
232
0
034
1
20
40984
(c) Clojure
Figure 3. Conceptual difference in tree layout between Clojure’s and Scala’s HAMT implementations.
Figure 4. Footprints of HAMT sets and HAMT maps in 32-bit and 64-bit environments. Defaulting to 5-bitprefix chunks.
subnode pointers. Michael IDon’t talk explicitly about Clojure/Scala, rather about mixed/separatedHAMT node designs.I
Division between internal nodes and leaf nodes. Scala’s HAMT implementations divide the treestructure into internal nodes and leaf nodes. The internal nodes amount for the hash-based prefixtree structure. The leaf nodes encapsulate the data tuple (i.e, a key in case of a set, a key/value pair
Copyright c� 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000)Prepared using speauth.cls DOI: 10.1002/spe
22
23
4
0 2
320 1
0 434
2 4098
(a) Scala
320 2
034
1
20
40984
(b) Clojure
232
0
034
1
20
40984
(c) Clojure
Figure 3. Conceptual difference in tree layout between Clojure’s and Scala’s HAMT implementations.
Figure 4. Footprints of HAMT sets and HAMT maps in 32-bit and 64-bit environments. Defaulting to 5-bitprefix chunks.
subnode pointers. Michael IDon’t talk explicitly about Clojure/Scala, rather about mixed/separatedHAMT node designs.I
Division between internal nodes and leaf nodes. Scala’s HAMT implementations divide the treestructure into internal nodes and leaf nodes. The internal nodes amount for the hash-based prefixtree structure. The leaf nodes encapsulate the data tuple (i.e, a key in case of a set, a key/value pair
Copyright c� 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000)Prepared using speauth.cls DOI: 10.1002/spe
abstract class TrieSet implements java.util.Set { TrieNode root; int size;
interface TrieNode { ... } ... class NodeNode extends TrieNode {
byte pos1; TrieNode nodeAtPos1; byte pos2; TrieNode nodeAtPos2; } class ElementNode extends TrieNode {
byte pos1; Object keyAtPos1; byte pos2; TrieNode nodeAtPos2; } class NodeElement extends TrieNode {
byte pos1; TrieNode nodeAtPos1; byte pos2; Object keyAtPos2; } class ElementElement extends TrieNode {
byte pos1; Object keyAtPos1; byte pos2; Object keyAtPos2; } ... }
Lookup Performance (lower is better)
0%
25%
50%
75%
100%
125%
150%
MapGeneric Specialized 0-4 Specialized 0-8 Specialized 0-12
138%138%130%
100%
24
Memory Usage (lower is better)
0%
25%
50%
75%
100%
Map Set
22%
45%
23%
46%52%
62%
100%100% Generic0-40-80-12
25
Memory Usage (lower is better)
0%
25%
50%
75%
100%
Map Set
22%
45%
23%
46%52%
62%
100%100% Generic0-40-80-12
26
Memory Footprint Compared To Competition (lower is better)
0x
1x
2x
3x
4x
5x
Specialized 0-8 Clojure Scala
3,75x
2,2x
1x
4,9x
1,6x
1x
MapSet
27
worst hash distribution ->
good memory performance
28
best hash distribution ->
worst memory performance
29
best hash distribution ->
worst memory performancebest
30