hitchhiker trees - strangeloop 2016
Exotic Functional Data Structures: Hitchhiker Trees
David Greenberg
9/17/16
Strange Loop
Who am I?
The Basics
Functional Data Structures
What are they, anyway?
Functional Data Structures
Immutable
7 + 1 = 8
But 7 is still 7
Functional Data Structures
x = [1, 2, 3]
y = x
y += [4]
if x == y:
    print("I'm a sad panda")
How to fix this?
x = [1, 2, 3]
y = x[:]
y += [4]
if x != y:
    print("I'm a happy panda")
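Copying with x[:] works only if every caller remembers to copy. A minimal sketch of the alternative, assuming a Python tuple stands in for an immutable list: "adding" an element builds a new value, so the old one cannot change underneath anyone.

```python
# Tuples are immutable, so x cannot be modified through y.
x = (1, 2, 3)
y = x             # x and y name the same value
y = y + (4,)      # builds a new tuple and rebinds y; x is untouched

if x != y:
    print("I'm a happy panda")
```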
A List of Fruit
Mutation in an Immutable World
REFERENCES
What are pointers?(besides hard)
Pointers!
Pointers and Sharing
Doing Better with Pointers
Linked List
Editing the Linked List
Worst Case Performance
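To make the sharing concrete, here is a small sketch of an immutable linked list of fruit. The names Node, from_items, replace_at and to_list are hypothetical helpers for illustration, not something taken from the talk. Replacing an element copies every node in front of it and shares every node behind it, so editing the head is cheap and editing the last element copies the whole list.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    value: str
    next: Optional["Node"]


def from_items(items):
    """Build a linked list by prepending items in reverse order."""
    head = None
    for item in reversed(items):
        head = Node(item, head)
    return head


def replace_at(head, index, value):
    """Return a new list with position `index` replaced.

    Nodes 0..index are copied; everything after `index` is shared.
    """
    if head is None:
        raise IndexError(index)
    if index == 0:
        return Node(value, head.next)
    return Node(head.value, replace_at(head.next, index - 1, value))


def to_list(head):
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


fruit = from_items(["apple", "orange", "banana"])

# Best case: replace the head, share the other two nodes.
best = replace_at(fruit, 0, "pear")

# Worst case: replace the last element, copy all three nodes.
worst = replace_at(fruit, 2, "mango")
```

The original list is still intact after both edits, which is the point: old versions stay valid for anyone holding a reference to them.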
Philosophy of Identity
Q: When isn’t an apple an apple?
A: When an apple points to an orange points to a banana isn’t an apple points to an orange points to a mango.
Trees
Binary Search Trees
Lookups are log2(n)
Elements per Level
1 = 2^0
2 = 2^1
4 = 2^2
Big O Analysis
We Care About the Dominating Factor
Performance Analysis/Algebra
We have L levels
Lookups cost L
Only the last level matters
The last level has 2^(L-1) elements
Thus: n ≈ 2^(L-1)
log2(n) ≈ L
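The arithmetic can be checked with a few lines of Python. This is only a sanity check under the assumption of a completely full binary tree; elements_per_level is a hypothetical helper. With L levels, the last level holds 2^(L-1) elements and the whole tree holds 2^L - 1, so log2(n) lands within one of L whichever count is used.

```python
import math


def elements_per_level(levels):
    # Level k (counting from 0) of a full binary tree holds 2**k elements.
    return [2 ** k for k in range(levels)]


L = 10
per_level = elements_per_level(L)
last = per_level[-1]       # elements on the last level: 2**(L - 1)
total = sum(per_level)     # elements in the whole tree: 2**L - 1

print(per_level[:3])                  # first three levels
print(last, total)                    # the last level is just over half of the total
print(math.log2(last), math.log2(total))
```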
Functional updates
Path Copying
Path Copying
Updates still log2(n)
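A minimal sketch of path copying on a binary search tree, assuming integer keys and no rebalancing. Tree, insert and contains are illustrative names, not code from the talk. An insert copies only the nodes on the path from the root to the insertion point, which is why the cost stays proportional to the depth; every other subtree is shared between the old and new versions.

```python
from typing import NamedTuple, Optional


class Tree(NamedTuple):
    key: int
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def insert(node, key):
    """Return a new tree containing `key`; only the search path is copied."""
    if node is None:
        return Tree(key)
    if key < node.key:
        return node._replace(left=insert(node.left, key))
    if key > node.key:
        return node._replace(right=insert(node.right, key))
    return node  # key already present: reuse the existing node


def contains(node, key):
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


old = None
for k in [8, 4, 12, 2, 6, 10, 14]:
    old = insert(old, k)

# Inserting 5 copies the nodes 8, 4 and 6; the subtrees under 12 and 2 are shared.
new = insert(old, 5)
```

Both versions remain usable afterwards: old does not contain 5, new does, and new.right is the very same object as old.right.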
Properties of Trees
Balanced
How do we maintain this?
How to order the values
Sort them
Trie
I/O
Changing Our Cost Model
Where did the 2 come from in log2(n)?
IDEA
More children
Fat nodes with ~B children
Going Wide
B Trees are Optimal for Reads
Lower Bound of logB(n) for sorted lookups
Controlling the base of the logarithm is awesome
log2(1000) = 9.97
log5(1000) = 4.29
log100(1000) = 1.5
Going wide gives big constant speedups for free
Under our I/O cost model
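The effect of the branching factor is easy to reproduce. The sketch below assumes the I/O cost model from the slides, where a lookup costs one node read per level and a tree with fan-out B over n elements has about logB(n) levels; levels_needed is a hypothetical helper.

```python
import math


def levels_needed(n, fanout):
    """Approximate number of levels (node reads) for n elements."""
    return math.log(n, fanout)


n = 1000
for fanout in (2, 5, 100):
    # Roughly 10, 4.3 and 1.5 levels respectively.
    print(fanout, round(levels_needed(n, fanout), 2))
```

Going from fan-out 2 to fan-out 100 cuts the number of node reads by a factor of log2(100), about 6.6, regardless of n.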
B Tree Bookkeeping
Not as simple as a Binary Search Tree
Separate Node Types
Index & Data Nodes
B+ Tree
Reduce B to fit more levels on screen
Introducing Fractal Trees
Fractal Trees
BRIEF ASIDE
We can insert faster
logB(n) is only for sorted lookups
Appending to a Log
Constant time to append
Already know the next index where we need to insert
A B C D E
Fractal Trees
Fractal Insertion
Inserting 0
Walking Through Insertions
Inserting -1
Inserting 28
Inserting 29
Inserting -2
Inserting 11.5
Inserting 100
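The insertion walkthrough can be approximated with a deliberately tiny model: a single index node that keeps a write buffer above a fixed set of leaves. This is a sketch under loud assumptions, not the slides' exact tree. The class BufferedRoot, the pivots 10 and 20, and the buffer size of 3 are all hypothetical; there is no node splitting, and a flush pushes every pending insert down at once.

```python
import bisect


class BufferedRoot:
    """Two-level sketch: one index node with a write buffer over fixed leaves."""

    def __init__(self, pivots, buffer_size):
        self.pivots = pivots                           # sorted keys separating the leaves
        self.leaves = [[] for _ in range(len(pivots) + 1)]
        self.buffer = []                               # pending inserts, in arrival order
        self.buffer_size = buffer_size
        self.flushes = 0

    def insert(self, key):
        # Constant-time step: append to the buffer like appending to a log.
        self.buffer.append(key)
        if len(self.buffer) > self.buffer_size:
            self.flush()

    def flush(self):
        # Only when the buffer overflows do we pay to touch the leaves.
        for key in self.buffer:
            leaf = self.leaves[bisect.bisect_right(self.pivots, key)]
            bisect.insort(leaf, key)
        self.buffer = []
        self.flushes += 1


tree = BufferedRoot(pivots=[10, 20], buffer_size=3)
for key in [0, -1, 28, 29, -2, 11.5, 100]:
    tree.insert(key)

print(tree.leaves)    # values that have been flushed down
print(tree.buffer)    # values still waiting in the buffer
print(tree.flushes)
```

With these settings the seven inserts trigger a single flush (on inserting 29), and the last three values are still sitting in the buffer, which is why reads have to look at pending operations as well as leaves.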
What about Reads?
Looking up 20
Find the Path
Project Pending Operations
Broken for Scans
Only Project Values Within Range
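A sketch of the read side, again under assumptions: the pivots, leaf contents and pending buffer below are a hypothetical state, and leaf_range, projected_leaf, lookup and scan are illustrative names. A reader sees a leaf as its stored values plus any pending inserts, but only those pending inserts whose keys fall inside that leaf's key range; projecting everything onto every leaf would make a scan return out-of-range values repeatedly.

```python
import bisect
import math

pivots = [10, 20]                     # leaf 0: < 10, leaf 1: [10, 20), leaf 2: >= 20
leaves = [[-1, 0], [], [28, 29]]      # values already flushed to the leaves
pending = [-2, 11.5, 100]             # inserts still buffered in the index node


def leaf_range(i):
    """Half-open key range [lo, hi) covered by leaf i."""
    lo = pivots[i - 1] if i > 0 else -math.inf
    hi = pivots[i] if i < len(pivots) else math.inf
    return lo, hi


def projected_leaf(i):
    """Leaf i as a reader should see it: stored values plus in-range pending ones."""
    lo, hi = leaf_range(i)
    in_range = [k for k in pending if lo <= k < hi]
    return sorted(leaves[i] + in_range)


def lookup(key):
    # Find the path to the leaf, then project the pending operations onto it.
    return key in projected_leaf(bisect.bisect_right(pivots, key))


def scan():
    out = []
    for i in range(len(leaves)):
        out.extend(projected_leaf(i))
    return out


print(lookup(20), lookup(11.5))
print(scan())
```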
Hitchhiker vs Fractal
Path Copying or Not!
Fractal Trees update in-place
Hitchhiker Trees use path-copying
Flush Control
                  Total I/O   I/O per Flush   Avg I/O per Insert
B+ Tree           21          3               3
Fractal Tree      12          1 to 4          1.7
Hitchhiker Tree   5           5               0.7
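The averages in the comparison follow from dividing total I/O by the number of inserts. The sketch below assumes the totals cover the seven inserts walked through earlier (0, -1, 28, 29, -2, 11.5, 100); the slides do not state the count explicitly, but seven reproduces all three averages.

```python
# Assumption: seven inserts, matching the earlier walkthrough.
INSERTS = 7

total_io = {"B+ Tree": 21, "Fractal Tree": 12, "Hitchhiker Tree": 5}

avg_io = {name: round(total / INSERTS, 1) for name, total in total_io.items()}

for name, avg in avg_io.items():
    print(name, avg)
```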
Real Branching Factors
B+ Trees have fan out of 1000-2000
Hitchhiker Trees have fan out of 100-200
But Hitchhiker Tree buffers hold 900-1000 elements!
I want to try it!
On GitHub
Datacrypt is Pluggable
Backend Storage
I/O Management
Serialization
Sorting Algorithm
Works with Redis
Called the Outboard API
Outboard
Looks like a hash map
Data stored off-heap in Redis
Functional data structures mean free snapshots
After a VM restart, just reconnect to Redis
Lifetime of in-memory data doesn’t need to be tied to lifetime of runtime memory
What’ll we build next?
Q&A
Thanks to:
Andy Chambers for JDBC Backend & GC Improvements
Casey Marshall for S3 Backend
(Prefix) Tries
(Hash) Array Mapped Tries
We add the fat node trick from B trees
We hash keys first for even distribution
No need to store full hash: prefix is enough
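A compact sketch of these three ideas, with several assumptions: string keys, zlib.crc32 as a stand-in 32-bit hash, 5 bits of the hash consumed per level (32 slots per node), and no handling of full hash collisions. The names assoc, get and EMPTY are illustrative and not taken from any particular library. Each node stores a bitmap of occupied slots and a compact tuple of entries, and a key only descends as far as its hash prefix needs to be distinguished from its neighbours.

```python
import zlib

BITS = 5                      # 2**5 = 32 slots per node: the "fat node" trick
MASK = (1 << BITS) - 1
HASH_BITS = 32                # crc32 yields a 32-bit hash
EMPTY = (0, ())               # (bitmap of used slots, compact tuple of entries)


def key_hash(key):
    # Hash first so that keys spread evenly over the slots.
    return zlib.crc32(key.encode("utf-8"))


def _locate(bitmap, h, shift):
    """Return (bit for this level's slot, position in the compact tuple)."""
    bit = 1 << ((h >> shift) & MASK)
    return bit, bin(bitmap & (bit - 1)).count("1")


def assoc(node, key, value, shift=0):
    """Return a new trie with key set to value; the old trie is untouched."""
    if shift >= HASH_BITS:
        raise NotImplementedError("full hash collisions are out of scope here")
    bitmap, entries = node
    bit, pos = _locate(bitmap, key_hash(key), shift)
    if not bitmap & bit:                       # free slot: store the pair here
        return (bitmap | bit, entries[:pos] + ((key, value),) + entries[pos:])
    entry = entries[pos]
    if isinstance(entry[0], int):              # sub-node: recurse one level down
        new_entry = assoc(entry, key, value, shift + BITS)
    elif entry[0] == key:                      # same key: replace the value
        new_entry = (key, value)
    else:                                      # prefix clash: push both one level down
        sub = assoc(EMPTY, entry[0], entry[1], shift + BITS)
        new_entry = assoc(sub, key, value, shift + BITS)
    return (bitmap, entries[:pos] + (new_entry,) + entries[pos + 1:])


def get(node, key, default=None):
    h, shift = key_hash(key), 0
    while True:
        bitmap, entries = node
        bit, pos = _locate(bitmap, h, shift)
        if not bitmap & bit:
            return default
        entry = entries[pos]
        if not isinstance(entry[0], int):      # a (key, value) pair
            return entry[1] if entry[0] == key else default
        node, shift = entry, shift + BITS


t1 = assoc(EMPTY, "apple", 1)
t2 = assoc(t1, "orange", 2)
t3 = assoc(t2, "apple", 3)     # t1 and t2 still see apple -> 1
```

Because every update returns a new root and reuses untouched entries, older versions such as t1 and t2 keep working after t3 is built, the same free-snapshot property the rest of the talk relies on.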