hitchhiker trees - strangeloop 2016

Post on 11-Apr-2017


Exotic Functional Data Structures: Hitchhiker Trees
David Greenberg
9/17/16
Strange Loop

Who am I?

The Basics

Functional Data Structures
What are they, anyway?

Functional Data Structures

Immutable
7 + 1 = 8, but 7 is still 7

Functional Data Structures

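x = [1, 2, 3]
y = x          # y is another name for the same list, not a copy
y += [4]       # extends that one list in place, so x changes too

if x == y:
    print("I'm a sad panda")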

How to fix this?

x = [1, 2, 3]
y = x[:]       # slice copy: y is a new list
y += [4]       # only y changes; x is still [1, 2, 3]

if x != y:
    print("I'm a happy panda")

A List of Fruit

Mutation in an Immutable World

REFERENCES

What are pointers? (Besides hard.)

Pointers!

Pointers and Sharing

Doing Better with Pointers

Linked List

Editing the Linked List

Worst-Case Performance

Philosophy of Identity

Q: When isn’t an apple an apple?

A: When an apple that points to an orange that points to a banana isn’t an apple that points to an orange that points to a mango.
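The linked-list editing slides can be sketched as a persistent singly linked list. This is a minimal illustration with our own names (`Node`, `set_at`, `to_list`), not code from the talk: replacing the element at position i copies the i nodes in front of it and shares every node after it, so editing the last element (the worst case) copies the whole list.

```python
from typing import Any, Optional


class Node:
    """Cons cell, never mutated after construction: a value plus the rest of the list."""
    __slots__ = ("value", "rest")

    def __init__(self, value: Any, rest: Optional["Node"] = None):
        self.value = value
        self.rest = rest


def from_list(items):
    head = None
    for item in reversed(items):
        head = Node(item, head)
    return head


def to_list(node):
    out = []
    while node is not None:
        out.append(node.value)
        node = node.rest
    return out


def set_at(node, index, value):
    """Return a new list with position `index` replaced by `value`.

    The `index` nodes in front are copied; the tail after the edit is shared
    with the original, so the cost is O(index), or O(n) for the last element.
    """
    if index == 0:
        return Node(value, node.rest)
    return Node(node.value, set_at(node.rest, index - 1, value))


fruit = from_list(["apple", "orange", "banana"])
edited = set_at(fruit, 2, "mango")
print(to_list(fruit))   # ['apple', 'orange', 'banana']
print(to_list(edited))  # ['apple', 'orange', 'mango']
```

Both lists start with "apple", yet they are different lists: the copied `apple` and `orange` nodes lead to a different tail. Editing position 0 instead would copy a single node and share the rest.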

Trees

Binary Search Trees

Lookups are log2(n)

Elements per Level: $1 = 2^0$, $2 = 2^1$, $4 = 2^2$

Big O Analysis
We Care About the Dominating Factor

Performance Analysis/Algebra

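We have $L$ levels. Lookups cost $L$. Only the last level matters: it holds $2^{L-1}$ elements, about half of the tree. There are $2^L - 1$ elements in total. Thus: $n = 2^L - 1$

$\log_2(n + 1) = L$, so $\log_2(n) \approx L$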

Functional Updates
Path Copying

Path Copying

Updates still log2(n)
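Path copying on a binary search tree can be sketched like this, assuming an unbalanced tree of integer keys built from `NamedTuple` nodes (`Tree`, `insert` and `contains` are illustrative names, not from the talk). An insert rebuilds only the nodes on the path it walks and shares every other subtree with the previous version.

```python
from typing import NamedTuple, Optional


class Tree(NamedTuple):
    key: int
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def insert(tree, key):
    """Return a new tree containing `key`; `tree` itself is never modified.

    Only nodes on the search path are rebuilt (via `_replace`); every
    subtree off that path is shared with the old version.
    """
    if tree is None:
        return Tree(key)
    if key < tree.key:
        return tree._replace(left=insert(tree.left, key))
    if key > tree.key:
        return tree._replace(right=insert(tree.right, key))
    return tree  # key already present: reuse the whole tree


def contains(tree, key):
    while tree is not None:
        if key == tree.key:
            return True
        tree = tree.left if key < tree.key else tree.right
    return False


v1 = None
for k in [4, 2, 6, 1, 3, 5, 7]:
    v1 = insert(v1, k)
v2 = insert(v1, 8)
print(contains(v1, 8), contains(v2, 8))  # False True
print(v2.left is v1.left)                # True: the left subtree is shared
```

Inserting 8 into this balanced seven-node tree copies only the nodes 4, 6 and 7 on its path, about log2(n) of them, and the old version `v1` stays usable as a snapshot.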

Properties of Trees
Balanced

How do we maintain this?

How to order the values?
Sort them
Or use a trie

I/O: Changing Our Cost Model
Where did the 2 come from in log2(n)?

Idea: More Children
Fat nodes with ~B children

Going Wide

B Trees are Optimal for Reads

Lower Bound of logB(n) for sorted lookups

Controlling the base of the logarithm is awesome

log2(1000) ≈ 9.97
log5(1000) ≈ 4.29
log100(1000) = 1.5

Going wide gives big constant speedups for free

Under our I/O cost model
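The figures above are easy to check. This tiny sketch treats the cost of a sorted lookup as $\log_B(n)$ node reads, where $B$ is the fan-out; the function name `lookup_cost` is ours.

```python
import math


def lookup_cost(n, b):
    """Node reads for a sorted lookup among n keys when each node has b children."""
    return math.log(n, b)


for b in (2, 5, 100):
    print(f"log{b}(1000) = {lookup_cost(1000, b):.2f}")
# log2(1000) = 9.97
# log5(1000) = 4.29
# log100(1000) = 1.50
```

Because $\log_2(n) / \log_B(n) = \log_2(B)$ for every $n$, widening the nodes divides the number of reads by a constant factor that does not depend on the size of the tree.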

B Tree Bookkeeping
Not as simple as a Binary Search Tree

Separate Node Types
Index & Data Nodes

B+ Tree
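A B+ tree lookup over separate index and data nodes can be sketched as follows. It is a read-only sketch with hypothetical class names (`IndexNode`, `DataNode`) and no insert or split logic: index nodes carry only separator keys and child pointers, and the values live in the data nodes at the bottom.

```python
from bisect import bisect_left, bisect_right


class IndexNode:
    """Interior node: separator keys and child pointers only, no values."""
    def __init__(self, keys, children):
        # children[i] holds keys k with keys[i-1] <= k < keys[i]
        self.keys = keys
        self.children = children


class DataNode:
    """Leaf node: the actual key/value pairs, sorted by key."""
    def __init__(self, keys, values):
        self.keys = keys
        self.values = values


def lookup(node, key):
    # Walk the index nodes down to the one data node that could hold `key`.
    while isinstance(node, IndexNode):
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None


leaves = [DataNode([1, 5], ["a", "b"]),
          DataNode([10, 15], ["c", "d"]),
          DataNode([20, 25], ["e", "f"])]
root = IndexNode([10, 20], leaves)
print(lookup(root, 15))  # d
print(lookup(root, 12))  # None
```

Production B+ trees usually also link neighboring data nodes so range scans can walk across leaves; that is left out here.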

Reduce B to fit more levels on screen

Introducing Fractal Trees

Fractal Trees

BRIEF ASIDE

We can insert faster
logB(n) is only for sorted lookups

Appending to a Log
Constant time to append: we already know the next index where we need to insert

A B C D E
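Appending is constant time because the next write position is always the end of the log. A minimal sketch with a hypothetical `AppendLog` class; its `find` method shows the trade-off, since an unsorted log has to scan every entry to look up a value.

```python
class AppendLog:
    """Append-only log: the next write position is always len(entries)."""

    def __init__(self):
        self._entries = []

    def append(self, entry):
        index = len(self._entries)   # no search: the slot is always the end
        self._entries.append(entry)  # amortized O(1)
        return index

    def read(self, index):
        return self._entries[index]

    def find(self, entry):
        # No sort order to exploit, so a lookup by value is a linear scan.
        for i, e in enumerate(self._entries):
            if e == entry:
                return i
        return None


log = AppendLog()
positions = [log.append(letter) for letter in "ABCDE"]
print(positions)  # [0, 1, 2, 3, 4]
```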

Fractal Trees

Fractal Insertion
Inserting 0

Walking Through Insertions

Inserting -1

Walking Through Insertions

Inserting 28

Walking Through Insertions

Inserting 29

Walking Through Insertions

Inserting -2

Walking Through Insertions

Inserting 11.5

Walking Through Insertions

Inserting 100
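The insertion walkthrough can be approximated with a small in-place buffered tree. This is a sketch under simplifying assumptions, not an actual Fractal Tree implementation: a two-level tree with made-up pivots and leaves, a hypothetical `BUFFER_SIZE` of 3, and no leaf splitting. New keys are parked in the root buffer, and when a buffer overflows, its contents move down one level in one batch per child. Only the sequence of inserted keys comes from the slides.

```python
from bisect import bisect_right, insort

BUFFER_SIZE = 3  # hypothetical; tiny so flushes show up in a short demo


class Leaf:
    def __init__(self, keys):
        self.keys = sorted(keys)


class Index:
    def __init__(self, pivots, children):
        # children[i] covers keys k with pivots[i-1] <= k < pivots[i]
        self.pivots = pivots
        self.children = children
        self.buffer = []  # pending inserts that have not reached a leaf yet


def insert(root, key):
    """Fractal-style insert: park the key in the root's buffer, in place."""
    if isinstance(root, Leaf):
        insort(root.keys, key)
        return
    root.buffer.append(key)
    if len(root.buffer) > BUFFER_SIZE:
        flush(root)


def flush(node):
    """Push a full buffer down one level, one batch per child."""
    pending, node.buffer = node.buffer, []
    batches = {}
    for key in pending:
        batches.setdefault(bisect_right(node.pivots, key), []).append(key)
    for i, keys in batches.items():
        child = node.children[i]
        if isinstance(child, Leaf):
            for key in keys:
                insort(child.keys, key)
        else:
            child.buffer.extend(keys)
            if len(child.buffer) > BUFFER_SIZE:
                flush(child)


root = Index([10, 20], [Leaf([1, 5]), Leaf([12, 15]), Leaf([20, 25])])
for key in [0, -1, 28, 29, -2, 11.5, 100]:
    insert(root, key)
print(root.buffer)                            # [-2, 11.5, 100]
print([leaf.keys for leaf in root.children])  # [[-1, 0, 1, 5], [12, 15], [20, 25, 28, 29]]
```

Inserting 29 overflows the root buffer, so 0 and -1 travel to the first leaf and 28 and 29 to the last leaf in two batches; the last three keys are still waiting in the root buffer.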

What about Reads?

Looking up 20

Find the Path

Project Pending Operations
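Reads have to account for keys that are still waiting in buffers. The snippet below stands alone and rebuilds a small hypothetical tree (names and shape are ours) whose root buffer still holds pending inserts; a point lookup follows the key's root-to-leaf path and checks every buffer it passes.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = set(keys)


class Index:
    def __init__(self, pivots, children, buffer=()):
        self.pivots = list(pivots)      # children[i] covers pivots[i-1] <= k < pivots[i]
        self.children = list(children)
        self.buffer = list(buffer)      # pending inserts not yet flushed


def contains(node, key):
    """Point lookup that projects pending inserts found along the path."""
    while isinstance(node, Index):
        if key in node.buffer:          # still waiting in a buffer
            return True
        node = node.children[bisect_right(node.pivots, key)]
    return key in node.keys


root = Index([10, 20], [Leaf([1, 5]), Leaf([12, 15]), Leaf([20, 25])],
             buffer=[-2, 11.5, 100])
print(contains(root, 20))   # True: found in a leaf
print(contains(root, 100))  # True: found in the root buffer
print(contains(root, 7))    # False
```

Since this sketch only supports inserts, projecting a pending operation reduces to a membership test; with deletes in the buffers, the newest message for a key would have to win.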

Broken for Scans

Only Project Values Within Range
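For scans, the buffers along a path can hold keys bound for other leaves, so only pending values inside the scanned range may be merged into the result. A minimal sketch, assuming inserts only and hypothetical inputs (one leaf's keys plus the buffers of its ancestors):

```python
import heapq


def scan(leaf_keys, path_buffers, lo, hi):
    """Sorted keys in [lo, hi) for one leaf, including pending inserts.

    `path_buffers` are the buffers of the leaf's ancestors. They may hold
    keys destined for other leaves, so only values inside [lo, hi) are
    projected into the result.
    """
    stored = sorted(k for k in leaf_keys if lo <= k < hi)
    pending = sorted(k for buf in path_buffers for k in buf if lo <= k < hi)
    return list(heapq.merge(stored, pending))


root_buffer = [-2, 11.5, 100]
print(scan([12, 15], [root_buffer], 10, 20))          # [11.5, 12, 15]
print(scan([20, 25, 28, 29], [root_buffer], 20, 30))  # [20, 25, 28, 29]
```

Without the range filter, -2 and 100 would leak into the second scan even though neither belongs to that leaf.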

Hitchhiker vs Fractal

Path Copying or Not!

Fractal Trees update in-place

Path Copying or Not!

Hitchhiker Trees use path-copying
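In contrast to the in-place fractal tree, a hitchhiker-style insert can be written with path copying. In this sketch (a one-level tree with tuple leaves and a hypothetical `BUFFER_SIZE` of 3; all names are ours), an insert that fits in the root buffer copies only the root, and a flush copies only the children that receive keys. Every older root remains a valid snapshot.

```python
from bisect import bisect_right
from typing import NamedTuple, Tuple

BUFFER_SIZE = 3  # hypothetical, tiny so the flush is easy to see


class Index(NamedTuple):
    pivots: Tuple[float, ...]
    children: Tuple[Tuple[float, ...], ...]  # sorted leaf tuples
    buffer: Tuple[float, ...] = ()           # pending inserts


def insert(root, key):
    """Return a new root; `root` itself is never modified."""
    buffer = root.buffer + (key,)
    if len(buffer) <= BUFFER_SIZE:
        # Common case: only the root is copied, all children are shared.
        return root._replace(buffer=buffer)
    # Flush: rebuild just the children that receive keys, share the rest.
    children = list(root.children)
    for k in buffer:
        i = bisect_right(root.pivots, k)
        children[i] = tuple(sorted(children[i] + (k,)))
    return root._replace(children=tuple(children), buffer=())


v0 = Index(pivots=(10, 20), children=((1, 5), (12, 15), (20, 25)))
v = v0
for key in [0, -1, 28]:
    v = insert(v, key)
print(v.buffer)                          # (0, -1, 28)
v4 = insert(v, 29)
print(v4.buffer, v4.children)            # () ((-1, 0, 1, 5), (12, 15), (20, 25, 28, 29))
print(v4.children[1] is v0.children[1])  # True: untouched child is shared
print(v0.buffer)                         # (): the old version is unchanged
```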

Flush Control

                  Total I/O   I/O per Flush   Avg I/O per Insert
B+ Tree           21          3               3
Fractal Tree      12          1 to 4          1.7
Hitchhiker Tree   5           5               0.7
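The averages in the table are consistent with dividing each tree's total I/O by the seven inserts from the walkthrough (0, -1, 28, 29, -2, 11.5 and 100). A quick check of that arithmetic:

```python
inserts = [0, -1, 28, 29, -2, 11.5, 100]  # the walkthrough's inserts
total_io = {"B+ Tree": 21, "Fractal Tree": 12, "Hitchhiker Tree": 5}

avg_io = {name: io / len(inserts) for name, io in total_io.items()}
for name, avg in avg_io.items():
    print(f"{name}: {avg:.1f} I/O per insert")
# B+ Tree: 3.0 I/O per insert
# Fractal Tree: 1.7 I/O per insert
# Hitchhiker Tree: 0.7 I/O per insert
```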

Real Branching Factors
B+ Trees have fan out of 1000-2000
Hitchhiker Trees have fan out of 100-200
But Hitchhiker Tree buffers hold 900-1000 elements!

I want to try it!
On GitHub

Datacrypt is Pluggable
Backend Storage
I/O Management
Serialization
Sorting Algorithm

Works with Redis
Called the Outboard API

Outboard
Looks like a hash map
Data stored off-heap in Redis
Functional data structures mean free snapshots
After a VM restart, just reconnect to Redis
Lifetime of in-memory data doesn’t need to be tied to lifetime of runtime memory

What’ll we build next?
Q&A

Thanks to:
Andy Chambers for JDBC Backend & GC Improvements
Casey Marshall for S3 Backend

(Prefix) Tries

(Hash) Array Mapped Tries

We add the fat node trick from B trees
We hash keys first for even distribution
No need to store full hash: prefix is enough
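A hash array mapped trie can be sketched in the same path-copying style. This is a simplified illustration, not the talk's implementation: each node is a Python `dict` keyed by a 5-bit chunk of the key's hash (real HAMTs use a bitmap plus a compact array instead), a key's position in the trie encodes its hash prefix so the full hash is never stored, and keys whose 64-bit hashes collide completely share a bucket at the bottom. `HASH_BITS`, `BITS`, `insert` and `lookup` are our own names.

```python
HASH_BITS = 64        # assume 64-bit hashes
BITS = 5              # hash bits consumed per level: up to 32 children per node
MASK = (1 << BITS) - 1


def _hash(key):
    return hash(key) & ((1 << HASH_BITS) - 1)  # non-negative 64-bit hash


def _chunk(h, level):
    return (h >> (BITS * level)) & MASK


def insert(node, key, value, level=0):
    """Return a new trie with key -> value; `node` is never modified.

    Slots hold either a child dict or a leaf bucket: a tuple of (key, value)
    pairs, normally of length one.
    """
    i = _chunk(_hash(key), level)
    new = dict(node)                        # path copying: copy only this node
    slot = node.get(i)
    if isinstance(slot, dict):              # interior child: descend
        new[i] = insert(slot, key, value, level + 1)
        return new
    bucket = tuple(p for p in (slot or ()) if p[0] != key)  # drop old binding
    if not bucket or BITS * (level + 1) >= HASH_BITS:
        # Empty slot, same key, or all hash bits used (full collision).
        new[i] = bucket + ((key, value),)
        return new
    # A different key shares this hash prefix: push both one level deeper.
    child = {}
    for k, v in bucket + ((key, value),):
        child = insert(child, k, v, level + 1)
    new[i] = child
    return new


def lookup(node, key, default=None):
    h, level = _hash(key), 0
    while True:
        slot = node.get(_chunk(h, level))
        if slot is None:
            return default
        if not isinstance(slot, dict):      # leaf bucket
            for k, v in slot:
                if k == key:
                    return v
            return default
        node, level = slot, level + 1


m1 = {}
for n in range(100):
    m1 = insert(m1, f"key{n}", n)
m2 = insert(m1, "key0", -1)
print(lookup(m1, "key0"), lookup(m2, "key0"), lookup(m2, "nope"))  # 0 -1 None
```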
