fawn: a fast array of wimpy nodes presented by: clint sbisa & irene haque

FAWN: A Fast Array of Wimpy Nodes

Presented by: Clint Sbisa & Irene Haque

Motivation

• Large-scale data-intensive applications: Facebook, LinkedIn, Dynamo

• CPU-I/O gap: storage, network and memory bottlenecks; low CPU utilization

• CPU power: slower CPUs execute more queries per second per Watt (1 billion vs. 100 million instructions per Joule); inefficient energy-saving techniques

• Memory power

FAWN

• Data-intensive, computationally simple workloads; small objects (100 B to 1 KB)

• Cluster of embedded CPUs using flash storage: efficient, fast random reads, slow random writes

• FAWN-KV: key-value storage, consistent hashing

• FAWN-DS: data store, log-structured

FAWN-DS

• Log-structured key-value store

• Contains all values in a key range for each virtual ID

• Maps 160-bit keys

• Hash Index: bucket = i low-order index bits; key fragment = next 15 low-order bits

• 6-byte in-memory Hash Index entry stores the key fragment and a pointer
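
The bit layout above can be sketched in a few lines. This is a minimal illustration, not the FAWN-DS code: SHA-1 is assumed only as a convenient way to get a 160-bit key, the names `split_key` and `pack_entry` are hypothetical, and the 6-byte layout (15-bit fragment plus a valid bit in 2 bytes, then a 4-byte log offset) is an assumption chosen to match the entry size on the slide.

```python
import hashlib
import struct

KEYFRAG_BITS = 15  # from the slide: key fragment = next 15 low-order bits


def split_key(key: bytes, index_bits: int):
    """Return (bucket, key_fragment) for a key hashed to 160 bits."""
    # Assumption: SHA-1 stands in for whatever produces the 160-bit key.
    h = int.from_bytes(hashlib.sha1(key).digest(), "big")
    bucket = h & ((1 << index_bits) - 1)                     # i low-order bits
    keyfrag = (h >> index_bits) & ((1 << KEYFRAG_BITS) - 1)  # next 15 bits
    return bucket, keyfrag


def pack_entry(keyfrag: int, offset: int, valid: bool = True) -> bytes:
    """Pack fragment, valid bit and log offset into 6 bytes (assumed layout)."""
    return struct.pack(">HI", (keyfrag << 1) | int(valid), offset)


def unpack_entry(entry: bytes):
    """Inverse of pack_entry: return (keyfrag, valid, offset)."""
    head, offset = struct.unpack(">HI", entry)
    return head >> 1, bool(head & 1), offset
```

Because only a 15-bit fragment is kept in memory, a lookup that matches the fragment would still need to read the full key from the log to confirm the hit.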

FAWN-DS

• Basic functions: Store, Lookup, Delete; concurrent operations

• Virtual node maintenance: Split, Merge, Compact
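
As a rough sketch of how Store, Lookup, Delete and Compact fit together in a log-structured store: every write is an append, the index points at the newest record for each key, and compaction rewrites the log keeping only live records. The class `LogStore` is hypothetical, and a Python list stands in for the on-flash log; Split and Merge are not shown.

```python
class LogStore:
    """Toy append-only key-value store (in-memory stand-in for a flash log)."""

    def __init__(self):
        self.log = []    # append-only records: (key, value); value None = delete marker
        self.index = {}  # key -> position of the newest live record in the log

    def store(self, key, value):
        self.log.append((key, value))         # sequential append, never in-place
        self.index[key] = len(self.log) - 1

    def lookup(self, key):
        pos = self.index.get(key)
        return None if pos is None else self.log[pos][1]

    def delete(self, key):
        self.log.append((key, None))          # deletion is also an append
        self.index.pop(key, None)

    def compact(self):
        # Keep only the records the index still points at, in log order.
        live = sorted(self.index.items(), key=lambda kv: kv[1])
        self.log = [(k, self.log[pos][1]) for k, pos in live]
        self.index = {k: i for i, (k, _) in enumerate(self.log)}
```

Overwritten and deleted records stay in the log until `compact` runs, which is what keeps writes sequential.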

• Consistent hashing of back-end VIDs

• Management node assigns each front-end to the circular key space

• Front-end nodes: manage their key space; forward out-of-range requests

• Back-end nodes (VIDs): contact a front-end when joining; own a key range
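
A minimal sketch of consistent hashing over VIDs, assuming each VID is placed on the ring by hashing its name with SHA-1 and that a key is owned by the first VID at or after the key's position, wrapping around. The `Ring` class and the VID names in the usage line are hypothetical.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Place a VID name or a key on the 160-bit circular key space."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


class Ring:
    def __init__(self, vids):
        # Sorted (position, vid) pairs form the ring.
        self.points = sorted((ring_position(v), v) for v in vids)
        self.positions = [p for p, _ in self.points]

    def owner(self, key: str) -> str:
        """Return the VID whose range covers the key (its successor on the ring)."""
        i = bisect.bisect_left(self.positions, ring_position(key))
        return self.points[i % len(self.points)][1]  # wrap past the last VID


ring = Ring(["A1", "B1", "C1", "A2", "B2", "C2"])
```

Adding or removing a VID only changes ownership of the range next to it on the ring, which is why a join splits one key range and a leave merges one.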

FAWN-KV

Chain replication
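
Chain replication in its general form can be sketched as follows: a put enters at the head of the chain, is passed along to each successor, and is acknowledged by the tail; gets are answered by the tail, so they only see fully replicated writes. The `Replica` class and `make_chain` helper are hypothetical, and direct method calls stand in for network messages.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.next = None  # successor in the chain; None for the tail

    def put(self, key, value):
        self.data[key] = value
        if self.next is not None:
            return self.next.put(key, value)  # forward down the chain
        return self.name                      # the tail acknowledges the write

    def get(self, key):
        return self.data.get(key)


def make_chain(names):
    """Link replicas in order and return (head, tail)."""
    nodes = [Replica(n) for n in names]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    return nodes[0], nodes[-1]


head, tail = make_chain(["A1", "B1", "C1"])
acked_by = head.put("k", "v")  # write enters at the head
value = tail.get("k")          # read is served by the tail
```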

FAWN-KV

• Join: split key range, pre-copy, chain insertion, log flush

• Leave: merge key range, join into each chain

FAWN-KV

Individual Node Performance

• Lookup speed

• Bulk store speed: 23.2 MB/s, or 96% of raw speed


• Put speed

• Compared to BerkeleyDB: 0.07 MB/s – shows necessity of log-based filesystems


• Read- and write-intensive workloads

System Benchmarks

• System throughput and power consumption

Impact of Ring Membership Changes

• Query throughput during node join and maintenance operations

Impact of Ring Membership Changes

• Query latency

Alternative Architectures

• Large dataset, low query rate → FAWN+Disk

• Small dataset, high query rate → FAWN+DRAM

• Middle Range → FAWN+SSD

Conclusion

• Fast and energy efficient processing of random read-intensive workloads

• Over an order of magnitude more queries per Joule than traditional disk-based systems
