TRANSCRIPT
Cheap and Large CAMs for High Performance Data-Intensive Networked Systems
Ashok Anand, Chitra Muthukrishnan, Steven Kappes, and Aditya Akella (University of Wisconsin-Madison)
Suman Nath (Microsoft Research)
New data-intensive networked systems
Large hash tables (10s to 100s of GBs)
New data-intensive networked systems
[Figure: WAN optimizers sit between a data center and a branch office, connected over a WAN. Each optimizer keeps an object store (~4 TB) of 4 KB chunks, indexed by a hash table (~32 GB) that maps 20 B keys to chunk pointers for lookup.]
• Large hash tables (32 GB)
• High-speed (~10K/sec) inserts and evictions
• High-speed (~10K/sec) lookups to sustain a 500 Mbps link
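The chunk-indexing step above can be sketched in a few lines. A 20-byte key strongly suggests a SHA-1 fingerprint of each 4 KB chunk, but the slides do not name the hash function, so SHA-1 here is an assumption:

```python
import hashlib

def chunk_key(chunk: bytes) -> bytes:
    """Fingerprint one 4 KB chunk as a 20 B key.

    SHA-1 produces exactly 20 bytes, matching the key size on the slide;
    the hash function itself is an assumption, not stated in the talk.
    """
    return hashlib.sha1(chunk).digest()

key = chunk_key(b"\x00" * 4096)   # a (trivial) 4 KB chunk
assert len(key) == 20             # 20 B keys -> a ~32 GB table indexes ~4 TB of chunks
```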
New data-intensive networked systems
• Other systems
  – De-duplication in storage systems (e.g., Data Domain)
  – CCN cache (Jacobson et al., CoNEXT 2009)
  – DONA directory lookup (Koponen et al., SIGCOMM 2007)
Cost-effective large hash tables: Cheap Large CAMs (CLAMs)
Candidate options
  Option      Random reads/sec   Random writes/sec   Cost (128 GB)+
  DRAM        300K               300K                $120K+   (too expensive: 2.5 ops/sec/$)
  Disk        250                250                 $30+     (too slow)
  Flash-SSD   10K*               5K*                 $225+    (slow writes)

  * Derived from latencies on Intel M-18 SSD in experiments
  + Price statistics from 2008-09

How do we deal with the slow writes of flash SSDs?
Our CLAM design
• New data structure "BufferHash" + flash
• Key features
  – Avoid random writes; perform sequential writes in a batch
    • Sequential writes are 2X faster than random writes (Intel SSD)
    • Batched writes reduce the number of writes going to flash
  – Bloom filters for optimizing lookups
BufferHash performs orders of magnitude better than DRAM based traditional hash tables in ops/sec/$
Outline
• Background and motivation
• CLAM design
  – Key operations (insert, lookup, update)
  – Eviction
  – Latency analysis and performance tuning
• Evaluation
Flash/SSD primer
• Random writes are expensive: avoid random page writes
• Reads and writes happen at the granularity of a flash page: I/O smaller than a page should be avoided, if possible
Conventional hash table on Flash/SSD
• Keys are likely to hash to random locations on flash, causing random writes
• SSD flash translation layers (FTLs) handle random writes to some extent, but the garbage collection overhead is high
• Result: ~200 lookups/sec and ~200 inserts/sec on the WAN optimizer workload, far below the required 10K/s lookups and 5K/s inserts
Conventional hash table on Flash/SSD
• Using DRAM as a cache in front of flash won't work either: we can't assume locality in the request stream
Our approach: Buffering insertions
• Control the impact of random writes
• Maintain a small hash table (buffer) in memory
• As the in-memory buffer fills, write it out to flash
  – We call each in-flash copy of the buffer an incarnation
[Figure: the buffer is an in-memory hash table in DRAM; incarnations are in-flash hash tables on the SSD]
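The buffering idea can be sketched as follows. This is an illustrative model, not the authors' implementation; the buffer is a plain dict, and a flush stands in for one sequential batch write to flash:

```python
class BufferHash:
    """Sketch of buffered insertion.

    Inserts go into a small in-memory hash table; when it fills, the
    whole buffer is written to flash in one sequential batch, becoming
    a new incarnation.
    """

    def __init__(self, buffer_capacity: int):
        self.buffer = {}              # in-memory hash table (DRAM)
        self.capacity = buffer_capacity
        self.incarnations = []        # in-flash hash tables, oldest first

    def insert(self, key: bytes, value: bytes) -> None:
        self.buffer[key] = value
        if len(self.buffer) >= self.capacity:
            self._flush()

    def _flush(self) -> None:
        # One sequential batch write instead of many random page writes.
        self.incarnations.append(self.buffer)
        self.buffer = {}

bh = BufferHash(buffer_capacity=2)
bh.insert(b"k1", b"v1")
bh.insert(b"k2", b"v2")               # buffer full -> flushed as an incarnation
assert len(bh.incarnations) == 1 and bh.buffer == {}
```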
Two-level memory hierarchy
• The buffer lives in DRAM; the incarnation table on flash holds the incarnations, ordered from oldest to latest
• Net hash table = buffer + all incarnations
Lookups are impacted due to buffers
• A lookup must check the in-memory buffer and then each incarnation on flash, newest to oldest
• Multiple in-flash lookups per key. Can we limit it to only one?
Bloom filters for optimizing lookups
• Keep one in-memory Bloom filter per incarnation; a lookup reads an incarnation from flash only if its Bloom filter matches
• A false positive costs a wasted flash read, so configure the filters carefully!
• 2 GB of Bloom filters cover 32 GB of flash at a false positive rate < 0.01
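The Bloom-filtered lookup path can be sketched like this. The tiny filter below (fixed size, two hash probes) is a stand-in for illustration only; the real system sizes its filters for the < 0.01 false positive rate stated above:

```python
import hashlib

class Bloom:
    """Toy Bloom filter: m bits, two probes derived from SHA-1 halves."""

    def __init__(self, m: int = 1024):
        self.m, self.bits = m, 0

    def _probes(self, key: bytes):
        d = hashlib.sha1(key).digest()
        return (int.from_bytes(d[:8], "big") % self.m,
                int.from_bytes(d[8:16], "big") % self.m)

    def add(self, key: bytes) -> None:
        for h in self._probes(key):
            self.bits |= 1 << h

    def maybe_contains(self, key: bytes) -> bool:
        return all(self.bits >> h & 1 for h in self._probes(key))

def lookup(key, buffer, incarnations, blooms):
    """Check the DRAM buffer, then incarnations newest-to-oldest,
    touching flash only when an incarnation's Bloom filter matches."""
    if key in buffer:
        return buffer[key]
    for inc, bloom in zip(reversed(incarnations), reversed(blooms)):
        if bloom.maybe_contains(key):   # in-memory test
            if key in inc:              # one in-flash read
                return inc[key]         # (a miss here was a false positive)
    return None

# Two incarnations hold the same key; the newest value wins.
incs, blooms = [], []
for value in (1, 2):
    bl = Bloom(); bl.add(b"a")
    incs.append({b"a": value}); blooms.append(bl)
assert lookup(b"a", {}, incs, blooms) == 2
```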
Update: naïve approach
• Update the key in place in its in-flash incarnation: expensive random writes
• Discard this naïve approach
Lazy updates
• Treat an update as an insert of (key, new value) into the in-memory buffer
• Lookups check the latest incarnations first, so they return the new value; the old value ages out with its incarnation
Eviction for streaming apps
• Eviction policies may depend on the application: LRU, FIFO, priority-based eviction, etc.
• Two BufferHash primitives
  – Full Discard: evict all items (naturally implements FIFO)
  – Partial Discard: retain a few items (enables priority-based eviction by retaining high-priority items)
• BufferHash is best suited to FIFO: incarnations are arranged by age; other useful policies come at some additional cost
• Details in paper
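The Full Discard primitive maps naturally onto the age-ordered incarnation list. A minimal sketch (illustrative, not the authors' code), assuming incarnations and their Bloom filters are kept oldest-first:

```python
def full_discard(incarnations: list, bloom_filters: list) -> None:
    """FIFO eviction via Full Discard: drop the oldest incarnation
    together with its Bloom filter. Both lists are ordered oldest-first."""
    if incarnations:
        incarnations.pop(0)
        bloom_filters.pop(0)

incs = [{"old": 1}, {"new": 2}]       # oldest first
filters = ["bf_old", "bf_new"]        # stand-ins for per-incarnation filters
full_discard(incs, filters)
assert incs == [{"new": 2}] and filters == ["bf_new"]
```

Because eviction only ever deletes a whole incarnation, no random in-flash writes are needed, which is why FIFO is the cheapest policy here.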
Issues with using one buffer
• A single buffer in DRAM handles all operations and eviction policies
• High worst-case insert latency
  – Flushing a 1 GB buffer takes a few seconds
  – New lookups stall during the flush
Partitioning buffers
• Partition the buffer based on the first few bits of the key space (e.g., keys 0xxxxx… vs 1xxxxx…)
• Size each partition larger than a flash page (avoid I/O smaller than a page) and at least a flash block (avoid random page writes)
• Reduces worst-case insert latency
• Eviction policies apply per buffer
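Routing a key to its partition is a prefix extraction. A minimal sketch, assuming keys are byte strings and at most 8 prefix bits are used:

```python
def partition_index(key: bytes, bits: int) -> int:
    """Pick a buffer by the first few bits of the key.

    bits=1 splits the key space into 0xxxxx... and 1xxxxx... as on the
    slide; bits=k yields 2**k partitions (assumes 1 <= bits <= 8).
    """
    return key[0] >> (8 - bits)

assert partition_index(b"\x00rest-of-key", 1) == 0   # 0xxxxx... partition
assert partition_index(b"\x80rest-of-key", 1) == 1   # 1xxxxx... partition
```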
BufferHash: Putting it all together
• Multiple buffers in memory
• Multiple incarnations per buffer on flash
• One in-memory Bloom filter per incarnation
• Net hash table = all buffers + all incarnations
Outline
• Background and motivation
• Our CLAM design
  – Key operations (insert, lookup, update)
  – Eviction
  – Latency analysis and performance tuning
• Evaluation
Latency analysis
• Insertion latency
  – Worst case is proportional to the buffer size
  – Average case is constant for buffer sizes above the block size
• Lookup latency
  – Average case is proportional to the number of incarnations and to the Bloom filter false positive rate
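The lookup bound above is easy to check with the numbers from the earlier slides (32 GB of flash, 2 GB of buffers, a < 0.01 false positive rate):

```python
# Expected *wasted* flash reads per lookup ~= #incarnations x false positive
# rate (plus at most one true read when the key is actually present).
num_incarnations = 32 // 2        # flash size / total buffer size = 16
fp_rate = 0.01                    # per-incarnation Bloom false positive rate

wasted_reads = num_incarnations * fp_rate
assert abs(wasted_reads - 0.16) < 1e-9   # well under one extra flash read
```

So with well-configured filters, the average lookup touches flash roughly once, which is what makes the "limit to only one in-flash lookup" goal achievable.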
Parameter tuning: Total size of Buffers
Given fixed DRAM, how much should be allocated to the buffers B1 … BN?
• Total buffer size = B1 + B2 + … + BN; total Bloom filter size = DRAM − total buffer size
• #Incarnations = flash size / total buffer size
• Lookup cost ∝ #incarnations × false positive rate, and the false positive rate increases as the Bloom filters shrink
• Too small is not optimal; too large is not optimal either (optimal = 2 * SSD/entry)
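The trade-off can be made concrete with a small model. This is my own illustrative sketch, not the paper's derivation: it uses the standard Bloom approximation fp ≈ 0.6185^(m/n), and `entries_per_gb` is an assumed entry density, not a number from the slides:

```python
def lookup_cost(buffer_gb: float, dram_gb: float = 4.0,
                flash_gb: float = 32.0, entries_per_gb: int = 2**26) -> float:
    """Relative lookup cost = #incarnations x Bloom false positive rate.

    Assumptions (mine, for illustration): DRAM not used for buffers holds
    Bloom filters; fp ~= 0.6185 ** (bits per key) for optimally-hashed
    filters; entries_per_gb sets how many keys the flash holds.
    """
    incarnations = flash_gb / buffer_gb
    bloom_bits = (dram_gb - buffer_gb) * 8 * 2**30   # DRAM left for filters
    n_keys = flash_gb * entries_per_gb               # keys covered by filters
    fp = 0.6185 ** (bloom_bits / n_keys)
    return incarnations * fp

# Sweep the DRAM split: cost is high at both extremes, lowest in between,
# matching the slide's "too small / too large is not optimal".
costs = {b: lookup_cost(b) for b in (0.1, 0.5, 3.5)}
assert costs[0.5] < costs[0.1] and costs[0.5] < costs[3.5]
```

Tiny buffers mean many incarnations (the 1/b term blows up); huge buffers starve the Bloom filters (fp → 1), so the optimum lies in between.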
Parameter tuning: Per-buffer size
What should the size of an individual partitioned buffer (e.g., B1) be?
• It determines the worst-case insertion latency
• Adjusted to the application's requirements (128 KB – 1 flash block)
Outline
• Background and motivation
• Our CLAM design
  – Key operations (insert, lookup, update)
  – Eviction
  – Latency analysis and performance tuning
• Evaluation
Evaluation
• Configuration
  – 4 GB DRAM; 32 GB Intel SSD and Transcend SSD
  – 2 GB buffers, 2 GB Bloom filters, 0.01 false positive rate
  – FIFO eviction policy
BufferHash performance
• WAN optimizer workload
  – Random key lookups followed by inserts; 40% hit rate
  – Also used workloads from real packet traces
• Comparison with BerkeleyDB (a traditional hash table) on the Intel SSD:

  Average latency   BufferHash   BerkeleyDB
  Lookup (ms)       0.06         4.6
  Insert (ms)       0.006        4.8

• Better lookups and better inserts!
Insert performance
[Figure: CDFs of insert latency (ms) on the Intel SSD, log scale from 0.001 to 100 ms]
• BufferHash: 99% of inserts < 0.1 ms (the buffering effect)
• BerkeleyDB: 40% of inserts > 5 ms (random writes are slow!)
Lookup performance
[Figure: CDFs of lookup latency (ms) for the 40% hit workload, log scale from 0.001 to 100 ms]
• BufferHash: 99% of lookups < 0.2 ms; 60% of lookups never go to flash at all, and the Intel SSD read latency is ~0.15 ms
• BerkeleyDB: 40% of lookups > 5 ms, due to garbage collection overhead from the interleaved writes
Performance in Ops/sec/$
• 16K lookups/sec and 160K inserts/sec
• Overall cost of $400
• 42 lookups/sec/$ and 420 inserts/sec/$
  – Orders of magnitude better than the 2.5 ops/sec/$ of DRAM-based hash tables
Other workloads
• Varying fractions of lookups; results on the Transcend SSD
• Average latency per operation:

  Lookup fraction   BufferHash   BerkeleyDB
  0                 0.007 ms     18.4 ms
  0.5               0.09 ms      10.3 ms
  1                 0.12 ms      0.3 ms

• BufferHash is ideally suited for write-intensive workloads
Evaluation summary
• BufferHash performs orders of magnitude better in ops/sec/$ than traditional hash tables on DRAM (and disks)
• BufferHash is best suited to the FIFO eviction policy
  – Other policies can be supported at additional cost; details in paper
• A WAN optimizer using BufferHash can operate optimally at 200 Mbps, much better than the 10 Mbps with BerkeleyDB
  – Details in paper
Related Work
• FAWN (Vasudevan et al., SOSP 2009)
  – Cluster of wimpy nodes with flash storage
  – Each wimpy node has its hash table in DRAM
  – We target:
    • Hash tables much bigger than DRAM
    • Low-latency as well as high-throughput systems
• HashCache (Badam et al., NSDI 2009)
  – In-memory hash table for objects stored on disk
Conclusion
• We have designed a new data structure BufferHash for building CLAMs
• Our CLAM on Intel SSD achieves high ops/sec/$ for today’s data-intensive systems
• Our CLAM can support useful eviction policies
• Dramatically improves performance of WAN optimizers
Thank you
ANCS 2010:ACM/IEEE Symposium on
Architectures for Networking and Communications Systems
• Estancia La Jolla Hotel & Spa (near UCSD)
• October 25-26, 2010
• Paper registration & abstract: May 10, 2010
• Submission deadline: May 17, 2010
• http://www.ancsconf.org/
Backup slides
WAN optimizer using BufferHash
• With BerkeleyDB, throughput up to 10 Mbps
• With BufferHash, throughput up to 200 Mbps with the Transcend SSD, and up to 500 Mbps with the Intel SSD
• At 10 Mbps, average throughput per object improves by 65% with BufferHash