Efficient Packet Classification with Digest Caches
Francis Chang, Wu-chang Feng, Wu-chi Feng, Kang Li
Packet classification
Essential in all network devices
  Routers, firewalls, NATs, Diffserv/QoS markers, etc.
But complexity is increasing
  Number of rules
  Number of fields to classify
  Size of header (IPv6)
  Number of flows
Packet classification
Performance-bound by memory
  Must store and access large headers and many rules quickly
  Lookup algorithms perform better when given more memory
Classic space-time trade-off in performance
  Supporting line speeds requires a large amount of fast memory
  Fast memory is expensive
  Large memory is slow
Probabilistic Networking
Goal of work
  Throw a wrench into the space-time trade-off
  Examine a third axis: accuracy
  Reduce memory requirements by relaxing the accuracy of the packet classification function
  What are the quantifiable benefits of sacrificing accuracy in the packet classification function?
What? Willingly make mistakes?
Sure
  Packet errors and lack of reliability are a fact of life
  Masked by the application layer or ignored
  Lots of packets are bad, some are undetectably bad [Stone00]
TCP
  1 in 1,100 to 32,000 TCP packets fail the checksum
  1 in 16 million to 10 billion TCP packets are UNDETECTABLY bad
UDP
  UDP packets are not required to have a valid checksum
  Even if the checksum is bad, the OS will give the packet to the application (Linux)
Routing problems occur frequently
  Transient loops [Hengartner02]
  Outages
Several places to apply idea…
Full classification
  Exact multi-dimensional solutions still too costly [Baboescu03]
  Inaccuracy may help
  Work in progress…
Classification caches
  Space requirements grow linearly with number of flows and fields
  Use lossy recall in remembering previous classification decisions to reduce cache size
  Our current work…
Initial approach
Bloom filter
  Approximate data structure to store flows matching a binary predicate
  Spell checkers
  Browser and web caches
How it works
L × N array of memory
  Addressed by L independent hash functions
  Each function addresses N buckets
Storing new flows
  Set bits corresponding to the results of L hash functions on header
Looking up flows
  Check bits corresponding to the results of L hash functions on header
Collisions in filter cause inaccurate classifications
Francis Chang, Wu-chang Feng, Kang Li, “Approximate Caches for Packet Classification”, in Proceedings of INFOCOM ’04, March 2004.
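Bloom filter
[Figure: hash functions h0, h1, …, hL-1 each address bins 0 to N-1; a flow insertion sets the addressed bits to 1; an unknown flow finds a bit that is still 0]
$N^L$ virtual bins out of $L \times N$ actual bins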
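The storing and lookup steps described for the Bloom filter cache can be sketched as below. This is a minimal illustration, not the authors' implementation: the class name `BloomFilterCache`, the use of salted BLAKE2b as a stand-in for L independent hash functions, and the one-byte-per-bin storage are all assumptions made for readability.

```python
import hashlib


class BloomFilterCache:
    """L levels of N one-bit bins; each level has its own hash function."""

    def __init__(self, levels: int, bins_per_level: int) -> None:
        self.levels = levels
        self.bins = bins_per_level
        # One byte per bin for clarity; real hardware would pack bits.
        self.bits = [bytearray(bins_per_level) for _ in range(levels)]

    def _index(self, level: int, header: bytes) -> int:
        # Assumption: salting BLAKE2b with the level number stands in for
        # L independent hash functions.
        digest = hashlib.blake2b(
            header, digest_size=8, salt=level.to_bytes(4, "big")
        ).digest()
        return int.from_bytes(digest, "big") % self.bins

    def insert(self, header: bytes) -> None:
        # Storing a flow: set one bit per level.
        for level in range(self.levels):
            self.bits[level][self._index(level, header)] = 1

    def contains(self, header: bytes) -> bool:
        # Looking up a flow: every addressed bit must be set.
        # A True answer may be a false positive caused by collisions.
        return all(
            self.bits[level][self._index(level, header)]
            for level in range(self.levels)
        )
```

A lookup that returns False is always correct; only a True answer can be wrong, which is the source of the misclassifications mentioned above.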
The value of making mistakes
Initial results promising
  Small, high-performance caches with a 1 in a billion error rate
  Storage capacity invariant to header size and fields
Size of approximate cache determined by number of flows to store and desired accuracy
Size of exact cache determined by number of flows to store and header size and fields
  IPv4-based connection identifier = 13 bytes
  IPv6-based connection identifier = 37 bytes
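To make the size comparison concrete, the sketch below applies the textbook Bloom filter sizing formulas (optimal number of hash functions $\log_2(1/p)$, and $\log_2(1/p)/\ln 2$ bits per stored element). These are standard results for a classic Bloom filter and are used here as an assumption; the partitioned L × N structure in these slides may differ slightly. The function names are hypothetical.

```python
import math

# Connection identifiers from the slide: two addresses, two ports, protocol.
IPV4_ID_BYTES = 13  # 4 + 4 + 2 + 2 + 1
IPV6_ID_BYTES = 37  # 16 + 16 + 2 + 2 + 1


def bloom_levels(p: float) -> int:
    """Optimal number of hash functions for false-positive rate p."""
    return math.ceil(math.log2(1.0 / p))


def bloom_bits_per_flow(p: float) -> float:
    """Bits of filter needed per stored flow at the optimal level count."""
    return math.log2(1.0 / p) / math.log(2)


if __name__ == "__main__":
    p = 1e-9
    print("levels:", bloom_levels(p))
    print("bytes per flow (approximate cache):", bloom_bits_per_flow(p) / 8)
    print("bytes per flow (exact IPv4 / IPv6):", IPV4_ID_BYTES, IPV6_ID_BYTES)
```

Under these assumptions the approximate cache needs the same space per flow for IPv4 and IPv6, since only the error rate enters the formula.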
But…
Some glaring disadvantages
  Large number of levels and memory lookups required
  Not amenable to most NP architectures
    Requires hardware support and parallel, bit-level memory addressing
Aging properties
  Cannot gracefully age the cache
  No selective replacement policies possible (e.g., LRU)
  Must periodically expunge the entire cache
    Results in large variance in the number of full classifications required
New approach
Digest caches
  Use a traditional cache architecture
  Store and use a digest of classification fields instead of full header(s)
Digest caches
How it works
Upon full classification of packet header fields (P)
  Calculate h1(P) and h2(P)
  Use h1(P) to select cache line
  Insert h2(P) and classification result into cache line
Subsequent packets
  Calculate h1(P) and h2(P)
  Use h1(P) to select cache line
  Lookup h2(P) in cache line
    If match, follow cached result
    If no match, perform full classification
Misclassification caused by hash-signature collisions
  Increases as the number of bits in digest (c) decreases
  Increases as the associativity of cache (d) increases
Misclassification probability: $p \approx d / 2^{c}$
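The insert and lookup steps for a digest cache can be sketched as follows. This is an illustrative sketch only: the class name `DigestCache`, the derivation of h1 and h2 from a single BLAKE2b hash, and per-line LRU replacement via `OrderedDict` are assumptions, not details taken from the slides.

```python
import hashlib
from collections import OrderedDict


class DigestCache:
    """Set-associative cache that stores c-bit digests, not full headers."""

    def __init__(self, num_lines: int, ways: int, digest_bits: int) -> None:
        assert 0 < digest_bits <= 64
        self.num_lines = num_lines
        self.ways = ways                # associativity d
        self.digest_bits = digest_bits  # digest width c
        self.lines = [OrderedDict() for _ in range(num_lines)]

    def _hashes(self, header: bytes):
        # Assumption: h1 and h2 come from disjoint halves of one 128-bit hash.
        raw = int.from_bytes(
            hashlib.blake2b(header, digest_size=16).digest(), "big"
        )
        h1 = (raw >> 64) % self.num_lines         # selects the cache line
        h2 = raw & ((1 << self.digest_bits) - 1)  # digest stored in the line
        return h1, h2

    def lookup(self, header: bytes):
        h1, h2 = self._hashes(header)
        line = self.lines[h1]
        if h2 in line:
            line.move_to_end(h2)  # mark as most recently used
            return line[h2]       # may be wrong if two flows share h1 and h2
        return None               # miss: caller performs full classification

    def insert(self, header: bytes, result) -> None:
        h1, h2 = self._hashes(header)
        line = self.lines[h1]
        line[h2] = result
        line.move_to_end(h2)
        if len(line) > self.ways:
            line.popitem(last=False)  # evict the least recently used digest
```

Because each line holds at most d digests of c bits, a new flow that maps to a full line collides with one of them with probability about d / 2^c, which is where the formula above comes from.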
Digest caches
Fixes all of the problems of Bloom filter caches
  Fewer memory accesses
  NP-friendly
    Does not require parallel, bit-addressable memory access
    Can alleviate the need for associative hardware (more later)
  Ages gracefully
    Can smoothly remove old entries
Storage comparison between approaches
[Figure: entries labeled $p \approx d / 2^{8}$, $p \approx d / 2^{32}$, $p = 1 \times 10^{-9}$, $p = 0$, $p = 0$]
Evaluation
Trace-driven simulation
  PCCS simulator (http://pccs.sourceforge.net)
Packet traces
  Bell Labs trace
    One-hour trace at Bell Labs Research in May 2002
  OGI trace
    One-hour trace of OGI's OC-3c link on July 26, 2002
Choosing associativity
Experiment
  Fixed misclassification probability of $10^{-9}$
  Variable-bit digest based on associativity
Results similar to previous studies
  Small amount of associativity ideal for performance
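The relationship between associativity and digest width in this experiment follows from solving $p \approx d / 2^{c}$ for c. The helper below is a hypothetical illustration of that arithmetic, assuming the approximation holds with equality and rounding up to a whole number of bits.

```python
import math


def digest_bits(ways: int, p: float) -> int:
    """Smallest digest width c (in bits) such that ways / 2**c <= p."""
    return math.ceil(math.log2(ways / p))


if __name__ == "__main__":
    for ways in (1, 2, 4, 8):
        print(ways, "way(s):", digest_bits(ways, 1e-9), "bits")
```

For a target of $10^{-9}$, a 4-way cache needs 32 bits per digest under this formula, which is consistent with the 32-bit, 4-way configuration used in the comparison that follows.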
Comparing approaches
Digest cache
  32-bit digests
  4-way associativity
Bloom filter cache
  Optimal, 30-level filter
Exact caches
  IPv4 and IPv6 flow caches
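Misclassification probabilities: $p \approx 1 / 2^{30} \approx 1 \times 10^{-9}$ (digest cache), $p = 1 \times 10^{-9}$ (Bloom filter cache), $p = 0$ (exact caches)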
Hit rates vs. cache size
Miss-rate variance vs. cache size
NP implementation
IXP1200, L3Forwarder
4-way associative digest cache
  803 Mb/s
Bloom filter cache
  1 level = 990 Mb/s
  4 levels = 652 Mb/s
A final note to those who hate being wrong
Can be used to accelerate exact caches
Consider
  Exact cache where associativity is emulated
  Entire cache line must be read sequentially to find a match
Digest cache acceleration
  Use a smaller digest cache, stored in the fastest (possibly associative) memory, to mirror entries in the exact cache
  Lookup in the digest cache gives the exact location of the relevant entry in the exact cache
Good for implementing associative caches on NPs that do not have hardware support
Speed-up analysis in paper
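One way to picture the acceleration scheme is sketched below: a small array of digests mirrors the slots of an exact cache, so a lookup compares short digests first and reads a full entry only from a slot whose digest matches. The class name `AcceleratedExactCache`, the 16-bit default digest, the hash construction, and round-robin replacement are assumptions for illustration, not details from the slides or the paper.

```python
import hashlib


class AcceleratedExactCache:
    """Exact cache whose slots are mirrored by a small digest array."""

    def __init__(self, num_lines: int, ways: int, digest_bits: int = 16) -> None:
        self.num_lines = num_lines
        self.ways = ways
        self.mask = (1 << digest_bits) - 1
        # digests[line][way] mirrors entries[line][way].
        self.digests = [[None] * ways for _ in range(num_lines)]
        self.entries = [[None] * ways for _ in range(num_lines)]
        self.next_victim = [0] * num_lines  # round-robin replacement (assumption)

    def _hashes(self, header: bytes):
        raw = int.from_bytes(
            hashlib.blake2b(header, digest_size=16).digest(), "big"
        )
        return (raw >> 64) % self.num_lines, raw & self.mask

    def lookup(self, header: bytes):
        line, digest = self._hashes(header)
        for way, stored in enumerate(self.digests[line]):
            if stored == digest:
                # Only now touch the (slower) exact entry at this location.
                key, result = self.entries[line][way]
                if key == header:  # full compare, so no misclassification
                    return result
        return None

    def insert(self, header: bytes, result) -> None:
        # Intended to be called after a lookup miss and full classification.
        line, digest = self._hashes(header)
        way = self.next_victim[line]
        self.next_victim[line] = (way + 1) % self.ways
        self.digests[line][way] = digest
        self.entries[line][way] = (header, result)
```

Since the full header is still compared before a result is returned, the digest array here only narrows the search and the cache stays exact.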
Questions?
Extra slides
Misclassification rates for digest caches
4-way associative digest cache
Cache misses over time