A Smart Pre-Classifier to Reduce Power
Consumption of TCAMs for
Multi-dimensional Packet Classification
Yadi Ma, Suman Banerjee
University of Wisconsin-Madison
Packet classification
[Figure: network topology with router R, the Internet, hosts S1 and S2, Subnet A, Subnet B, destination D, and links L1 and L2]
Classifier at Router R
From  To  Traffic type  Action
S1    D   Port 80       Forward via L1
S2    D   *             Drop all traffic
A     B   *             Reserve 50 Mbps
Definition
• Packet classification: given a classifier, find the first (highest priority)
matching rule for each incoming packet
• A classifier contains a set of rules ordered by priority
• Our focus: n-tuple classification
• Example classifier:
Rule #  Source IP      Dest. IP     Source Port    Dest. Port   Protocol  Action
1       *              10.112.*.*   5001 - 65535   *            TCP       deny
2       32.75.226.153  *            *              1001 - 2000  UDP       deny
3       199.36.184.*   *            49152 - 65535  *            UDP       deny
4       *              *            *              *            *         permit
• Given a packet header: (32.75.226.153, 198.35.180.5, 80, 1040, UDP)
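The definition above can be sketched as a linear first-match search over the example classifier. This is only an illustration of the semantics, not how a TCAM performs the lookup; the `Rule` class, the CIDR encoding of the wildcards, and the `classify` helper are assumptions introduced for this sketch.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

ANY_PORT = (0, 65535)  # encoding of the "*" port wildcard (assumption)


@dataclass(frozen=True)
class Rule:
    src: str      # source prefix in CIDR form; "0.0.0.0/0" stands for "*"
    dst: str      # destination prefix in CIDR form
    sport: tuple  # inclusive (low, high) source-port range
    dport: tuple  # inclusive (low, high) destination-port range
    proto: str    # "TCP", "UDP", or "*"
    action: str

    def matches(self, pkt):
        src, dst, sport, dport, proto = pkt
        return (ip_address(src) in ip_network(self.src)
                and ip_address(dst) in ip_network(self.dst)
                and self.sport[0] <= sport <= self.sport[1]
                and self.dport[0] <= dport <= self.dport[1]
                and self.proto in ("*", proto))


# The four rules of the example classifier, in priority order.
CLASSIFIER = [
    Rule("0.0.0.0/0", "10.112.0.0/16", (5001, 65535), ANY_PORT, "TCP", "deny"),
    Rule("32.75.226.153/32", "0.0.0.0/0", ANY_PORT, (1001, 2000), "UDP", "deny"),
    Rule("199.36.184.0/24", "0.0.0.0/0", (49152, 65535), ANY_PORT, "UDP", "deny"),
    Rule("0.0.0.0/0", "0.0.0.0/0", ANY_PORT, ANY_PORT, "*", "permit"),
]


def classify(rules, pkt):
    """Return (rule number, action) of the first matching rule, or None."""
    for number, rule in enumerate(rules, start=1):
        if rule.matches(pkt):
            return number, rule.action
    return None


print(classify(CLASSIFIER, ("32.75.226.153", "198.35.180.5", 80, 1040, "UDP")))
# -> (2, 'deny')
```

For the packet header on this slide, rule 1 fails on the destination address and rule 2 is the first rule whose five fields all match, so the packet is denied.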
Packet classification schemes
• Software-based schemes
– Tradeoff between memory usage and speed
– Examples: HiCuts, HyperCuts, EffiCuts, etc.
• Hardware (TCAM)-based schemes
– Popular for high-throughput packet classification
Problem Statement
• TCAMs are power-hungry
• Design a TCAM-based method that:
– Greatly reduces power consumption of TCAMs,
especially for large classifiers
– Uses commodity TCAMs
– Is easy to implement
Outline
Introduction and motivation
Design of SmartPC
– Algorithms to manage two-stage classification
Evaluation methods and results
Conclusion
Packet classification system for SmartPC
• Two-stage classification
– First stage: pre-classifier
– Second stage: two parallel searches
[Figure: index TCAM (pre-classifier entries) → match index → index SRAM → TCAM (classifier rules) holding the “specific” block and the “general” blocks → associated SRAM (priorities + actions) → priority resolution → action]
How to build an efficient pre-classifier?
Pre-classifier
• How to build a pre-classifier?
– Built on two dimensions: source and destination IP addresses
– By expanding and combining two-dimensional rules recursively
• Original rules are also shuffled into different TCAM blocks accordingly
Why is reducing 5-D to 2-D a good choice?
[Figure: maximum number of overlapping rules in the two-dimensional space]
• Analysis of more than 200 real classifiers ranging in size from 3 to 15,181 rules
• The maximum number of overlapping rules is an order of magnitude smaller than the classifier size
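As a rough illustration of the statistic discussed here, the sketch below computes the maximum number of rules that overlap at a single point of the (source IP, destination IP) plane. It assumes that "overlapping" means sharing at least one common point, and it uses a brute-force scan over candidate corner points that is only suitable for small inputs. The helper names and the four sample rules are made up for the example.

```python
from ipaddress import ip_network


def to_rect(src_prefix, dst_prefix):
    """Project a rule onto 2-D: (src_lo, src_hi, dst_lo, dst_hi) as integers."""
    s, d = ip_network(src_prefix), ip_network(dst_prefix)
    return (int(s.network_address), int(s.broadcast_address),
            int(d.network_address), int(d.broadcast_address))


def max_overlap(rects):
    """Largest number of rectangles that share a common point.

    The deepest point of a set of axis-aligned rectangles can always be
    chosen as (some rectangle's low x, some rectangle's low y), so only
    those candidate points are examined.
    """
    xs = {r[0] for r in rects}
    ys = {r[2] for r in rects}
    best = 0
    for x in xs:
        column = [r for r in rects if r[0] <= x <= r[1]]
        for y in ys:
            best = max(best, sum(1 for r in column if r[2] <= y <= r[3]))
    return best


# Hypothetical (source prefix, destination prefix) pairs.
SAMPLE = [
    ("10.0.0.0/8", "0.0.0.0/0"),
    ("10.1.0.0/16", "192.168.0.0/16"),
    ("10.1.2.0/24", "192.168.1.0/24"),
    ("172.16.0.0/12", "0.0.0.0/0"),
]

print(max_overlap([to_rect(s, d) for s, d in SAMPLE]))  # -> 3
```

The first three sample rules are nested, so they all contain the point (10.1.2.0, 192.168.1.0); the fourth is disjoint from them in the source dimension.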
An example classifier containing 14 rules
[Figure: the 14 example rules, numbered 0-13, drawn as rectangles in the Src_addr × Dst_addr plane]
SmartPC
[Figure, built up over three slides: the same rules with two pre-classifier entries, P0 and P1, and the resulting TCAM layout]
• Pre-classifier: P0, P1
• Specific block for P0: rules 0, 1, 5, 6, 8
• Specific block for P1: rules 2, 3, 4, 9, 10
• General block: rules 7, 11, 12, 13
Example: how to build a pre-classifier
[Figure sequence: the 14 rules in the Src_addr × Dst_addr plane; the pre-classifier and the TCAM blocks are filled in step by step]
• Entry P0 is created; its specific block receives rules 0, 1, 5 and 6 in turn
• Rule 7 is placed in the general block
• Rule 8 joins the specific block of P0
• Rules 11, 12 and 13 join rule 7 in the general block
• Entry P1 is added; its specific block holds rules 2, 3, 4, 9 and 10
• Result: pre-classifier {P0, P1}; specific blocks {0, 1, 5, 6, 8} and {2, 3, 4, 9, 10}; general block {7, 11, 12, 13}
Packet classification system for SmartPC
[Figure: an incoming packet is looked up in the index TCAM (pre-classifier entries P0, P1); the match index selects, via the index SRAM, the specific block to search in the TCAM (classifier rules); the general block(s) are searched in parallel; the associated SRAM supplies priorities and actions for priority resolution]
• Example from the figure: the packet matches rule 1 (accept) in the specific block {0, 1, 5, 6, 8} and rule 7 (deny) in the general block {7, 11, 12, 13}; priority resolution returns accept
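The two-stage lookup can be sketched in software as follows. This is a minimal model under stated assumptions: rules and pre-classifier entries are reduced to rectangles on a small 16 × 16 grid, the rule id doubles as the priority (smaller is higher), and the coordinates are invented for illustration. Only rule ids 0, 1, 2 and 7 and the block names are borrowed from the slides; the hardware searches the blocks in parallel, whereas this sketch loops over them.

```python
# Pre-classifier entries: name -> (x_lo, x_hi, y_lo, y_hi). Coordinates are hypothetical.
PRE_CLASSIFIER = {"P0": (0, 7, 0, 7), "P1": (8, 15, 8, 15)}

# Rule id -> (rectangle, action). Geometry and most actions are hypothetical.
RULES = {
    0: ((0, 1, 0, 1), "deny"),
    1: ((2, 5, 2, 5), "accept"),
    2: ((8, 11, 8, 11), "accept"),
    7: ((0, 15, 3, 4), "deny"),  # not covered by a single entry, hence "general"
}

SPECIFIC = {"P0": [0, 1], "P1": [2]}  # one specific block per entry
GENERAL = [[7]]                       # general block(s), always searched


def covers(rect, point):
    x_lo, x_hi, y_lo, y_hi = rect
    return x_lo <= point[0] <= x_hi and y_lo <= point[1] <= y_hi


def search_block(block, point):
    """First match inside one block (blocks are kept in priority order)."""
    for rule_id in block:
        rect, action = RULES[rule_id]
        if covers(rect, point):
            return rule_id, action
    return None


def classify(point):
    # Stage 1: the pre-classifier picks at most one specific block.
    blocks = list(GENERAL)
    for name, rect in PRE_CLASSIFIER.items():
        if covers(rect, point):
            blocks.append(SPECIFIC[name])
            break
    # Stage 2: search only the activated blocks, then resolve priorities.
    hits = [hit for hit in (search_block(b, point) for b in blocks) if hit]
    return min(hits, key=lambda hit: hit[0]) if hits else None


print(classify((3, 3)))   # -> (1, 'accept')
print(classify((12, 3)))  # -> (7, 'deny')
```

For the point (3, 3), the specific block of P0 yields rule 1 and the general block yields rule 7, and priority resolution keeps rule 1, mirroring the accept/deny example in the figure.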
Properties of pre-classifiers
• Entries in a pre-classifier are non-overlapping
• Each rule in a classifier is either covered by only
one pre-classifier entry, or marked as general
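These two properties are easy to check mechanically. The sketch below assumes that entries and rules are modelled as inclusive rectangles `(x_lo, x_hi, y_lo, y_hi)` and that "covered" means fully contained; the function names and sample data are hypothetical.

```python
from itertools import combinations


def overlaps(a, b):
    """True if two inclusive rectangles share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]


def contained(inner, outer):
    """True if `inner` lies completely inside `outer`."""
    return (outer[0] <= inner[0] and inner[1] <= outer[1]
            and outer[2] <= inner[2] and inner[3] <= outer[3])


def check_properties(entries, rules, general):
    """Return a list of violations of the two pre-classifier properties."""
    problems = []
    # Property 1: pre-classifier entries are pairwise non-overlapping.
    for (n1, r1), (n2, r2) in combinations(entries.items(), 2):
        if overlaps(r1, r2):
            problems.append(f"entries {n1} and {n2} overlap")
    # Property 2: every rule not marked general is covered by exactly one entry.
    for rule_id, rect in rules.items():
        if rule_id in general:
            continue
        covering = [n for n, e in entries.items() if contained(rect, e)]
        if len(covering) != 1:
            problems.append(f"rule {rule_id} covered by {len(covering)} entries")
    return problems


ENTRIES = {"P0": (0, 7, 0, 7), "P1": (8, 15, 8, 15)}
RULES = {0: (0, 1, 0, 1), 1: (2, 5, 2, 5), 2: (8, 11, 8, 11), 7: (0, 15, 3, 4)}

print(check_properties(ENTRIES, RULES, general={7}))    # -> []
print(check_properties(ENTRIES, RULES, general=set()))  # rule 7 is reported
```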
Rule update
• Rule update overhead of SmartPC is generally smaller
than that of regular TCAMs
• The ordering of TCAM entries is kept within one specific
block or within a small number of general blocks, rather
than throughout all the blocks
• Rule update
– Insert a rule
– Delete a rule
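The locality argument can be illustrated with a deliberately naive model in which a block is a priority-sorted list and an insertion shifts every lower-priority entry down by one slot. Real TCAM update schemes move far fewer entries, and the talk does not spell out its update procedure, so `insert_rule` and the sizes below are assumptions for illustration only.

```python
import bisect


def insert_rule(block, priority):
    """Insert a priority into a sorted block; return how many entries shifted."""
    pos = bisect.bisect_left(block, priority)
    shifted = len(block) - pos  # entries that must move to keep the order
    block.insert(pos, priority)
    return shifted


# A single ordered table with 1000 entries versus one 128-entry block.
whole_tcam = list(range(0, 2000, 2))
one_block = list(range(0, 256, 2))

print(insert_rule(whole_tcam, 1))  # -> 999
print(insert_rule(one_block, 1))   # -> 127
```

Because ordering only has to hold inside the affected specific block (or the few general blocks), the worst-case work in this model is bounded by the block size rather than by the classifier size.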
Experimental setup (1)
• Summary of classifiers: 10 real classifiers (R1-R10) and 10 synthetic classifiers (S1-S10)

Real classifiers
Name  Size   MaxOverlaps  Wildcard
R1    5233   49           18
R2    5626   63           32
R3    5874   98           48
R4    6339   47           16
R5    7356   38           5
R6    8063   64           35
R7    8475   31           4
R8    10054  1            0
R9    11574  334          271
R10   15181  177          143

Synthetic classifiers
Name  Size   MaxOverlaps  Wildcard
S1    9802   22           4
S2    9416   126          57
S3    9497   76           18
S4    9624   82           12
S5    7255   28           0
S6    99823  27           5
S7    87039  249          79
S8    99836  89           47
S9    99866  81           38
S10   99220  10           0
Experimental setup (2)
• Block size of TCAMs
– Evaluated sizes: 32, 64, 128, 256, 512 and 1024
• Metrics
– Power reduction: percentage reduction in the number of activated blocks
– Storage overhead of pre-classifier entries: size of the pre-classifier as a percentage of the size of the whole classifier
• Schemes
– SmartPC
– Default TCAM (without SmartPC)
– A naïve scheme named Naïve-divide
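The two metrics can be written down as a small calculation. The sketch assumes that the default TCAM activates every block holding rules, and that SmartPC activates the block(s) holding the pre-classifier, one specific block, and all general blocks. Whether the index TCAM is counted this way in the talk's numbers is an assumption, and the input values are invented, not results from the evaluation.

```python
import math


def blocks_default(num_rules, block_size):
    """Blocks searched per lookup when the whole classifier is activated."""
    return math.ceil(num_rules / block_size)


def blocks_smartpc(num_pre_entries, num_general_rules, block_size):
    """Blocks searched per lookup: index block(s) + one specific + general."""
    index_blocks = math.ceil(num_pre_entries / block_size)
    general_blocks = math.ceil(num_general_rules / block_size)
    return index_blocks + 1 + general_blocks


def power_reduction(num_rules, block_size, num_pre_entries, num_general_rules):
    """Percentage reduction in the number of activated blocks."""
    before = blocks_default(num_rules, block_size)
    after = blocks_smartpc(num_pre_entries, num_general_rules, block_size)
    return 100.0 * (1 - after / before)


def storage_overhead(num_pre_entries, num_rules):
    """Pre-classifier size as a percentage of the classifier size."""
    return 100.0 * num_pre_entries / num_rules


# Hypothetical classifier: 10,000 rules, 200 pre-classifier entries,
# 300 rules in general blocks, block size 128.
print(round(power_reduction(10_000, 128, 200, 300), 1))
print(storage_overhead(200, 10_000))
```

With these made-up inputs the default scheme searches 79 blocks and the two-stage scheme searches 6, which shows why the benefit grows with classifier size.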
Power reductions
[Figure: percentage of power reduction vs. TCAM block size; panels: real classifiers, synthetic classifiers]
With block size 128, the median and average power reductions are 91% and 88%, respectively.
Storage overhead
[Figure: fraction of storage overhead vs. TCAM block size; panels: real classifiers, synthetic classifiers]
Small storage overhead: less than 4% for every classifier.
Comparison of SmartPC with Naïve-divide
[Figure: percentage of power reduction with block size 128; panels: real classifiers, synthetic classifiers]
SmartPC outperforms Naïve-divide by more than 20% on average.
Discussion
• Effect of prefix distribution and prefix length
• Power reduction on small classifiers
• Power reduction on IPv6 classifiers
Conclusion
• Propose SmartPC, which:
– Greatly reduces the power consumption of TCAMs, especially for larger classifiers
– Uses commodity TCAMs
– Is easy to implement
Questions
Thanks
Backup slides
Prior work on Packet Classification
• Software-based approaches
– Examples: HiCuts, HyperCuts, EffiCuts, etc
• TCAM-based approaches
– High speed but suffer from some deficiencies such as
high power consumption
– Schemes for power efficiency:
• CoolCAMs (INFOCOM 2003): reduce power consumption of
TCAMs, but limited to IP forwarding
• Extended TCAMs (ICNP 2003): requires a new type of TCAM
that returns multiple matches
• Significant recent work has been done within companies, but it is proprietary
Number of blocks activated vs. block size
[Figure: four panels, one each for classifiers R1, R9, S4 and S10]
Observations
• TCAMs
– The main component of power consumption in TCAMs
is proportional to the number of searched entries
– Hardware supports turning on a small number of blocks
– Hardware supports multiple searches simultaneously, such as
Cisco’s TCAM4
• Classifiers
– For each incoming packet, often only a small number of matching rules in a classifier need to be searched
http://www.cisco.com/en/US/prod/collateral/switches/ps5718/ps4324/prod_white_paper0900aecd806dc821.html
Some stats
• A 2006 report states:
– Data centers in the U.S. today consume about 61 billion kWh (1.5% of total U.S. electricity consumption), for a total electricity cost of about $4.5 billion
– National energy consumption by servers and data centers could nearly double by 2011, to more than 100 billion kWh
• According to a SIGCOMM CCR 2008 paper, the network consumes 10-20% of a data center's total power
• With the growing size of classifiers and the transition from IPv4 to IPv6, the high power consumption of TCAMs increases both power supply and cooling costs
Sources:
– Report to Congress on Server and Data Center Energy Efficiency, U.S. Environmental Protection Agency
– The Cost of a Cloud: Research Problems in Data Center Networks, SIGCOMM CCR 2009
Properties of real classifiers
Maximum number of overlapping rules
in the two-dimensional space
Number of wildcard rules in
the two-dimensional space
• Analyze more than 200 real classifiers ranging in size
from 3 to 15,181
Reduce the five-dimensional problem to two-dimensional!
Pre-process a classifier
• Given a multi-dimensional classifier C containing a number of rules:
– The two-dimensional space is divided into non-overlapping rectangles. Each rectangle covers a cluster of rules and represents an entry in the pre-classifier P for C
– Rules in C are shuffled so that each pre-classifier entry is associated with one TCAM block, called a specific block
– If the number of rules that intersect a pre-classifier entry exceeds the TCAM block size, the extra rules are stored in TCAM blocks called general block(s)
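One simple way to obtain a partition with these properties is recursive bisection of the two-dimensional space. The sketch below is a substitute technique chosen for brevity, not the expand-and-combine procedure described in the talk: a region becomes a pre-classifier entry once the rules fully inside it fit in one block, rules that straddle a cut are sent to the general block, and overflow beyond the block size also goes to general. The 16 × 16 grid, the rule rectangles and all names are hypothetical.

```python
def contained(inner, outer):
    """True if rectangle `inner` lies completely inside `outer`."""
    return (outer[0] <= inner[0] and inner[1] <= outer[1]
            and outer[2] <= inner[2] and inner[3] <= outer[3])


def build_pre_classifier(rules, block_size, region=(0, 15, 0, 15), depth=0):
    """Return (entries, general): entries is a list of (region, rule ids)."""
    if not rules:
        return [], []
    x_lo, x_hi, y_lo, y_hi = region
    single_point = x_lo == x_hi and y_lo == y_hi
    if len(rules) <= block_size or single_point:
        ids = sorted(rules)
        # Anything beyond one block's capacity overflows to general.
        return [(region, ids[:block_size])], ids[block_size:]
    # Alternate the cut dimension, falling back when one side cannot be cut.
    if (depth % 2 == 0 and x_lo < x_hi) or y_lo == y_hi:
        mid = (x_lo + x_hi) // 2
        halves = [(x_lo, mid, y_lo, y_hi), (mid + 1, x_hi, y_lo, y_hi)]
    else:
        mid = (y_lo + y_hi) // 2
        halves = [(x_lo, x_hi, y_lo, mid), (x_lo, x_hi, mid + 1, y_hi)]
    entries, general, assigned = [], [], set()
    for half in halves:
        inside = {i: r for i, r in rules.items() if contained(r, half)}
        assigned |= inside.keys()
        sub_entries, sub_general = build_pre_classifier(
            inside, block_size, half, depth + 1)
        entries += sub_entries
        general += sub_general
    # Rules that straddle the cut are not covered by any single entry.
    general += [i for i in rules if i not in assigned]
    return entries, sorted(general)


RULES = {
    0: (0, 3, 0, 3), 1: (0, 1, 0, 1), 2: (4, 7, 0, 3),
    3: (8, 15, 8, 15), 4: (0, 15, 0, 15), 5: (12, 15, 12, 15),
}
entries, general = build_pre_classifier(RULES, block_size=2)
for region, ids in entries:
    print(region, ids)
print("general:", general)
```

In this toy run the wildcard-like rule 4 ends up in the general block and the remaining five rules are spread over three specific blocks of at most two rules each.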
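Pre-process a classifier: example
Given a classifier that contains 19 rules; block size = 5
[Figure: the 19 rules drawn as rectangles in the Src_addr × Dst_addr plane, with three pre-classifier entries P1, P2 and P3]
• Specific blocks (one per entry P1, P2, P3): {2, 3, 4, 16}, {5, 6, 7, 8, 9}, {11, 12, 13, 14, 15}
• Remaining rules, stored in a general block: 1, 10, 17, 18, 19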
Proposed solution: SmartPC
[Figure: the key is first matched against the two-dimensional pre-classifier entries held in TCAM block(s), then against the five-dimensional classifier rules held in TCAM blocks (specific blocks and general blocks), producing the result]
• Expect huge power reduction on large classifiers
How to build an efficient pre-classifier?