a smart pre-classifier to reduce power consumption of tcams … · 2012-10-26 · definition...


A Smart Pre-Classifier to Reduce Power Consumption of TCAMs for Multi-dimensional Packet Classification

Yadi Ma, Suman Banerjee

University of Wisconsin-Madison

Packet classification

[Figure: example network with router R, the Internet, hosts S1 and S2, Subnet A, Subnet B, destination D, and links L1 and L2]

Classifier at Router R

From  To  Traffic type  Action
S1    D   Port 80       Forward via L1
S2    D   *             Drop all traffic
A     B   *             Reserve 50 Mbps

Definition

• Packet classification: given a classifier, find the first (highest priority) matching rule for each incoming packet

• A classifier contains a set of rules ordered by priority

• Our focus: n-tuple classification

• Example classifier:

Rule #  Source IP      Dest. IP     Source Port    Dest. Port   Protocol  Action
1       *              10.112.*.*   5001 - 65535   *            TCP       deny
2       32.75.226.153  *            *              1001 - 2000  UDP       deny
3       199.36.184.*   *            49152 - 65535  *            UDP       deny
4       *              *            *              *            *         permit

• Given a packet header: (32.75.226.153, 198.35.180.5, 80, 1040, UDP)
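The first-match semantics of this slide can be sketched as a linear scan over the rules in priority order. This is a software model for illustration only, not how a TCAM performs the search. The `Rule` class and `classify` function are hypothetical names, wildcards are written as CIDR prefixes and full port ranges, and the column assignment of each rule is my reading of the flattened table.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

ANY_PORT = (0, 65535)  # a "*" port field, written as an inclusive range


@dataclass(frozen=True)
class Rule:
    number: int
    src: str       # source prefix in CIDR notation
    dst: str       # destination prefix in CIDR notation
    sport: tuple   # inclusive (low, high) source-port range
    dport: tuple   # inclusive (low, high) destination-port range
    proto: str     # "TCP", "UDP", or "*" for any protocol
    action: str

    def matches(self, src, dst, sport, dport, proto):
        return (ip_address(src) in ip_network(self.src)
                and ip_address(dst) in ip_network(self.dst)
                and self.sport[0] <= sport <= self.sport[1]
                and self.dport[0] <= dport <= self.dport[1]
                and self.proto in ("*", proto))


# The four rules of the example classifier, in priority order.
CLASSIFIER = [
    Rule(1, "0.0.0.0/0", "10.112.0.0/16", (5001, 65535), ANY_PORT, "TCP", "deny"),
    Rule(2, "32.75.226.153/32", "0.0.0.0/0", ANY_PORT, (1001, 2000), "UDP", "deny"),
    Rule(3, "199.36.184.0/24", "0.0.0.0/0", (49152, 65535), ANY_PORT, "UDP", "deny"),
    Rule(4, "0.0.0.0/0", "0.0.0.0/0", ANY_PORT, ANY_PORT, "*", "permit"),
]


def classify(rules, header):
    """Return the first (highest-priority) rule matching the header, or None."""
    for rule in rules:
        if rule.matches(*header):
            return rule
    return None


# Header fields: (source IP, destination IP, source port, destination port, protocol)
packet = ("32.75.226.153", "198.35.180.5", 80, 1040, "UDP")
hit = classify(CLASSIFIER, packet)
print(hit.number, hit.action)  # prints "2 deny"
```

Under this reading, the example header skips rule 1 (wrong protocol and destination) and stops at rule 2, even though the catch-all rule 4 would also match.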

Packet classification schemes

• Software-based schemes

– Tradeoff between memory usage and speed

– Examples: HiCuts, HyperCuts, EffiCuts, etc.

• Hardware (TCAM)-based schemes

– Popular for high-throughput packet classification

Problem Statement

• TCAMs are power-hungry

• Design a TCAM-based method that:

– Greatly reduces power consumption of TCAMs, especially for large classifiers

– Uses commodity TCAMs

– Is easy to implement

Outline

Introduction and motivation

Design of SmartPC

– Algorithms to manage two-stage classification

Evaluation methods and results

Conclusion

Packet classification system for SmartPC

• Two-stage classification

– First stage: pre-classifier

– Second stage: two parallel searches

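[Figure: SmartPC lookup path. The Index TCAM (pre-classifier entries) returns a match index into the Index SRAM, which selects one “specific” block of the TCAM holding the classifier rules. That block and the “general” blocks are searched in parallel. The Associated SRAM (priorities + actions) feeds priority resolution, which outputs the final action.]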

How to build an efficient pre-classifier?

Pre-classifier

• How to build a pre-classifier?

– Built on two dimensions: source and destination IP addresses

– By expanding and combining two-dimensional rules recursively

• The original rules are also shuffled into different TCAM blocks accordingly

Why is reducing 5D to 2D a good choice?

[Figure: maximum number of overlapping rules in the two-dimensional space]

• Analyzed more than 200 real classifiers ranging in size from 3 to 15,181 rules

• The maximum number of overlapping rules is an order of magnitude smaller than the classifier size
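As a rough illustration of the statistic on this slide, the sketch below computes the maximum number of rules that overlap at a single point of the source/destination plane. It assumes that each rule's two-dimensional projection is a rectangle of inclusive integer ranges and that "overlapping" means sharing a common point. The slides do not spell out either definition, and `max_overlap` is a hypothetical helper.

```python
def max_overlap(rects):
    """Maximum number of rectangles that share a common point.

    Each rectangle is (src_lo, src_hi, dst_lo, dst_hi) with inclusive bounds.
    The deepest common intersection has its lower corner at some rectangle's
    src_lo and some rectangle's dst_lo, so testing those points is enough.
    This brute force is cubic in the number of rules, which is fine for a sketch.
    """
    best = 0
    for src in {r[0] for r in rects}:
        for dst in {r[2] for r in rects}:
            depth = sum(1 for (a, b, c, d) in rects
                        if a <= src <= b and c <= dst <= d)
            best = max(best, depth)
    return best


# Toy 8 x 8 address space with five rules (made-up data).
toy_rules = [
    (0, 7, 0, 7),  # wildcard in both dimensions
    (0, 3, 0, 3),
    (2, 3, 2, 3),
    (4, 7, 4, 7),
    (4, 5, 0, 7),
]
print(max_overlap(toy_rules))  # prints 3
```

In the toy data the wildcard rule overlaps everything, yet no point is covered by more than three of the five rules.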

An example classifier containing 14 rules

[Figure: the 14 rules, numbered 0 to 13, drawn as rectangles in the two-dimensional Src_addr / Dst_addr space]

SmartPC

[Figure: the same 14 rules with two pre-classifier entries, P0 and P1, drawn over them in the Src_addr / Dst_addr space]

Pre-classifier: P0, P1

Specific blocks (TCAM): P0 holds rules 0, 1, 5, 6, 8; P1 holds rules 2, 3, 4, 9, 10

General block (TCAM): rules 7, 11, 12, 13

Example: how to build a pre-classifier

[Figure sequence: rules 0 to 13 in the Src_addr / Dst_addr space; successive frames build the pre-classifier step by step]

• Pre-classifier entry P0 starts from rule 0 and expands to cover rules 1, 5 and 6

• Rule 7 is placed in the general block

• Rule 8 is added to P0

• Rules 11, 12 and 13 are placed in the general block

• A second pre-classifier entry, P1, is created; its specific block holds rules 2, 3, 4, 9 and 10

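Packet classification system for SmartPC

[Figure: worked lookup for an incoming packet. The Index TCAM (pre-classifier entries P0 and P1) returns a match index into the Index SRAM, which selects a specific block of the classifier-rule TCAM. P0's specific block holds rules 0, 1, 5, 6, 8 and P1's holds rules 2, 3, 4, 9, 10; the general block(s) hold rules 7, 11, 12, 13. For this packet, P0's specific block and the general block are searched. Through the Associated SRAM (priorities + actions), the specific block returns (1, accept) and the general block returns (7, deny). Priority resolution outputs accept.]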
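The two-stage lookup pictured here can be modelled in a few lines of software. This is only a sketch under stated assumptions: rules and pre-classifier entries are reduced to two-dimensional rectangles, a smaller priority number wins, and the names `Rect`, `Rule`, `search_block` and `classify` as well as the toy coordinates are invented for illustration. Real TCAM blocks are searched in parallel in hardware.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    src_lo: int
    src_hi: int
    dst_lo: int
    dst_hi: int

    def contains(self, src, dst):
        return (self.src_lo <= src <= self.src_hi
                and self.dst_lo <= dst <= self.dst_hi)


class Rule(NamedTuple):
    priority: int  # assumption: smaller number means higher priority
    region: Rect
    action: str


def search_block(block, src, dst):
    """Model one TCAM block search: best-priority matching rule, or None."""
    hits = [r for r in block if r.region.contains(src, dst)]
    return min(hits, key=lambda r: r.priority, default=None)


def classify(pre_classifier, specific_blocks, general_block, src, dst):
    """Return (winning rule or None, number of rule blocks searched)."""
    # Stage 1: entries do not overlap, so at most one of them matches.
    index = next((i for i, entry in enumerate(pre_classifier)
                  if entry.contains(src, dst)), None)
    # Stage 2: search the general block and, if selected, one specific block.
    results = [search_block(general_block, src, dst)]
    if index is not None:
        results.append(search_block(specific_blocks[index], src, dst))
    results = [r for r in results if r is not None]
    # Priority resolution between the per-block winners.
    winner = min(results, key=lambda r: r.priority, default=None)
    return winner, (1 if index is None else 2)


# Toy 16 x 16 address space (made-up geometry, not the slide's figure).
pre_classifier = [Rect(0, 7, 0, 7), Rect(8, 15, 8, 15)]          # P0, P1
specific_blocks = [
    [Rule(1, Rect(0, 3, 0, 3), "accept"), Rule(5, Rect(4, 7, 0, 7), "deny")],
    [Rule(2, Rect(8, 15, 8, 11), "deny")],
]
general_block = [Rule(7, Rect(0, 15, 0, 15), "deny")]

winner, searched = classify(pre_classifier, specific_blocks, general_block, 2, 2)
print(winner.priority, winner.action, searched)  # prints "1 accept 2"
```

The toy packet at (2, 2) reproduces the outcome shown in the figure: the specific block offers rule 1 (accept), the general block offers rule 7 (deny), and priority resolution keeps rule 1.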

Properties of pre-classifiers

• Entries in a pre-classifier are non-overlapping

• Each rule in a classifier is either covered by only one pre-classifier entry, or marked as general
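These two properties can be checked mechanically. The sketch below assumes that entries and rules are rectangles `(src_lo, src_hi, dst_lo, dst_hi)` with inclusive bounds and that "covered" means fully contained in the entry, which is my interpretation rather than a definition given on the slides. The function names are hypothetical.

```python
def overlaps(a, b):
    """True if two rectangles share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]


def covers(entry, rule):
    """True if the rule's rectangle lies entirely inside the entry's rectangle."""
    return (entry[0] <= rule[0] and rule[1] <= entry[1]
            and entry[2] <= rule[2] and rule[3] <= entry[3])


def check_pre_classifier(entries, rules):
    """Verify that entries are pairwise disjoint, then place every rule.

    Returns, for each rule, the index of its single covering entry,
    or the string "general" if no entry covers it.
    """
    for i in range(len(entries)):
        for j in range(i + 1, len(entries)):
            if overlaps(entries[i], entries[j]):
                raise ValueError(f"entries {i} and {j} overlap")
    placement = []
    for rule in rules:
        # Disjoint entries guarantee at most one owner per rule.
        owners = [i for i, entry in enumerate(entries) if covers(entry, rule)]
        placement.append(owners[0] if owners else "general")
    return placement


entries = [(0, 7, 0, 7), (8, 15, 8, 15)]
rules = [(0, 3, 0, 3), (8, 15, 8, 11), (0, 15, 0, 15), (6, 9, 6, 9)]
print(check_pre_classifier(entries, rules))  # prints [0, 1, 'general', 'general']
```

The last toy rule straddles both entries, so it is covered by neither and is marked general, like the wildcard rule before it.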

Rule update

• Rule update overhead of SmartPC is generally smaller than that of regular TCAMs

• The ordering of TCAM entries only has to be maintained within one specific block or within a small number of general blocks, rather than across all blocks

• Rule update

– Insert a rule

– Delete a rule


Experimental setup (1)

• Summary of classifiers

10 real classifiers

Name  Size   MaxOverlaps  Wildcard
R1    5233   49           18
R2    5626   63           32
R3    5874   98           48
R4    6339   47           16
R5    7356   38           5
R6    8063   64           35
R7    8475   31           4
R8    10054  1            0
R9    11574  334          271
R10   15181  177          143

10 synthetic classifiers

Name  Size   MaxOverlaps  Wildcard
S1    9802   22           4
S2    9416   126          57
S3    9497   76           18
S4    9624   82           12
S5    7255   28           0
S6    99823  27           5
S7    87039  249          79
S8    99836  89           47
S9    99866  81           38
S10   99220  10           0

Experimental setup (2)

• Block size of TCAMs

– Evaluated sizes: 32, 64, 128, 256, 512 and 1024

• Metrics

– Power reduction: percentage reduction in the number of activated blocks

– Storage overhead of pre-classifier entries: size of the pre-classifier as a percentage of the size of the whole classifier

• Schemes

– SmartPC

– Default TCAM (without SmartPC)

– A naïve scheme named Naïve-divide
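The two metrics can be written down as simple ratios. The sketch below is one plausible reading, not the authors' evaluation code: it assumes a default TCAM activates every block holding classifier rules, that power is proportional to the number of activated blocks, and that the caller decides which blocks count as activated under SmartPC. All function names and the sample inputs for activated blocks and pre-classifier entries are hypothetical.

```python
import math


def default_blocks(num_rules, block_size):
    """Blocks a default TCAM searches: every block that holds rules."""
    return math.ceil(num_rules / block_size)


def power_reduction(num_rules, block_size, activated_blocks):
    """Percentage reduction in activated blocks relative to the default TCAM."""
    total = default_blocks(num_rules, block_size)
    return 100.0 * (total - activated_blocks) / total


def storage_overhead(pre_classifier_entries, num_rules):
    """Pre-classifier size as a percentage of the classifier size."""
    return 100.0 * pre_classifier_entries / num_rules


# Classifier R1 has 5233 rules; with block size 128 it fills 41 blocks.
print(default_blocks(5233, 128))  # prints 41

# Hypothetical inputs: 4 activated blocks and 100 pre-classifier entries.
print(round(power_reduction(5233, 128, 4), 1))
print(round(storage_overhead(100, 5233), 2))
```

Only the block count for R1 follows from the table; the number of activated blocks and the pre-classifier size are placeholders to show how the percentages would be formed.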

Power reductions

[Figure: percentage of power reductions vs. TCAM block size; panels: real classifiers, synthetic classifiers]

With block size 128, the median and average power reductions are 91% and 88%, respectively

Storage overhead

[Figure: fraction of storage overhead vs. TCAM block size; panels: real classifiers, synthetic classifiers]

Small storage overhead: less than 4% for every classifier

Comparison of SmartPC with Naïve-divide

[Figure: percentage of power reductions with block size 128; panels: real classifiers, synthetic classifiers]

SmartPC outperforms Naïve-divide by more than 20% on average

Discussion

• Effect of prefix distribution and prefix length

• Power reduction on small classifiers

• Power reduction on IPv6 classifiers

Conclusion

• Propose SmartPC, which:

– Greatly reduces power consumption of TCAMs, especially for larger classifiers

– Uses commodity TCAMs

– Is easy to implement

Questions

Thanks

Backup slides

Prior work on Packet Classification

• Software-based approaches

– Examples: HiCuts, HyperCuts, EffiCuts, etc

• TCAM-based approaches

– High speed, but suffer from deficiencies such as high power consumption

– Schemes for power efficiency:

• CoolCAMs (INFOCOM 2003): reduce power consumption of TCAMs, but limited to IP forwarding

• Extended TCAMs (ICNP 2003): require a new type of TCAM that returns multiple matches

• Significant recent work within companies, which is proprietary in nature

Number of blocks activated vs. block size

[Figure: four panels, for classifiers R1, R9, S4 and S10]

Observations

• TCAMs

– The main component of power consumption in TCAMs is proportional to the number of searched entries

– Hardware supports turning on only a small number of blocks

– Hardware supports multiple simultaneous searches, such as Cisco’s TCAM4

• Classifiers

– For each incoming packet, often only a small number of matching rules in a classifier need to be searched

http://www.cisco.com/en/US/prod/collateral/switches/ps5718/ps4324/prod_white_paper0900aecd806dc821.html

Some stats

• A 2006 report found:

– Data centers in the U.S. consume about 61 billion kWh (1.5% of total U.S. electricity consumption), for a total electricity cost of about $4.5 billion

– National energy consumption by servers and data centers could nearly double by 2011, to more than 100 billion kWh

• According to a SIGCOMM CCR 2009 paper, the network consumes 10-20% of a data center's total power

• With the growing size of classifiers and the transition from IPv4 to IPv6, the high power consumption of TCAMs increases both power supply cost and cooling cost

Sources:

– U.S. Environmental Protection Agency, "Report to Congress on Server and Data Center Energy Efficiency"

– "The Cost of a Cloud: Research Problems in Data Center Networks", SIGCOMM CCR, 2009

Properties of real classifiers

[Figure: two plots over the real classifiers: maximum number of overlapping rules in the two-dimensional space, and number of wildcard rules in the two-dimensional space]

• Analyzed more than 200 real classifiers ranging in size from 3 to 15,181 rules

Reduce the five-dimensional problem to a two-dimensional one!

Pre-process a classifier

• Given a multi-dimensional classifier C containing a number of rules:

– The two-dimensional space is divided into non-overlapping rectangles. Each rectangle covers a cluster of rules and represents an entry in the pre-classifier P for C

– Rules in C are shuffled so that each pre-classifier entry is associated with one TCAM block, called a specific block

– If the number of rules that intersect a pre-classifier entry exceeds the TCAM block size, the extra rules are stored in TCAM blocks called general block(s)

[Figure: a classifier containing 19 rules, with block size = 5, drawn in the Src_addr / Dst_addr space with three pre-classifier entries P1, P2 and P3. The TCAM blocks shown hold rules {2, 3, 4, 16}, {5, 6, 7, 8, 9}, {11, 12, 13, 14, 15} and {1, 10, 17, 18, 19}.]
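The block assignment described on this slide can be sketched as follows, assuming the pre-classifier entries have already been chosen. The sketch does not construct the entries themselves, which is the harder part of SmartPC. It assumes rectangles `(src_lo, src_hi, dst_lo, dst_hi)` with inclusive bounds, treats a rule as belonging to an entry when it is fully contained in it, and sends everything else, including overflow beyond the block size, to the general block. `assign_blocks` and the toy data are hypothetical.

```python
def contains(entry, rule):
    """True if the rule's rectangle lies entirely inside the entry's rectangle."""
    return (entry[0] <= rule[0] and rule[1] <= entry[1]
            and entry[2] <= rule[2] and rule[3] <= entry[3])


def assign_blocks(entries, rules, block_size):
    """Split rule ids into one specific block per entry plus a general block.

    `rules` is listed in priority order, so a rule's position is its id.
    A rule joins the specific block of the entry that contains it while that
    block still has room; all other rules go to the general block.
    """
    specific = [[] for _ in entries]
    general = []
    for rule_id, rect in enumerate(rules):
        owner = next((i for i, entry in enumerate(entries)
                      if contains(entry, rect)), None)
        if owner is not None and len(specific[owner]) < block_size:
            specific[owner].append(rule_id)
        else:
            general.append(rule_id)
    return specific, general


# Toy data: one entry, five rules, and a block size of 3.
entries = [(0, 7, 0, 7)]
rules = [(0, 1, 0, 1), (2, 3, 2, 3), (4, 5, 4, 5), (0, 15, 0, 15), (6, 7, 6, 7)]
specific, general = assign_blocks(entries, rules, block_size=3)
print(specific, general)  # prints [[0, 1, 2]] [3, 4]
```

Because the general block is always searched, moving an overflow rule there changes only how many blocks are activated, not which rule wins.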

Pre-process a classifier

[Figure: the 2-dimensional pre-classifier entries are stored in TCAM block(s); the 5-dimensional classifier rules are stored in TCAM blocks, divided into specific blocks and general blocks]

Proposed solution: SmartPC

[Figure: a search key passes through the pre-classifier and then the TCAM, which returns the result]

• Expect huge power reduction on large classifiers

• How to build an efficient pre-classifier?
