

    CAMs for COMMs

    Content Addressable Memories for Data Communication Applications

    Introduction

The primary purpose of a data communication switch is to route packets to their appropriate destinations. This involves searching through routing tables. A routing table is constructed of address entries and the associated information required for routing packets to their destined ports. An entry in such a routing table, corresponding to a certain address, provides the switch with the associated information needed to decide how to route the packet.

Searching through the routing table should ideally be accomplished within the timeframe it takes to read the packet off the link or, if cut-through switching is exercised (where the head of the packet is routed out before the tail arrives), within the time it takes to read the address fields of the header off the incoming link. As bandwidth and

    switching speeds increase, the time allocated for implementation of the lookup

    procedure is reduced to the point where a software or Random Access Memory (RAM)-

    based approach is not fast enough.

The advantage of the inherent parallelism of Content Addressable Memory (CAM) is evident, since it offers low latency over a wide variety of address structures. The two

    most common search-intensive tasks that use CAMs are packet forwarding and packet

    classification in Internet routers. Other applications include processors' cache memory,

    Translation Look-aside Buffers (TLB), lossless data compression applications, database

    accelerators, and neural networks.

    What is a CAM?

    Most memory devices store and retrieve data by addressing specific memory locations.

Finding specific data patterns within a standard RAM often becomes the bottleneck for systems that rely on fast memory access, because it requires several access cycles. The time required to find specific data stored in memory could be

    reduced considerably if the stored data can be identified for access by the content of

    the data itself rather than by its address. Memory that is accessed in this way is called

    CAM.


Figure 1: RAM vs. CAM

CAM provides significant performance advantages compared with other memory search algorithms, such as binary and tree-based searches or look-aside tag buffers, by comparing the desired content against all pre-stored entries simultaneously, frequently resulting in an order-of-magnitude reduction of search time. Thus, a CAM is a hardware associative search engine, much faster than algorithmic approaches for search-intensive applications.

    CAMs are composed of conventional semiconductor memory, usually Static RAM

    (SRAM), with added comparison circuitry that enables a search operation to complete

    in a single clock cycle.
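To make the contrast concrete, the following minimal Python sketch models the two access modes: a RAM read takes an address and returns the data stored there, while a CAM search takes a data word and returns the address (or addresses) where it is stored. The loop only models the behavior; in a real CAM all comparisons happen in parallel within one clock cycle, and the function names here are illustrative, not taken from any device.

```python
# Minimal behavioral model of RAM vs. CAM access (illustrative only).

def ram_read(ram, address):
    """RAM: the user presents an address and receives the stored data."""
    return ram[address]

def cam_search(cam, key):
    """CAM: the user presents data and receives the matching address(es).
    A hardware CAM performs all comparisons in parallel in one cycle;
    the loop here only models that behavior."""
    return [address for address, word in enumerate(cam) if word == key]

memory = [0b1011, 0b0110, 0b1111, 0b0110]   # same contents viewed as RAM or CAM

print(ram_read(memory, 2))          # -> 0b1111 (data stored at address 2)
print(cam_search(memory, 0b0110))   # -> [1, 3] (addresses holding 0b0110)
```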

CAM is ideally suited for several applications, including Ethernet address lookup, data compression, pattern recognition, cache tags, high-bandwidth address filtering, and fast lookup of routing, user privilege, security, or encryption information on a packet-by-packet basis for high-performance data switches, firewalls, bridges, and routers.

    CAM Basics

Basic Building Block: The RAM

Since CAMs are a derivative of RAM technology, the following explanation of CAM technology is based on the RAM.

Note: Although possible, implementation of the CAM based on Dynamic RAM (DRAM) is not popular, mainly due to the refreshing required for DRAMs, which reduces the device's throughput performance. Therefore, the following explanation refers to SRAM.

RAM is an integrated circuit that stores data temporarily in a matrix fashion. Data is stored in RAM at a particular location, called an address, which is the intersection of its column and row.


    Figure 2: SRAM Architecture

    In RAM, the user presents the address on the address lines (bus) and the memory

    outputs the data stored in that address.

    The number of address lines dictates the depth of the memory, but the width of the

    memory can theoretically be extended as far as desired.

A single-bit SRAM cell is presented in Figure 3.

    Figure 3: Single-Bit SRAM Cell


Transistors T1 and T2 form a flip-flop circuit, which is the basic storage device. Transistors T3 and T4 form constant current sources, which serve as the flip-flop transistors' loads.

    To access the cell for read or write operation, two circuits are added:

    Figure 4: Single-Bit SRAM Cell with Addressing

Once a valid address is applied, transistors T5 and T6 transfer the cell's data to (in case of a read) or from (in case of a write) the sense amplifiers.

The sense amplifiers translate the signal from dual-ended (differential) to single-ended, or vice versa. These amplifiers are common to all columns of the memory array.

The circuit presented in Figure 4 above forms the basic single-bit SRAM cell circuit.

    Memory Organization

Combining several single-bit circuits (as presented in Figure 4) forms a memory word. Combining several memory words forms the memory array. The organization of the memory component is, in fact, the combination of the circuits in Figure 2 and Figure 4 above, as presented below:


    Figure 5: Memory Organization

    Using SRAM Cell to Build a CAM Cell

To use the above described SRAM cell as a CAM cell, three transistors need to be added, as presented in Figure 6 below.

These additional transistors compare the output of the SRAM cell (the stored data bit) to a comparand (search key) provided via the data bus through the write sense amplifiers (thus turning the bit lines into search lines).

All the cells' output transistors (such as T9) are wired-NOR together, so that when all bits of the search key equal the content of the memory word, the Match signal line is pulled down to logic "0", generating a True (inverted logic) signal at the output of the CAM.


    Figure 6: Single-Bit CAM Cell

With CAM, the user presents the data and receives a match signal (True/False, which is equal to match/mismatch), sometimes with additional information (for example: the address where the match was found or some additional associated data, as presented in Figure 7 below).

    The CAM searches through the memory within a single clock cycle and returns the

    results.

    The CAM can be pre-loaded with its database at device startup and re-written during

    device operation.

    The CAM can accelerate any application requiring fast searches of databases, lists, or

    patterns, such as in image or voice recognition, or computer and communication

    designs.

For this reason, CAM is used in applications where search time is critical and must be very short. For example, the search key could be the Internet Protocol (IP) address of a network user, and the associated information could be a user's access privileges and location on the network.

If the search key presented to the CAM is stored in the CAM's table, the CAM indicates a match and returns the associated information, which consists of the user's privileges.


CAMs come in two basic types:

1. Binary CAM, supporting storage and search of dual-type bits: Zero and One (0, 1).

2. Ternary CAM, supporting storage and search of triple-type bits: Zero, One, and Don't Care (0, 1, X), thus adding flexibility to the search patterns.

    Binary CAM

The above described circuit is the most basic form of CAM. It is also known as Binary CAM (BCAM): a CAM that compares the dual-type (0, 1) logic bits of the search key to the content of the memory.

Each bit is compared for True or False (0 or 1) values, and only when a precise match is found is the Match signal generated.

    Ternary CAM

In some cases (for example, searching for patterns), only a partial match is required, where only a few of the bits need to precisely match the search key and the rest are considered Don't Care. For example, a stored word of 10XX0 will match any of the four search words 10000, 10010, 10100, or 10110.

In these cases, the unattended (not compared) bits should be masked (considered a match in any case) and generate a match signal regardless of the actual data stored.

The added search flexibility comes at an additional cost over BCAM, since the internal memory cell must encode three possible states instead of two.

This additional state is typically implemented by adding a mask bit to every memory cell.

This is also known as Ternary CAM (TCAM): a CAM that compares the triple-type (0, 1, and X) logic bits of the search key to the content of the memory.

    Figure 8 below presents the logic operation of the TCAM.


    Figure 8: Ternary CAM Search and Mask Logic
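The masking behavior shown in Figure 8 can be modeled in software by storing each TCAM entry as a value together with a care mask, where a 0 mask bit marks a Don't Care position. The sketch below is a behavioral illustration under that assumption; the entry format and helper names are not taken from any specific device.

```python
# Behavioral model of a ternary (TCAM) match: X bits are encoded by a
# per-entry "care" mask whose 0 bits are ignored during comparison.

def parse_ternary(pattern):
    """Convert a string like '10XX0' into (value, care_mask) integers."""
    value = care = 0
    for ch in pattern:
        value = (value << 1) | (1 if ch == '1' else 0)
        care = (care << 1) | (0 if ch in 'xX' else 1)
    return value, care

def tcam_match(entry, key):
    """True when the key equals the stored value on every 'care' bit."""
    value, care = entry
    return (key & care) == (value & care)

entry = parse_ternary("10XX0")
for key in (0b10000, 0b10010, 0b10100, 0b10110, 0b10001):
    print(f"{key:05b} ->", tcam_match(entry, key))
# The first four keys match, as in the 10XX0 example above; 10001 does not.
```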

    Other Types of CAM

Over the last few years, as data communication equipment designers grew to appreciate the advantages of CAM, many innovative new CAM types have been suggested. These include:

- Power-saving implementations: due to the relatively high power consumption of the CAM circuit (see the discussion below), restrictions were imposed regarding the array size, the response speed, and the ability to integrate CAM cores into ASICs.


  New technologies, circuit designs, and architectures allow higher array densities, higher speeds of operation, and easier integration of CAM cores into PPs' ASIC designs.

  The most popular power-saving scheme is the Bank-Selection scheme. In this scheme, only a subset of the CAM is active in any given cycle, and the high-power-consumption search lines are shared between the banks (a minimal sketch of this scheme follows this list).

- Other technologies: attempts are made to improve the CAM circuits with respect to:

  - Minimizing the cell's footprint to allow higher array densities, by utilizing a DRAM cell structure (a DRAM cell requires only four transistors, compared with the 6/8-transistor SRAM cell described above) or a single-transistor memory cell implementation.

  - Other types of search logic: the search mechanism described above is NOR-based. NAND- and XOR-based logic has been suggested too.

  - Other architectures: special architectures were developed for special purposes. Among these, the following should be mentioned:

    - Additional memory array(s) for associated data (mentioned above).

    - CAMs for special applications, like the Prefix CAM (PCAM), a Ternary CAM optimized for longest-prefix-matching tasks (IPv4 and others), and the Label Encoded CAM (LECAM), a parallel packet-classification CAM employing some special algorithmic techniques with a modified CAM architecture.

    - CAMs with cache memory (caching recently used search keys) or with a pipelined hierarchical search scheme to speed up the CAM's search operation.

    - And others.
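As a rough illustration of the Bank-Selection idea mentioned in the list above, the sketch below splits the entries into banks keyed by a few bits of the search key, so that only one bank is searched on any given lookup. This is a simplified software model that assumes the bank is selected directly from the top bits of the key; real devices choose the selection bits and bank organization differently.

```python
# Simplified model of a bank-selected CAM: only one bank is "active"
# (i.e., searched) per lookup, which is what saves power in hardware.

BANK_BITS = 2  # assumption: the top 2 bits of the key select the bank

def build_banks(entries, key_width):
    banks = {b: [] for b in range(1 << BANK_BITS)}
    for address, word in enumerate(entries):
        bank = word >> (key_width - BANK_BITS)
        banks[bank].append((address, word))
    return banks

def banked_search(banks, key, key_width):
    bank = key >> (key_width - BANK_BITS)      # select one bank...
    for address, word in banks[bank]:          # ...and search only that bank
        if word == key:
            return address
    return None

banks = build_banks([0b0001, 0b0110, 0b1011, 0b1100], key_width=4)
print(banked_search(banks, 0b1011, key_width=4))  # -> 2
```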

    CAM Application

CAMs are well suited to performing search operations and can be used to accelerate a wide range of applications: Local Area Networks (LANs), database management, file-storage management, pattern recognition, artificial intelligence, fully associative and processor-specific cache memories, disk cache memories, and high-end data communication devices like data switches and routers.

Typical data communication applications are: Virtual Path Identifier / Virtual Circuit Identifier (VPI/VCI) translation in Asynchronous Transfer Mode (ATM) switches at up to OC-12 (622 Mbps) data rates, packet forwarding, IP filtering, and packet classification.


Address Lookup

An IP address consists of two parts:

- The network address, which can vary in size depending on the subnet configuration, and,
- The host address, which occupies the remaining bits.

Each subnet has a network mask that specifies which bits of the address are the network address and which bits are the host address.

Routing is done by comparing against a routing table, maintained by the router, which contains:

- Each known destination network address,
- The associated network mask, and,
- The information needed to route packets to that destination.

Without a TCAM circuit, the router needs to:

- Compare the destination address of the packet to be routed with each entry in the routing table,
- Perform a logical AND with the network mask, and,
- Compare the result with the network address.

If these are equal, the corresponding routing information is used to forward the packet.

Using a TCAM for the routing table makes the lookup process very efficient: the addresses are stored using Don't Care for the host part of the address, so looking up the destination address in the TCAM immediately retrieves the correct routing entry; both the masking and comparison are done by the TCAM hardware circuits.
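The per-entry steps listed above can be sketched as follows, assuming each routing-table entry holds a network address, a network mask, and its routing information as integers (the entry layout and function name are illustrative). Without a TCAM the loop runs once per entry; with a TCAM the same mask-and-compare happens for all entries at once in hardware.

```python
# Software routing lookup: for each entry, AND the destination with the
# entry's network mask and compare the result with the network address.
# A TCAM performs this mask-and-compare over all entries in parallel.

def software_route_lookup(routing_table, destination):
    for network, mask, route_info in routing_table:
        if (destination & mask) == network:
            return route_info
    return None  # no matching entry

routing_table = [
    # (network, mask, routing information) -- illustrative values,
    # listed most-specific first; longest-prefix selection is discussed below.
    (0b11000000_10101000_00000001_00000000, 0xFFFFFF00, "port A"),
    (0b11000000_10101000_00000000_00000000, 0xFFFF0000, "port B"),
]
dest = 0b11000000_10101000_00000001_00000111   # e.g. 192.168.1.7
print(software_route_lookup(routing_table, dest))  # -> "port A"
```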

    The above described Address Lookup function inspects the destination address of the

    packet and selects an output port associated with that address. The list of destination

    addresses of the router, and their corresponding output ports, is called the Routing

    Table. An example of a simplified Routing Table is displayed in Table 1 below.

Line Number   Address (Binary)   Output Port
1             101XX              A
2             0110X              B
3             011XX              C
4             10011              D

Table 1: Simplified Routing Table


All four entries in the above table are 5-bit words. Due to the X (Don't Care) bits, the first three entries in Table 1 represent a range of input addresses. For example, the entry on Line 1 indicates that all addresses in the range 10100 to 10111 (binary) are forwarded to port A. For each incoming packet, the router scans the Address Lookup Table for its destination port. For example, if the router receives a packet with an incoming address of 01101 (binary), the Address Lookup will yield matches on both Line 2 and Line 3 of Table 1. Line 2 will be selected, since it best defines the search key's bit pattern. This is the indication that port B is the most direct route to the destination.

This lookup style is called longest-prefix matching and is required to implement the most recent Internet Protocol (IP) networking standard. The routing parameters determining the complexity of the implementation are:

- Entry size
- Table size
- Search rate
- Table update rate
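The selection rule used in the example above (Line 2 wins over Line 3 because more of its bits are defined) can be sketched as follows, reusing the entries of Table 1: among all matching entries, the one with the most non-X bits, i.e. the longest prefix, is chosen. This is a behavioral model only; TCAM-based designs resolve the priority in hardware, typically through entry ordering.

```python
# Longest-prefix matching over the Table 1 entries: among all ternary
# matches, pick the entry with the most defined (non-X) bits.

def parse(pattern):
    value = care = 0
    for ch in pattern:
        value = (value << 1) | (ch == '1')
        care = (care << 1) | (ch not in 'xX')
    return value, care

routing_table = [  # (pattern, output port) from Table 1
    ("101XX", "A"), ("0110X", "B"), ("011XX", "C"), ("10011", "D"),
]

def lookup(address):
    best_port, best_len = None, -1
    for pattern, port in routing_table:
        value, care = parse(pattern)
        defined_bits = bin(care).count("1")
        if (address & care) == (value & care) and defined_bits > best_len:
            best_port, best_len = port, defined_bits
    return best_port

print(lookup(0b01101))  # matches Lines 2 and 3; Line 2 (port B) is longer
```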

IPv4 addresses are 32 bits long, while IPv6 addresses are 128 bits long. Supplementary information, like the source address and QoS information, can expand IPv6 Routing Table entry sizes to 288 to 576 bits.

Terabit-class routers need to perform hundreds of millions of searches per second, in addition to thousands of routing table updates per second. Almost all algorithmic approaches are too slow to keep up with such high-speed routing requirements. Only hardware-based CAMs can meet such requirements, due to their high search throughput.

    IP Filtering

    An IP filter is a security feature that restricts unauthorized access to LAN resources or

    restricts traffic on a WAN link (IP traffic that goes through the router). IP filters can be

    used to restrict the types of Internet traffic that are permitted to access a LAN, and LAN

    workstations can be restricted to specific Internet-based applications (such as e-mail).

TCAMs can be used as a filter that blocks all access except for those packets that are given explicit permission according to the rules of the IP filter. In this application, the TCAM compares the packet being routed to the port against the IP Filter Rules residing within the CAM. When a match is found, the packet is either permitted or denied, as presented in Figure 10 below.


    Figure 10: TCAM as an IP Filter
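A behavioral sketch of this filtering scheme: rules are stored as ternary patterns over the relevant header fields, each with a permit or deny action, and the first matching rule decides the packet's fate, with a default deny (matching the "block all access except explicit permission" policy above). The rule encoding and field layout here are illustrative assumptions, not a specific product's format.

```python
# TCAM-style IP filter model: each rule is (value, care_mask, action) over a
# packed header key; the first matching rule wins, and the default is "deny".

def ternary(pattern):
    value = care = 0
    for ch in pattern:
        value = (value << 1) | (ch == '1')
        care = (care << 1) | (ch not in 'xX')
    return value, care

# Illustrative 8-bit "header key": 4 bits of source group, 4 bits of protocol.
rules = [
    (*ternary("0001XXXX"), "permit"),   # permit anything from source group 0001
    (*ternary("XXXX0110"), "permit"),   # permit protocol 0110 (e.g. e-mail)
]

def filter_packet(key):
    for value, care, action in rules:
        if (key & care) == (value & care):
            return action
    return "deny"  # no explicit permission -> blocked

print(filter_packet(0b00011111))  # -> permit (source-group rule)
print(filter_packet(0b10100000))  # -> deny
```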

ATM Switch

    CAMs can be used as a translation table in ATM switching network components.

Because ATM networks are connection-oriented, virtual circuits need to be set up across them prior to any data transfer.

Two types of ATM virtual circuits exist:

- the Virtual Path, identified by a Virtual Path Identifier (VPI), and,
- the Virtual Circuit, identified by a Virtual Circuit Identifier (VCI).

VPI/VCI values are localized: each segment of the total connection has its own unique VPI/VCI combination.

Whenever an ATM cell travels through a switch, its VPI/VCI value must be changed into the value used for the next segment of the connection. This process is called VPI/VCI translation.

Because speed is a significant factor in an ATM network, the speed at which this translation is done is critical to the network's overall performance.

CAM can be used for the address translation and can contribute significantly to the processing rate. During the translation process, the CAM takes incoming VPI/VCI values in ATM cell headers and generates addresses that access data in the associated RAM. The CAM/RAM combination (see the discussion of associated data above) enables the realization of multi-megabit translation tables with full parallel search capability.

VPI/VCI fields from the ATM cell header are compared to a list of current connections stored in the CAM array. As a result of the comparison, the CAM generates an address that is used to access an associated RAM, where VPI/VCI mapping data and other connection information are stored.


    The ATM controller modifies the cell header using the VPI/VCI data from the associated

    RAM, and the cell is sent to the switch, as presented in Figure 11.

Figure 11: CAM in an ATM Switch
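The translation flow of Figure 11 can be modeled as a two-step lookup: the CAM maps the incoming connection identifiers to an index, and an associated RAM at that index holds the outgoing VPI/VCI and output port. The key composition and record fields below are assumptions for illustration only.

```python
# CAM/RAM model of VPI/VCI translation: the CAM turns the incoming
# connection identifiers into an index, and the associated RAM holds the
# new header values and the output port for that connection.

cam = {  # (input port, VPI, VCI) -> index into the associated RAM
    (1, 0, 32): 0,
    (1, 5, 40): 1,
}
associated_ram = [  # index -> (new VPI, new VCI, output port)
    (7, 110, 3),
    (2, 64, 4),
]

def translate_cell(in_port, vpi, vci):
    index = cam.get((in_port, vpi, vci))
    if index is None:
        return None                      # unknown connection
    new_vpi, new_vci, out_port = associated_ram[index]
    return new_vpi, new_vci, out_port    # rewrite header, forward to out_port

print(translate_cell(1, 0, 32))  # -> (7, 110, 3)
```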

    Translation Look-aside Buffer

    A Translation Look-aside Buffer (TLB) is a cache-buffer in a CPU containing parts of the

    page-table translating from virtual into physical addresses.

    This buffer has a fixed number of entries and is used to improve the speed of virtual

    address translation.

The buffer is typically implemented with a CAM, in which the search key is the virtual address and the search result is a real (physical) address. If the CAM search yields a match, the translation is known and the match data is used. If no match is found, the translation proceeds via the page table, requiring several more cycles to complete.
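A minimal model of the behavior just described: the TLB is searched by virtual page number; on a hit the cached physical frame is returned directly, and on a miss the much slower page table is consulted and the result is installed in the TLB. The fixed-size eviction policy and dictionary representation are simplifying assumptions.

```python
# Behavioral TLB model: a small, fixed-size cache of page-table entries,
# searched by virtual page number (the CAM search key).

TLB_ENTRIES = 4

tlb = {}          # virtual page number -> physical frame number
page_table = {0: 7, 1: 3, 2: 9, 3: 1, 4: 6}   # illustrative page table

def translate(virtual_page):
    if virtual_page in tlb:            # TLB hit: one fast lookup
        return tlb[virtual_page]
    frame = page_table[virtual_page]   # TLB miss: slow page-table walk
    if len(tlb) >= TLB_ENTRIES:        # simplistic eviction when the TLB is full
        tlb.pop(next(iter(tlb)))
    tlb[virtual_page] = frame          # install the translation for next time
    return frame

print(translate(2))  # miss: walks the page table, then fills the TLB
print(translate(2))  # hit: served from the TLB
```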

The TLB can reside between the CPU and the cache, or between the cache and primary storage memory. This depends on whether the cache uses virtual addressing or physical addressing.

    In case the cache is virtually addressed, requests are sent directly from the CPU to the

    cache, which then accesses the TLB as necessary. If the cache is physically addressed,

    the CPU does a TLB lookup on every memory operation, and the resulting physical

    address is sent to the cache.

    Although not intended by design, if system security has been breached, a restoration

    sub-system can use the translation look-aside buffer to alter the view of memory in

    order to hide a subversive program or backdoor on a computer.


    Data Compression

    Data compression eliminates the inherent redundancy in a given data file, thus

    generating an equivalent but smaller file. CAM is well suited for data compression since

    a significant portion of compression algorithm time is spent on searching for pre-defined

data patterns. Replacing the algorithms with a hardware-based search engine can significantly increase the throughput of a compression function.

In a data compression application, a CAM lookup is performed following the presentation of each word of the original data, as can be seen in Figure 12. If the presented word bit-pattern is found, then the appropriate code is output. If the word is not found in the CAM, then another word is shifted in.

    The CAM will generate the results in a single transaction regardless of table size or

    search list length.

    This virtue makes CAM an ideal candidate for data compression.
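The lookup loop described above can be sketched as a simple dictionary coder: each input word is presented to the CAM-like dictionary, a short code is emitted when the word is found, and a literal is emitted otherwise. This is a deliberately simplified stand-in for the compression schemes the text alludes to; the dictionary contents and output format are assumptions.

```python
# Simplified dictionary coder: the "CAM" holds frequent words and returns a
# short code per match; unmatched words are emitted as literals.

dictionary = {"the": 0, "packet": 1, "router": 2}   # pre-loaded patterns

def compress(words):
    output = []
    for word in words:
        if word in dictionary:               # CAM match -> emit its short code
            output.append(("code", dictionary[word]))
        else:                                # no match -> emit the literal word
            output.append(("literal", word))
    return output

print(compress(["the", "router", "drops", "the", "packet"]))
# -> [('code', 0), ('code', 2), ('literal', 'drops'), ('code', 0), ('code', 1)]
```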

    Figure 12: Data Compression