

    CAMs for COMMs

    Content Addressable Memories for Data Communication Applications

    Introduction

The primary purpose of a data communication switch is to route packets to their appropriate destinations. This involves searching through routing tables. A routing table is constructed of address entries and the associated information required for routing packets to their destined ports. An entry in such a routing table, corresponding to a certain address, provides the switch with the associated information needed to decide how to route the packet.

Searching through the routing table should ideally be accomplished within the timeframe it takes to read the packet off the link or, if cut-through switching is exercised (where the head of the packet is routed out before the tail arrives), within the time it takes to read the address fields of the header off the incoming link. As bandwidth and

    switching speeds increase, the time allocated for implementation of the lookup

    procedure is reduced to the point where a software or Random Access Memory (RAM)-

    based approach is not fast enough.

The advantage of the inherent parallelism of Content Addressable Memory (CAM) is evident, since it offers low latency over a wide variety of address structures. The two

    most common search-intensive tasks that use CAMs are packet forwarding and packet

    classification in Internet routers. Other applications include processors' cache memory,

    Translation Look-aside Buffers (TLB), lossless data compression applications, database

    accelerators, and neural networks.

    What is a CAM?

    Most memory devices store and retrieve data by addressing specific memory locations.

Finding specific data patterns within a standard RAM often becomes the bottleneck for systems that rely on fast memory access, because it requires several access cycles. The time required to find specific data stored in memory could be

    reduced considerably if the stored data can be identified for access by the content of

    the data itself rather than by its address. Memory that is accessed in this way is called

    CAM.


Figure 1: RAM vs. CAM

CAM provides significant performance advantages compared with other memory search algorithms, such as binary and tree-based searches or look-aside tag buffers, by comparing the desired content against all pre-stored entries simultaneously, frequently resulting in an order-of-magnitude reduction of search time. Thus, a CAM is a hardware associative search engine, much faster than algorithmic approaches for search-intensive applications.

    CAMs are composed of conventional semiconductor memory, usually Static RAM

    (SRAM), with added comparison circuitry that enables a search operation to complete

    in a single clock cycle.
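To make the contrast concrete, the following minimal Python sketch models the two access modes: a RAM read takes an address and returns the data stored there, while a CAM search takes a data word and returns the address (or addresses) where it is stored. The loop only models the behavior; in a real CAM all comparisons happen in parallel within one clock cycle, and the function names here are illustrative, not taken from any device.

```python
# Minimal behavioral model of RAM vs. CAM access (illustrative only).

def ram_read(ram, address):
    """RAM: the user presents an address and receives the stored data."""
    return ram[address]

def cam_search(cam, key):
    """CAM: the user presents data and receives the matching address(es).
    A hardware CAM performs all comparisons in parallel in one cycle;
    the loop here only models that behavior."""
    return [address for address, word in enumerate(cam) if word == key]

memory = [0b1011, 0b0110, 0b1111, 0b0110]   # same contents viewed as RAM or CAM

print(ram_read(memory, 2))          # -> 0b1111 (data stored at address 2)
print(cam_search(memory, 0b0110))   # -> [1, 3] (addresses holding 0b0110)
```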

CAM is ideally suited for several applications, including Ethernet address lookup, data compression, pattern recognition, cache tags, high-bandwidth address filtering, and fast lookup of routing, user privilege, security, or encryption information on a packet-by-packet basis for high-performance data switches, firewalls, bridges, and routers.

    CAM Basics

Basic Building Block: The RAM

Since CAMs are a derivative of RAM technology, the following explanation of CAM technology is based on the RAM.

Note: Although possible, implementation of the CAM based on Dynamic RAM (DRAM) is not popular, mainly due to the refreshing required for DRAMs, which reduces the device's throughput performance. Therefore, the following explanation refers to SRAM.

RAM is an integrated circuit that stores data temporarily in a matrix fashion. Data is stored in RAM at a particular location, called an address, which is the intersection of its column and row.


    Figure 2: SRAM Architecture

    In RAM, the user presents the address on the address lines (bus) and the memory

    outputs the data stored in that address.

    The number of address lines dictates the depth of the memory, but the width of the

    memory can theoretically be extended as far as desired.

A single-bit SRAM cell is presented in Figure 3.

    Figure 3: Single-Bit SRAM Cell


Transistors T1 and T2 form a flip-flop circuit, which is the basic storage device. Transistors T3 and T4 form constant current sources, which serve as the flip-flop transistors' loads.

    To access the cell for read or write operation, two circuits are added:

    Figure 4: Single-Bit SRAM Cell with Addressing

Once a valid address is applied, transistors T5 and T6 transfer the cell's data to (in case of a read) or from (in case of a write) the sense amplifiers.

The sense amplifiers translate the signal from dual-ended (differential) to single-ended, or vice versa. These amplifiers are common to all columns of the memory array.

The circuit presented in Figure 4 above forms the basic single-bit SRAM cell circuit.

    Memory Organization

Combining several single-bit circuits (as presented in Figure 4) forms a memory word. Combining several memory words forms the memory array. The organization of the memory component is, in fact, the combination of the circuits in Figure 2 and Figure 4 above, as presented below:


    Figure 5: Memory Organization

    Using SRAM Cell to Build a CAM Cell

To use the above described SRAM cell as a CAM cell, three transistors need to be added, as presented in Figure 6 below.

These additional transistors compare the output of the SRAM cell (the stored data bit) to a comparand (search key) provided via the data bus through the write sense amplifiers (thus turning the bit lines into search lines).

All the cells' output transistors (such as T9) are wired-NOR together, so that when all bits of the search key equal the content of the memory word, the Match signal line is pulled down to logic "0", generating a True (inverted logic) signal at the output of the CAM.


    Figure 6: Single-Bit CAM Cell

With CAM, the user presents the data and receives a match signal (True/False, which is equal to match/mismatch), sometimes with additional information (for example: the address where the match was found or some additional associated data, as presented in Figure 7 below).

    The CAM searches through the memory within a single clock cycle and returns the

    results.

    The CAM can be pre-loaded with its database at device startup and re-written during

    device operation.

    The CAM can accelerate any application requiring fast searches of databases, lists, or

    patterns, such as in image or voice recognition, or computer and communication

    designs.

For this reason, CAM is used in applications where search time is critical and must be very short. For example, the search key could be the Internet Protocol (IP) address of a network user, and the associated information could be a user's access privileges and location on the network.

If the search key presented to the CAM is stored in the CAM's table, the CAM indicates a match and returns the associated information, which consists of the user's privileges.


CAMs come in two basic types:

1. Binary CAM, supporting storage and search of dual-type bits: Zero and One (0, 1).

2. Ternary CAM, supporting storage and search of triple-type bits: Zero, One, and Don't Care (0, 1, X), thus adding flexibility to the search patterns.

    Binary CAM

The above described circuit is the most basic form of CAM. It is also known as Binary CAM (BCAM): a CAM that compares the dual-type (0, 1) logic bits of the search key to the content of the memory.

Each bit is compared for True or False (0 or 1) values, and only when a precise match is found is the Match signal generated.

    Ternary CAM

In some cases (for example, searching for patterns), only a partial match is required, where only a few of the bits need to precisely match the search key and the rest are considered Don't Care. For example, a stored word of 10XX0 will match any of the four search words 10000, 10010, 10100, or 10110.

In these cases, the unattended (not compared) bits should be masked (considered a match in any case) and generate a match signal regardless of the actual data stored.

The added search flexibility comes at an additional cost over BCAM, since the internal memory cell must encode three possible states instead of two.

This additional state is typically implemented by adding a mask bit to every memory cell.

This is also known as Ternary CAM (TCAM): a CAM that compares the triple-type (0, 1, and X) logic bits of the search key to the content of the memory.

    Figure 8 below presents the logic operation of the TCAM.


    Figure 8: Ternary CAM Search and Mask Logic
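The masking behavior shown in Figure 8 can be modeled in software by storing each TCAM entry as a value together with a care mask, where a 0 mask bit marks a Don't Care position. The sketch below is a behavioral illustration under that assumption; the entry format and helper names are not taken from any specific device.

```python
# Behavioral model of a ternary (TCAM) match: X bits are encoded by a
# per-entry "care" mask whose 0 bits are ignored during comparison.

def parse_ternary(pattern):
    """Convert a string like '10XX0' into (value, care_mask) integers."""
    value = care = 0
    for ch in pattern:
        value = (value << 1) | (1 if ch == '1' else 0)
        care = (care << 1) | (0 if ch in 'xX' else 1)
    return value, care

def tcam_match(entry, key):
    """True when the key equals the stored value on every 'care' bit."""
    value, care = entry
    return (key & care) == (value & care)

entry = parse_ternary("10XX0")
for key in (0b10000, 0b10010, 0b10100, 0b10110, 0b10001):
    print(f"{key:05b} ->", tcam_match(entry, key))
# The first four keys match, as in the 10XX0 example above; 10001 does not.
```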

    Other Types of CAM

Over the last few years, as data communication equipment designers grew to appreciate the advantages of CAM, many innovative new CAM types have been suggested. These include:

- Power-saving implementations: due to the relatively high power consumption of the CAM circuit (see the discussion below), restrictions were imposed regarding the array size, the response speed, and the ability to integrate CAM cores into ASICs.


  New technologies, circuit designs, and architectures allow higher array densities, higher speeds of operation, and easier integration of CAM cores into PPs' ASIC designs.

  The most popular power-saving scheme is the Bank-Selection scheme. In this scheme, only a subset of the CAM is active in any given cycle, and the high-power-consumption search lines are shared between the banks (a minimal sketch of this scheme follows this list).

- Other technologies: attempts are made to improve the CAM circuits with respect to:

  - Minimizing the cell's footprint to allow higher array densities, by utilizing a DRAM cell structure (a DRAM cell requires only four transistors, compared with the 6/8-transistor SRAM cell described above) or a single-transistor memory cell implementation.

  - Other types of search logic: the search mechanism described above is NOR-based. NAND- and XOR-based logic has been suggested too.

  - Other architectures: special architectures were developed for special purposes. Among these, the following should be mentioned:

    - Additional memory array(s) for associated data (mentioned above).

    - CAMs for special applications, like the Prefix CAM (PCAM), a Ternary CAM optimized for longest-prefix-matching tasks (IPv4 and others), and the Label Encoded CAM (LECAM), a parallel packet-classification CAM employing some special algorithmic techniques with a modified CAM architecture.

    - CAMs with cache memory (caching recently used search keys) or with a pipelined hierarchical search scheme to speed up the CAM's search operation.

    - And others.
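As a rough illustration of the Bank-Selection idea mentioned in the list above, the sketch below splits the entries into banks keyed by a few bits of the search key, so that only one bank is searched on any given lookup. This is a simplified software model that assumes the bank is selected directly from the top bits of the key; real devices choose the selection bits and bank organization differently.

```python
# Simplified model of a bank-selected CAM: only one bank is "active"
# (i.e., searched) per lookup, which is what saves power in hardware.

BANK_BITS = 2  # assumption: the top 2 bits of the key select the bank

def build_banks(entries, key_width):
    banks = {b: [] for b in range(1 << BANK_BITS)}
    for address, word in enumerate(entries):
        bank = word >> (key_width - BANK_BITS)
        banks[bank].append((address, word))
    return banks

def banked_search(banks, key, key_width):
    bank = key >> (key_width - BANK_BITS)      # select one bank...
    for address, word in banks[bank]:          # ...and search only that bank
        if word == key:
            return address
    return None

banks = build_banks([0b0001, 0b0110, 0b1011, 0b1100], key_width=4)
print(banked_search(banks, 0b1011, key_width=4))  # -> 2
```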

    CAM Application

CAMs are well suited to performing search operations and can be used to accelerate a wide range of applications: Local Area Networks (LANs), database management, file-storage management, pattern recognition, artificial intelligence, fully associative and processor-specific cache memories, disk cache memories, and high-end data communication devices like data switches and routers.

Typical data communication applications are: Virtual Path Identifier / Virtual Circuit Identifier (VPI/VCI) translation in Asynchronous Transfer Mode (ATM) switches at up to OC-12 (622 Mbps) data rates, packet forwarding, IP filtering, and packet classification.


Address Lookup

An IP address consists of two parts:

- The network address, which can vary in size depending on the subnet configuration, and,
- The host address, which occupies the remaining bits.

Each subnet has a network mask that specifies which bits of the address are the network address and which bits are the host address.

Routing is done by comparing against a routing table, maintained by the router, which contains:

- Each known destination network address,
- The associated network mask, and,
- The information needed to route packets to that destination.

Without a TCAM circuit, the router needs to:

- Compare the destination address of the packet to be routed with each entry in the routing table,
- Perform a logical AND with the network mask, and,
- Compare the result with the network address.

If these are equal, the corresponding routing information is used to forward the packet.

Using a TCAM for the routing table makes the lookup process very efficient: the addresses are stored using Don't Care for the host part of the address, so looking up the destination address in the TCAM immediately retrieves the correct routing entry; both the masking and comparison are done by the TCAM hardware circuits.
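The per-entry steps listed above can be sketched as follows, assuming each routing-table entry holds a network address, a network mask, and its routing information as integers (the entry layout and function name are illustrative). Without a TCAM the loop runs once per entry; with a TCAM the same mask-and-compare happens for all entries at once in hardware.

```python
# Software routing lookup: for each entry, AND the destination with the
# entry's network mask and compare the result with the network address.
# A TCAM performs this mask-and-compare over all entries in parallel.

def software_route_lookup(routing_table, destination):
    for network, mask, route_info in routing_table:
        if (destination & mask) == network:
            return route_info
    return None  # no matching entry

routing_table = [
    # (network, mask, routing information) -- illustrative values,
    # listed most-specific first; longest-prefix selection is discussed below.
    (0b11000000_10101000_00000001_00000000, 0xFFFFFF00, "port A"),
    (0b11000000_10101000_00000000_00000000, 0xFFFF0000, "port B"),
]
dest = 0b11000000_10101000_00000001_00000111   # e.g. 192.168.1.7
print(software_route_lookup(routing_table, dest))  # -> "port A"
```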

    The above described Address Lookup function inspects the destination address of the

    packet and selects an output port associated with that address. The list of destination

    addresses of the router, and their corresponding output ports, is called the Routing

    Table. An example of a simplified Routing Table is displayed in Table 1 below.

Line Number   Address (Binary)   Output Port
1             101XX              A
2             0110X              B
3             011XX              C
4             10011              D

Table 1: Simplified Routing Table


All four entries in the above table are 5-bit words. Due to the X (Don't Care) bits, the first three entries in Table 1 represent a range of input addresses. For example, the entry on Line 1 indicates that all addresses in the range 10100 to 10111 (binary) are forwarded to port A. For each incoming packet, the router scans the Address Lookup Table for its destination port. For example, if the router receives a packet with an incoming address of 01101 (binary), the Address Lookup will yield matches on both Line 2 and Line 3 of Table 1. Line 2 will be selected, since it best defines the search key's bit pattern. This is the indication that port B is the most direct route to the destination.

This lookup style is called longest-prefix matching and is required to implement the most recent Internet Protocol (IP) networking standard. The routing parameters determining the complexity of the implementation are:

- Entry size
- Table size
- Search rate
- Table update rate
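The selection rule used in the example above (Line 2 wins over Line 3 because more of its bits are defined) can be sketched as follows, reusing the entries of Table 1: among all matching entries, the one with the most non-X bits, i.e. the longest prefix, is chosen. This is a behavioral model only; TCAM-based designs resolve the priority in hardware, typically through entry ordering.

```python
# Longest-prefix matching over the Table 1 entries: among all ternary
# matches, pick the entry with the most defined (non-X) bits.

def parse(pattern):
    value = care = 0
    for ch in pattern:
        value = (value << 1) | (ch == '1')
        care = (care << 1) | (ch not in 'xX')
    return value, care

routing_table = [  # (pattern, output port) from Table 1
    ("101XX", "A"), ("0110X", "B"), ("011XX", "C"), ("10011", "D"),
]

def lookup(address):
    best_port, best_len = None, -1
    for pattern, port in routing_table:
        value, care = parse(pattern)
        defined_bits = bin(care).count("1")
        if (address & care) == (value & care) and defined_bits > best_len:
            best_port, best_len = port, defined_bits
    return best_port

print(lookup(0b01101))  # matches Lines 2 and 3; Line 2 (port B) is longer
```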

IPv4 addresses are 32 bits long, while IPv6 addresses are 128 bits long. Supplementary information, like the source address and QoS information, can expand IPv6 Routing Table entry sizes to 288 to 576 bits.

Terabit-class routers need to perform hundreds of millions of searches per second, in addition to thousands of routing table updates per second. Almost all algorithmic approaches are too slow to keep up with such high-speed routing requirements. Only hardware-based CAMs can meet such requirements, due to their high search throughput.

    IP Filtering

    An IP filter is a security feature that restricts unauthorized access to LAN resources or

    restricts traffic on a WAN link (IP traffic that goes through the router). IP filters can be

    used to restrict the types of Internet traffic that are permitted to access a LAN, and LAN

    workstations can be restricted to specific Internet-based applications (such as e-mail).

TCAMs can be used as a filter that blocks all access except for those packets that are given explicit permission according to the rules of the IP filter. In this application, the TCAM compares the packet being routed to the port against the IP Filter Rules residing within the CAM. When a match is found, the packet is either permitted or denied, as presented in Figure 10 below.


    Figure 10: TCAM as an IP Filter
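A behavioral sketch of this filtering scheme: rules are stored as ternary patterns over the relevant header fields, each with a permit or deny action, and the first matching rule decides the packet's fate, with a default deny (matching the "block all access except explicit permission" policy above). The rule encoding and field layout here are illustrative assumptions, not a specific product's format.

```python
# TCAM-style IP filter model: each rule is (value, care_mask, action) over a
# packed header key; the first matching rule wins, and the default is "deny".

def ternary(pattern):
    value = care = 0
    for ch in pattern:
        value = (value << 1) | (ch == '1')
        care = (care << 1) | (ch not in 'xX')
    return value, care

# Illustrative 8-bit "header key": 4 bits of source group, 4 bits of protocol.
rules = [
    (*ternary("0001XXXX"), "permit"),   # permit anything from source group 0001
    (*ternary("XXXX0110"), "permit"),   # permit protocol 0110 (e.g. e-mail)
]

def filter_packet(key):
    for value, care, action in rules:
        if (key & care) == (value & care):
            return action
    return "deny"  # no explicit permission -> blocked

print(filter_packet(0b00011111))  # -> permit (source-group rule)
print(filter_packet(0b10100000))  # -> deny
```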

ATM Switch

    CAMs can be used as a translation table in ATM switching network components.

Because ATM networks are connection-oriented, virtual circuits need to be set up across them prior to any data transfer.

Two types of ATM virtual circuits exist:

- the Virtual Path, identified by a Virtual Path Identifier (VPI), and,
- the Virtual Circuit, identified by a Virtual Circuit Identifier (VCI).

VPI/VCI values are localized: each segment of the total connection has its own unique VPI/VCI combination.

Whenever an ATM cell travels through a switch, its VPI/VCI value must be changed into the value used for the next segment of the connection. This process is called VPI/VCI translation.

Because speed is a significant factor in an ATM network, the speed at which this translation is done is critical to the network's overall performance.

CAM can be used for the address translation and can contribute significantly to the processing rate. During the translation process, the CAM takes incoming VPI/VCI values in ATM cell headers and generates addresses that access data in the associated RAM. The CAM/RAM combination (see the discussion of associated data above) enables the realization of multi-megabit translation tables with full parallel search capability.

VPI/VCI fields from the ATM cell header are compared to a list of current connections stored in the CAM array. As a result of the comparison, the CAM generates an address that is used to access an associated RAM, where VPI/VCI mapping data and other connection information are stored.


    The ATM controller modifies the cell header using the VPI/VCI data from the associated

    RAM, and the cell is sent to the switch, as presented in Figure 11.

Figure 11: CAM in an ATM Switch
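The translation flow of Figure 11 can be modeled as a two-step lookup: the CAM maps the incoming connection identifiers to an index, and an associated RAM at that index holds the outgoing VPI/VCI and output port. The key composition and record fields below are assumptions for illustration only.

```python
# CAM/RAM model of VPI/VCI translation: the CAM turns the incoming
# connection identifiers into an index, and the associated RAM holds the
# new header values and the output port for that connection.

cam = {  # (input port, VPI, VCI) -> index into the associated RAM
    (1, 0, 32): 0,
    (1, 5, 40): 1,
}
associated_ram = [  # index -> (new VPI, new VCI, output port)
    (7, 110, 3),
    (2, 64, 4),
]

def translate_cell(in_port, vpi, vci):
    index = cam.get((in_port, vpi, vci))
    if index is None:
        return None                      # unknown connection
    new_vpi, new_vci, out_port = associated_ram[index]
    return new_vpi, new_vci, out_port    # rewrite header, forward to out_port

print(translate_cell(1, 0, 32))  # -> (7, 110, 3)
```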

    Translation Look-aside Buffer

    A Translation Look-aside Buffer (TLB) is a cache-buffer in a CPU containing parts of the

    page-table translating from virtual into physical addresses.

    This buffer has a fixed number of entries and is used to improve the speed of virtual

    address translation.

The buffer is typically implemented with a CAM, in which the search key is the virtual address and the search result is a real (physical) address. If the CAM search yields a match, the translation is known and the match data is used. If no match is found, the translation proceeds via the page table, requiring several more cycles to complete.
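A minimal model of the behavior just described: the TLB is searched by virtual page number; on a hit the cached physical frame is returned directly, and on a miss the much slower page table is consulted and the result is installed in the TLB. The fixed-size eviction policy and dictionary representation are simplifying assumptions.

```python
# Behavioral TLB model: a small, fixed-size cache of page-table entries,
# searched by virtual page number (the CAM search key).

TLB_ENTRIES = 4

tlb = {}          # virtual page number -> physical frame number
page_table = {0: 7, 1: 3, 2: 9, 3: 1, 4: 6}   # illustrative page table

def translate(virtual_page):
    if virtual_page in tlb:            # TLB hit: one fast lookup
        return tlb[virtual_page]
    frame = page_table[virtual_page]   # TLB miss: slow page-table walk
    if len(tlb) >= TLB_ENTRIES:        # simplistic eviction when the TLB is full
        tlb.pop(next(iter(tlb)))
    tlb[virtual_page] = frame          # install the translation for next time
    return frame

print(translate(2))  # miss: walks the page table, then fills the TLB
print(translate(2))  # hit: served from the TLB
```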

The TLB can reside between the CPU and the cache, or between the cache and primary storage memory. This depends on whether the cache uses virtual addressing or physical addressing.

    In case the cache is virtually addressed, requests are sent directly from the CPU to the

    cache, which then accesses the TLB as necessary. If the cache is physically addressed,

    the CPU does a TLB lookup on every memory operation, and the resulting physical

    address is sent to the cache.

    Although not intended by design, if system security has been breached, a restoration

    sub-system can use the translation look-aside buffer to alter the view of memory in

    order to hide a subversive program or backdoor on a computer.


    Data Compression

    Data compression eliminates the inherent redundancy in a given data file, thus

    generating an equivalent but smaller file. CAM is well suited for data compression since

    a significant portion of compression algorithm time is spent on searching for pre-defined

data patterns. Replacing the algorithms with a hardware-based search engine can significantly increase the throughput of a compression function.

In a data compression application, a CAM lookup is performed following the presentation of each word of the original data, as can be seen in Figure 12. If the presented word bit-pattern is found, then the appropriate code is output. If the word is not found in the CAM, then another word is shifted in.

    The CAM will generate the results in a single transaction regardless of table size or

    search list length.

    This virtue makes CAM an ideal candidate for data compression.
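The lookup loop described above can be sketched as a simple dictionary coder: each input word is presented to the CAM-like dictionary, a short code is emitted when the word is found, and a literal is emitted otherwise. This is a deliberately simplified stand-in for the compression schemes the text alludes to; the dictionary contents and output format are assumptions.

```python
# Simplified dictionary coder: the "CAM" holds frequent words and returns a
# short code per match; unmatched words are emitted as literals.

dictionary = {"the": 0, "packet": 1, "router": 2}   # pre-loaded patterns

def compress(words):
    output = []
    for word in words:
        if word in dictionary:               # CAM match -> emit its short code
            output.append(("code", dictionary[word]))
        else:                                # no match -> emit the literal word
            output.append(("literal", word))
    return output

print(compress(["the", "router", "drops", "the", "packet"]))
# -> [('code', 0), ('code', 2), ('literal', 'drops'), ('code', 0), ('code', 1)]
```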

    Figure 12: Data Compression