Fabric on a Chip

Upload: sujithaa-subramanian (posted 10-Apr-2018)

  • 8/8/2019 Fabric on a Chip

    Slide 1/38

    Fabric on a Chip: A Memory-management Perspective

    Presentation by: C. Annadurai, SSN College of Engineering

  • Slide 2/38 — Objectives

    Distributed architecture challenges
    Fabric flow control
    CRS cell-based multi-stage Benes
    Switch fabric challenges
    Metro architecture basics
    Reassembly window
    Challenges

  • Slide 3/38 — Agenda

    Cisco's high-end router: CRS-1
    Future directions
    CRS-1's NP Metro (SPP)
    CRS-1's fabric
    CRS-1's line card

  • Slide 4/38 — What drove the CRS? A sample taxonomy

    OC-768
    Multi-chassis
    Improved BW/Watt and BW/Space
    New OS (IOS-XR)
    Scalable control plane

  • Slide 5/38 — Multiple router flavours: a sample taxonomy

    Core: OC-12 (622 Mbps) and up (to OC-768 ≈ 40 Gbps)
      Big, fat, fast, expensive
      E.g. Cisco HFR, Juniper T-640; HFR: 1.2 Tbps each, interconnect up to 72 giving 92 Tbps, starting at $450k
    Transit/peering-facing: OC-3 and up, good GigE density
      ACLs, full-on BGP, uRPF, accounting
    Customer-facing: FR/ATM/…
      Feature set as above, plus fancy queues, etc.
    Broadband aggregator: high scalability (sessions, ports, reconnections)
      Feature set as above
    Customer-premises (CPE): 100 Mbps
      NAT, DHCP, firewall, wireless, VoIP, …
      Low cost, low-end, perhaps just software on a PC

  • Slide 6/38 — Routers are pushed to the edge: a sample taxonomy

    Over time routers are pushed to the edge as:
      BW requirements grow
      # of interfaces scales
    Different routers have different offerings:
      Interface types (core is mostly Ethernet)
      Features (sometimes the same feature is implemented differently)
      User interface
      Redundancy models
      Operating system
    Customers look for:
      Investment protection
      Stable network topology
      Feature parity

  • Slide 7/38 — What does scaling mean? A sample taxonomy

    Interfaces (BW, number, variance)
    BW
    Packet rate
    Features (e.g. support link BW in a flexible manner)
    More routes
    Wider ecosystem
    Effective management (e.g. capability to support more BGP peers and more events)
    Fast control (e.g. distributing routing information)
    Availability
    Serviceability
    Scaling is both up and down (logical routers)

  • Slide 8/38 — Typical centralized architecture

    [Diagram: a single CPU with route table and buffer memory serving multiple line interfaces (MACs) over a shared bus]

  • Slide 9/38 — Typical high-BW distributed architecture

    [Diagram: line cards, each with a MAC, local buffer memory, and forwarding table, interconnected by a crossbar switched backplane; a CPU card holds the routing table and distributes forwarding tables to the line cards]

  • Slide 10/38 — Distributed architecture challenges (examples)

    HW-wise: switching fabric
      High-BW switching, QoS, traffic loss, speedup
    Data plane (SW)
      High BW / packet rate; limited resources (CPU, memory)
    Control plane (SW)
      High event rate; routing-information distribution (e.g. forwarding tables)

  • Slide 11/38

  • Slide 12/38

  • Slide 13/38 — Switch fabric challenges

    Scale: many ports
    Fast distributed arbitration
    Minimum disruption with the QoS model
    Minimum blocking
    Balancing
    Redundancy


  • Slide 14/38 — Previous solution: GSR cell-based XBAR with centralized scheduling

    Each LC has variable-width links to and from the XBAR, depending on its bandwidth requirement (1 to 16 transmit and receive lanes; the number of lanes varies per linecard type based on bandwidth)
    Central scheduling, iSLIP-based: two request-grant-accept rounds; each arbitration round lasts one cell time
    Per-destination-LC virtual output queues: one output queue per destination linecard
    One reassembly queue per source linecard (and per unicast/multicast)

    [Diagram: linecard (emphasizing the fabric interface) with virtual output queues, reassembly queues, and to/from-fabric lanes; the XBAR switching matrix and fabric scheduler are shown with connections for just one linecard, including request/grant control, cell-availability information, and cell-transmit control]
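The request-grant-accept rounds above can be sketched in a few lines. This is a minimal, illustrative model of one iSLIP-style iteration over virtual output queues, not the GSR's actual scheduler; all names and data layouts here are assumptions.

```python
# Toy model of one iSLIP-style request-grant-accept round for a
# cell-based crossbar with virtual output queues (VOQs).

def islip_round(voq, grant_ptr, accept_ptr, n):
    """One arbitration round for an n x n crossbar.

    voq[i][j] > 0 means input i has cells queued for output j.
    grant_ptr[j] / accept_ptr[i] are the per-port round-robin pointers.
    Returns a dict mapping each matched input to its output.
    """
    # Request: every input with a non-empty VOQ requests those outputs.
    requests = {j: [i for i in range(n) if voq[i][j] > 0] for j in range(n)}

    # Grant: each output grants the requesting input nearest its pointer.
    grants = {}  # input -> list of outputs that granted it
    for j, reqs in requests.items():
        if not reqs:
            continue
        winner = min(reqs, key=lambda i: (i - grant_ptr[j]) % n)
        grants.setdefault(winner, []).append(j)

    # Accept: each input accepts the granting output nearest its pointer.
    # Pointers advance only on accepted matches -- the iSLIP trick that
    # desynchronizes the round-robin pointers and avoids starvation.
    match = {}
    for i, outs in grants.items():
        j = min(outs, key=lambda o: (o - accept_ptr[i]) % n)
        match[i] = j
        grant_ptr[j] = (i + 1) % n
        accept_ptr[i] = (j + 1) % n
    return match
```

A real scheduler iterates this round (the slide's "two rounds") within one cell time and only dequeues cells for matched pairs.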

  • Slide 15/38 — CRS cell-based multi-stage Benes

    Multiple paths to a destination: for a given input-output port pair, the number of paths equals the number of centre-stage elements
    Cell routing: distribution between the S1 and S2 stages; routing at S2 and S3
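The distribute-then-route split can be made concrete with a small sketch. Function names and the port-numbering scheme here are assumptions for illustration, not CRS internals: S1 sprays cells across centre-stage elements, while the egress-side routing is a deterministic function of the destination port.

```python
from itertools import count

def spray(cells, n_center):
    """S1's job: round-robin each cell across the centre-stage elements."""
    c = count()
    return [(cell, next(c) % n_center) for cell in cells]

def n_paths(n_center):
    """Paths between any input/output pair equal the centre-stage count."""
    return n_center

def s3_element(dest_port, ports_per_s3):
    """S2/S3 routing: pick the S3 element that owns the destination port."""
    return dest_port // ports_per_s3
```

Because spraying ignores the destination, load is balanced across all centre stages, which is exactly why cells of one packet can take different paths (see the reassembly-window slide below in the deck).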

  • Slide 16/38 — Fabric speedup

    Q-fabric tries to approximate an output-buffered switch to minimize sub-port blocking
    Buffering at the output allows better scheduling
    In single-stage fabrics, a 2x speedup very closely approximates an output-buffered fabric *
    For multi-stage fabrics, the speedup factor to approximate output-buffered behavior is …

  • Slide 17/38 — Fabric flow control overview

    Discard: time constant in the tens-of-ms range. Originates from from-fab and is directed at to-fab. A very fine level of granularity: discard down to the level of individual destination raw queues.
    Back pressure: time constant in the tens-of-µs range. Originates from the fabric and is directed at to-fab. Operates per priority at increasingly …

  • Slide 18/38 — Reassembly window

    Cells transiting the fabric take different paths between Sprayer and Sponge.
    Cells for the same packet will arrive out of order.
    The reassembly window for a given source is defined as the worst-case differential delay two cells from a packet encounter as they traverse the fabric.
    The fabric limits the reassembly …
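A bounded reassembly window is what makes per-source reordering cheap: a fixed-size buffer suffices. The toy model below shows the idea; the cell fields (`packet_id`, `seq`, `last`) are hypothetical names, not the CRS cell format.

```python
from collections import defaultdict

class Reassembler:
    """Reassemble cells into packets, tolerating out-of-order arrival.

    Because the fabric bounds the differential delay between two cells
    of a packet, the amount of state held here is bounded too.
    """

    def __init__(self):
        # (source, packet_id) -> {cell_seq: (payload, is_last)}
        self.buf = defaultdict(dict)

    def cell(self, source, packet_id, seq, payload, last):
        q = self.buf[(source, packet_id)]
        q[seq] = (payload, last)
        seqs = sorted(q)
        # Release the packet once cells 0..N are all present and cell N
        # carries the last flag.
        if seqs == list(range(len(seqs))) and q[seqs[-1]][1]:
            del self.buf[(source, packet_id)]
            return "".join(q[s][0] for s in seqs)
        return None
```

The per-source queues on the egress linecard ("one reassembly queue per source linecard" on slide 14) play the role of `self.buf` here.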

  • Slide 19/38 — Linecard challenges

    Power
    COGS
    Multiple interfaces
    Intermediate buffering
    Speedup
    CPU subsystem

  • Slide 20/38 — Cisco CRS-1 Line Card

    [Diagram: the Modular Services Card (MSC) and the PLIM joined across the midplane. The PLIM carries eight OC-192 framer-and-optics blocks and an interface-module ASIC. The MSC carries RX/TX Metro NPs, ingress and egress queuing, the from-fabric ASICs, and a CPU with Squid GW. Numbered arrows (1-8) trace the ingress and egress packet flows to and from the fabric.]

  • Slide 21/38 — Cisco CRS-1 Line Card (continued)

    [Same MSC/PLIM block diagram as the previous slide, plus a board photo calling out the line-card CPU, egress Metro, ingress Metro, ingress queuing, power regulators, fabric serdes, the from-fabric path, and egress queuing.]

  • Slide 22/38 — Cisco CRS-1 Line Card (board photo)

    [Board photo calling out the egress Metro, ingress Metro, line-card CPU, ingress queuing, power regulators, fabric serdes, the from-fabric path, and egress queuing.]

  • Slide 23/38

  • Slide 24/38 — Metro Subsystem

  • Slide 25/38 — Metro Subsystem

    What is it? A massively parallel NP, codename Metro; marketing name SPP (Silicon Packet Processor)
    What were the goals? Programmability and scalability
    Who designed and programmed it?

  • Slide 26/38 — Metro Subsystem

    Metro: 2500 balls, 250 MHz, 35 W
    TCAM: 125 MSPS, 128k x 144-bit entries, 2 channels
    FCRAM: 166 MHz DDR, 9 channels (lookups and table memory)
    QDR2 SRAM: 250 MHz DDR, 5 channels (policing state, classification results, queue-length state)

  • Slide 27/38 — Metro top level

    Packet in: 96 Gb/s BW; packet out: 96 Gb/s BW
    18 mm x 18 mm, IBM 0.13 µm
    18M gates
    8 Mbit of SRAM and RAs
    Control-processor interface: proprietary, 2 Gb/s

  • Slide 28/38 — Gee-whiz numbers

    188 32-bit embedded RISC cores
    ~50 Bips
    175
    78 Mpps peak performance


  • Slide 29/38 — Why programmability?

    Simple forwarding is not so simple. Example features:
      MPLS 3 labels; link bundling (v4); load balancing L3 (v4); 1 policer check; marking; TE/FRR; sampled NetFlow; WRED; ACL; IPv4 multicast; IPv6 unicast; per-prefix accounting; GRE/L2TPv3 tunneling; RPF check (loose/strict) v4; load balancing L3 (v6); link bundling (v6); congestion control

    IPv4 unicast lookup algorithm:
      [Diagram: the lookup resolves through millions of routes (leaf) to 100k+ adjacencies with pointers to statistics counters, and hundreds of load-balancing entries per route (L3 load-balance entry, L2 info, L2 adjacency); load balancing and adjacencies live in SRAM/DRAM; policy-based routing uses a TCAM table with a 1:1 associative SRAM/DRAM data table.]
      Increasing pressure to add 1-2 levels of extra indirection for high availability and increased update rates

    Programmability also means:
      Ability to juggle feature ordering
      Support for heterogeneous mixes of feature chains
      Rapid introduction of new features (feature velocity)
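The last three points are easiest to see in software terms. In a programmable, run-to-completion NP, a feature chain is just an ordered list of functions, so reordering features or giving two interfaces different chains requires no hardware change. The feature names and chain layout below are hypothetical, purely for illustration.

```python
# Sketch: feature chains as ordered lists of per-packet functions.
# Each feature returns the (possibly modified) packet, or None to drop.

def acl(pkt):
    return None if pkt.get("drop") else pkt

def police(pkt):
    pkt["colored"] = True
    return pkt

def mark(pkt):
    pkt["dscp"] = 46
    return pkt

CHAINS = {
    "if0": [acl, police, mark],   # customer-facing: full feature set
    "if1": [acl],                 # core-facing: minimal chain
}

def forward(interface, pkt):
    """Run the interface's feature chain to completion on one packet."""
    for feature in CHAINS[interface]:
        pkt = feature(pkt)
        if pkt is None:           # a feature dropped the packet
            return None
    return pkt
```

Swapping `acl` and `police`, or adding a new feature, is a one-line change to `CHAINS`: the "feature velocity" argument in miniature.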

  • Slide 30/38 — Metro architecture basics: packet distribution

    [Diagram: 96G packet in/out on each side; 188 PPEs, an on-chip packet buffer, and multiple resources attached to a resource fabric.]
    Packet tails stored on-chip; ~100 bytes of packet context sent to the PPEs
    Run-to-completion (RTC): a simple SW model, efficient heterogeneous feature processing
    RTC and non-flow-based packet distribution mean a scalable architecture
    Costs: high instruction-BW supply; need RMW and flow-ordering solutions

  • Slide 31/38 — Metro architecture basics: packet gather

    [Same block diagram as the previous slide.]
    Gather of packets involves:
      Assembly of final packets (at 100 Gb/s)
      Packet ordering after variable-length processing
      Gathering without new packet distribution
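"Packet ordering after variable-length processing" is the classic reorder problem: PPEs finish in arbitrary order, but a stream must leave in arrival order. A common solution, sketched below with hypothetical names (not the Metro's actual gather logic), is to stamp a sequence number at distribution and release results strictly in sequence at gather.

```python
# Toy gather-side reorder stage: results are released strictly in the
# sequence assigned at distribution time, regardless of which PPE
# finishes first.

class Gather:
    def __init__(self):
        self.next_seq = 0
        self.done = {}            # seq -> processed packet awaiting its turn

    def finish(self, seq, pkt):
        """A PPE finished packet `seq`; return all packets now releasable."""
        self.done[seq] = pkt
        out = []
        while self.next_seq in self.done:
            out.append(self.done.pop(self.next_seq))
            self.next_seq += 1
        return out
```

A packet that finishes early simply waits in `done` until its predecessors complete, so variable-length processing never reorders the output stream.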

  • Slide 32/38 — Metro architecture basics: packet buffer accessible as a resource

    [Same block diagram as the previous slides.]
    The resource fabric is a set of parallel, wide multi-drop busses
    Resources consist of memories, read-modify-write operations, and performance-heavy mechanisms

  • Slide 33/38 — Metro resources

    Statistics: 512k
    TCAM
    Interface tables
    Policing: 100k+
    Lookup engine: 2M prefixes
    Table DRAM (10s of MB)
    Queue-depth state

    The lookup engine uses the TreeBitmap algorithm ("Tree Bitmap: Hardware/Software IP Lookups with Incremental Updates", Will Eatherton et al., CCR April 2004, vol. 34 no. 2, pp 97-123)
      [Diagram: a root node with child pointers into compacted child arrays, covering prefixes P1-P9.]
      FCRAM and on-chip memory
      High update rates; configurable performance vs density
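The core Tree Bitmap trick shown in the diagram is storing a node's children in one compact array, indexed by a bitmap test plus a popcount, so empty slots cost no pointers. The sketch below is a much-simplified nod to that idea (real Tree Bitmap also packs the internal prefixes of each node into a second bitmap; here each node just holds one optional result), with all names invented for illustration.

```python
# Simplified bitmap-compressed multibit trie for longest-prefix match.
STRIDE = 2  # bits consumed per level; each node has up to 2**STRIDE children

class Node:
    def __init__(self, result=None):
        self.result = result      # best prefix terminating at this node
        self.bitmap = 0           # bit i set => a child exists for chunk i
        self.children = []        # compacted child array (no empty slots)

    def child(self, chunk):
        if not (self.bitmap >> chunk) & 1:
            return None
        # Popcount of the lower bits gives the index into the compact array.
        idx = bin(self.bitmap & ((1 << chunk) - 1)).count("1")
        return self.children[idx]

    def add_child(self, chunk):
        node = self.child(chunk)
        if node is None:
            node = Node()
            idx = bin(self.bitmap & ((1 << chunk) - 1)).count("1")
            self.children.insert(idx, node)
            self.bitmap |= 1 << chunk
        return node

def lookup(root, chunks):
    """Walk stride-sized address chunks, remembering the last result seen."""
    best, node = root.result, root
    for c in chunks:
        node = node.child(c)
        if node is None:
            break
        if node.result is not None:
            best = node.result
    return best
```

The bitmap-plus-popcount indexing is what lets the hardware fetch one node word and compute the next child address without chasing per-slot pointers, which is why the slide can trade performance against density by tuning the stride.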


  • Slide 34/38 — Packet Processing Element (PPE)

    16 PPE clusters, each a cluster of 12 PPEs
    0.5 sq mm per PPE

  • Slide 35/38 — Packet Processing Element (PPE)

    [Diagram: a processor core (32-bit RISC) with ICACHE, data mem, Cisco DMA, instruction bus, memory-mapped regs, distribution hdr, pkt hdr, and scratch pad; cluster instruction memory and global instruction memory feed the cluster; a cluster data mux unit connects to the 12 PPEs; pkt distribution arrives from resources and pkt gather goes to resources.]
    Tensilica Xtensa core with Cisco enhancements
    32-bit, 5-stage pipeline
    Code density: 16/24-bit instructions
    Small instruction …

  • Slide 36/38 — Programming model and efficiency

    Metro programming model:
      Run-to-completion programming model
      Queued descriptor interface to resources
      Industry-leveraged tool flow
    Efficiency data points:
      1 µcoder for 6 months: IPv4 with common features (ACL, PBR, QoS, …)

  • Slide 37/38

  • Slide 38/38 — Summary

    Distributed architecture challenges
    Fabric flow control
    CRS cell-based multi-stage Benes
    Switch fabric challenges
    Metro architecture basics
    Reassembly window
    Challenges