Fabric on a Chip
TRANSCRIPT
-
8/8/2019 Fabric on a Chip
1/38
Presentation by: C. Annadurai, SSN College of Engineering
Fabric on a Chip: A Memory-management Perspective
-
Objectives
Distributed architecture challenges
Fabric Flow Control
CRS cell-based multi-stage Benes
Switch fabric challenges
Metro Architecture Basics
Reassembly Window
Challenges
-
Agenda
Cisco's high-end router: CRS-1
Future directions
CRS-1's NP Metro (SPP)
CRS-1's Fabric
CRS-1's Line Card
-
What drove the CRS? A sample taxonomy
OC-768
Multi-chassis
Improved BW/Watt & BW/space
New OS (IOS-XR)
Scalable control plane
-
Multiple router flavours: a sample taxonomy
Core: OC-12 (622 Mbps) and up (to OC-768 ~= 40 Gbps)
Big, fat, fast, expensive
E.g. Cisco HFR, Juniper T-640. HFR: 1.2 Tbps each, interconnect up to 72 giving 92 Tbps, starting at $450k
Transit/peering-facing: OC-3 and up, good GigE density
ACLs, full-on BGP, uRPF, accounting
Customer-facing: FR/ATM/
Feature set as above, plus fancy queues, etc.
Broadband aggregator: high scalability (sessions, ports, reconnections); feature set as above
Customer-premises (CPE): 100 Mbps
NAT, DHCP, firewall, wireless, VoIP
Low cost, low-end, perhaps just software on a PC
-
Routers are pushed to the edge: a sample taxonomy
Over time routers are pushed to the edge as:
BW requirements grow
# of interfaces scales
Different routers have different offerings:
Interface types (core is mostly Ethernet)
Features; sometimes the same feature is implemented differently
User interface
Redundancy models
Operating system
Customers look for:
Investment protection
Stable network topology
Feature parity
-
What does scaling mean? A sample taxonomy
Interfaces (BW, number, variance)
BW
Packet rate
Features (e.g. support link BW in a flexible manner)
More routes
Wider ecosystem
Effective management (e.g. capability to support more BGP peers and more events)
Fast control (e.g. distributing routing information)
Availability
Serviceability
Scaling is both up and down (logical routers)
-
Typical centralized architecture
[Figure: a shared CPU, route table, and buffer memory serving multiple line interfaces (MACs) over a common bus]
-
Typical high-BW distributed architecture
[Figure: line cards, each with a MAC, local buffer memory, and forwarding table, connected by a crossbar switched backplane; a CPU card holds the routing table and distributes forwarding tables to the line interfaces]
-
Distributed architecture challenges (examples)
HW-wise: switching fabric
High-BW switching, QoS, traffic loss, speedup
Data plane (SW)
High BW / packet rate; limited resources (CPU, memory)
Control plane (SW)
High event rate; routing-information distribution (e.g. forwarding tables)
-
Switch Fabric challenges
Scale: many ports
Fast distributed arbitration
Minimum disruption with QoS model
Minimum blocking
Balancing
Redundancy
-
Previous solution: GSR cell-based XBAR with centralized scheduling
Each LC has variable-width links to and from the XBAR, depending on its bandwidth requirement
Central scheduling is iSLIP-based: two request-grant-accept rounds
Each arbitration round lasts one cell time
Per-destination-LC virtual output queues
[Figure: linecard (emphasizing the fabric interface), XBAR switching matrix, and fabric scheduler, showing connections for just one linecard. The linecard holds virtual output queues (one output queue per destination linecard), reassembly queues (one per source linecard, and per unicast/multicast), request/grant control, and cell transmit control. Each linecard has 1 to 16 transmit and receive lanes to and from the fabric; the number of lanes varies per linecard type based on bandwidth. The scheduler exchanges requests and grants with the XBAR control using cell-availability information.]
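The request-grant-accept arbitration named above can be sketched as a single iSLIP-style round. This is an illustrative sketch, not the GSR scheduler itself: real iSLIP runs multiple iterations per cell time and updates its round-robin pointers only on the first iteration.

```python
# One request/grant/accept round for an N x N crossbar with virtual
# output queues (VOQs). grant_ptr/accept_ptr are the per-port
# round-robin pointers iSLIP uses to desynchronize the schedulers.

def islip_round(voq, grant_ptr, accept_ptr):
    """voq[i][j] > 0 means input i has cells queued for output j.
    Returns a list of (input, output) matches for this cell time."""
    n = len(voq)
    # Request phase: each input requests every output with a non-empty VOQ.
    requests = [[voq[i][j] > 0 for j in range(n)] for i in range(n)]
    # Grant phase: each output grants the requesting input nearest
    # (round-robin) to its grant pointer.
    grants = {}
    for j in range(n):
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if requests[i][j]:
                grants.setdefault(i, []).append(j)
                break
    # Accept phase: each input accepts the granting output nearest to
    # its accept pointer; both pointers advance past the match.
    matches = []
    for i, outs in grants.items():
        for k in range(n):
            j = (accept_ptr[i] + k) % n
            if j in outs:
                matches.append((i, j))
                accept_ptr[i] = (j + 1) % n
                grant_ptr[j] = (i + 1) % n
                break
    return matches
```

Each match then carries one cell across the crossbar during the cell time, which is why an arbitration round must finish within it.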
-
CRS Cell-based Multi-Stage Benes
Multiple paths to a destination
For a given input and output port, the number of paths equals the number of center-stage elements
Cell routing: distribution between the S1 and S2 stages; routing at S2 and S3
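The distribution/routing split above can be made concrete with a small sketch: S1 sprays each cell to a randomly chosen center-stage (S2) element for load balancing, while S2 and S3 route deterministically by destination port. Element counts here are illustrative, not CRS-1 values.

```python
import random

def route_cell(dest_port, num_center_stages, rng=random):
    """Return the (center_stage, dest_port) path one cell takes."""
    s2 = rng.randrange(num_center_stages)  # distribution step at S1
    return (s2, dest_port)                 # S2 and S3 route by destination

def num_paths(num_center_stages):
    # For a fixed input/output pair the path is determined entirely by
    # the chosen center-stage element, so #paths == #center stages.
    return num_center_stages
```

Spraying over the center stages balances load across the fabric, at the cost of cells from one packet arriving out of order (handled later by the reassembly window).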
-
Fabric speedup
The Q-fabric tries to approximate an output-buffered switch to minimize sub-port blocking
Buffering at the output allows better scheduling
In single-stage fabrics a 2x speedup very closely approximates an output-buffered fabric *
For multi-stage fabrics, the speedup factor to approximate output-buffered behavior is
-
Fabric Flow Control Overview
Discard: time constant in the 10s of ms range. Originates from the from-fab side and is directed at the to-fab side. A very fine level of granularity: discard down to the level of individual destination raw queues.
Back pressure: time constant in the 10s of µs range. Originates from the fabric and is directed at the to-fab side. Operates per priority at increasingly
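The two mechanisms above can be modeled with a toy queue: faster per-priority back pressure fires well before the queue is exhausted, and per-destination-queue discard is the last resort. The thresholds and the queue model are assumptions for the example, not CRS-1 internals.

```python
class FabricQueue:
    def __init__(self, backpressure_limit, discard_limit):
        assert backpressure_limit < discard_limit
        self.depth = 0
        self.backpressure_limit = backpressure_limit  # fast, per priority
        self.discard_limit = discard_limit            # slow, per raw queue

    def offer(self, cells=1):
        """Try to enqueue; returns (accepted, backpressure_asserted)."""
        if self.depth + cells > self.discard_limit:
            return (False, True)          # discard: queue exhausted
        self.depth += cells
        return (True, self.depth >= self.backpressure_limit)
```

The point of the two time constants is that back pressure throttles sources quickly enough that discard rarely triggers.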
-
Reassembly Window
Cells transiting the fabric take different paths between Sprayer and Sponge.
Cells for the same packet will arrive out of order.
The reassembly window for a given source is defined as the worst-case differential delay two cells from a packet encounter as they traverse the fabric.
The fabric limits the reassembly
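The reordering the Sponge must perform can be sketched as follows: cells from a given source carry sequence numbers, out-of-order arrivals are buffered, and cells are released in order. A real implementation bounds this buffer by the fabric's worst-case differential delay (the reassembly window); this sketch shows only the reordering itself.

```python
def reassemble(arrivals):
    """arrivals: iterable of (seq, cell) pairs in arrival order.
    Yields cells in sequence order."""
    pending = {}
    next_seq = 0
    for seq, cell in arrivals:
        pending[seq] = cell                # buffer out-of-order arrival
        while next_seq in pending:         # release any in-order run
            yield pending.pop(next_seq)
            next_seq += 1
```

Bounding the differential delay is what makes the pending buffer finite: the fabric guarantees a cell can never arrive more than one window behind its packet-mates.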
-
Linecard challenges
Power
COGS
Multiple interfaces
Intermediate buffering
Speedup
CPU subsystem
-
Cisco CRS-1 Line Card
[Figure: line-card block diagram. A Modular Services Card and a PLIM meet at the midplane. The PLIM carries OC-192 framers and optics and an Interface Module ASIC; the MSC carries the CPU (Squid GW), RX/TX Metro, ingress and egress queuing, and the From-Fabric ASIC. Numbered arrows (1-8) trace the ingress and egress packet flow, including the egress path from the fabric.]
-
Cisco CRS-1 Line Card
[Figure: board photo annotated with the LineCard CPU, Egress Metro, Ingress Metro, Ingress Queuing, power regulators, Fabric Serdes, From-Fabric interface, and Egress Queuing]
-
-
-
Metro Subsystem
-
Metro Subsystem
What is it?
A massively parallel NP, codename Metro
Marketing name: SPP (Silicon Packet Processor)
What were the goals?
Programmability
Scalability
Who designed & programmed it?
-
Metro Subsystem
Metro: 2500 balls, 250 MHz, 35 W
TCAM: 125 MSPS, 128k x 144-bit entries, 2 channels
FCRAM: 166 MHz DDR, 9 channels (lookups and table memory)
QDR2 SRAM: 250 MHz DDR, 5 channels (policing state, classification results, queue-length state)
-
Metro Top Level
Packet in: 96 Gb/s BW; packet out: 96 Gb/s BW
18 mm x 18 mm, IBM 0.13 µm
18M gates
8 Mbit of SRAM and RAs
Control processor interface: proprietary, 2 Gb/s
-
Gee-whiz numbers
188 32-bit embedded RISC cores
~50 BIPS
175
78 MPPS peak performance
-
Why Programmability?
Simple forwarding is not so simple. Example features:
MPLS, 3 labels
Link bundling (v4)
Load balancing L3 (v4)
1 policer check
Marking
TE/FRR
Sampled NetFlow
WRED
ACL
IPv4 multicast
IPv6 unicast
Per-prefix accounting
GRE/L2TPv3 tunneling
RPF check (loose/strict) v4
Load balancing L3 (v6)
Link bundling (v6)
Congestion control
IPv4 unicast lookup algorithm
[Figure: IPv4 unicast lookup chain. A lookup yields a leaf (SRAM/DRAM) pointing to an L3 load-balance entry, which points to an L2 adjacency; policy-based routing uses a TCAM table with 1:1 associative data in SRAM/DRAM. Scale: millions of routes, hundreds of load-balancing entries per route, 100k+ adjacencies, pointers to statistics counters. Increasing pressure to add 1-2 levels of extra indirection for high availability and increased update rates.]
Programmability also means:
Ability to juggle feature ordering
Support for heterogeneous mixes of feature chains
Rapid introduction of new features (feature velocity)
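The leaf-to-load-balance-to-adjacency chain described on this slide can be sketched with plain dictionaries. The table layout and field names here are illustrative assumptions, not Metro's actual structures.

```python
def forward(leaf, flow_hash, lb_table, adj_table):
    """Resolve leaf -> load-balance entry -> adjacency (L2 rewrite)."""
    lb = lb_table[leaf["lb_index"]]       # level of indirection
    adj_index = lb[flow_hash % len(lb)]   # pick an equal-cost path
    return adj_table[adj_index]           # L2 rewrite info
```

Because the leaf stores only an index, repointing one load-balance entry retargets every route sharing it in a single write: that is the extra level of indirection that buys high availability and high update rates.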
-
Metro Architecture Basics: Packet Distribution
[Figure: 188 PPEs between 96 Gb/s in and out paths, an on-chip packet buffer, and a resource fabric connecting shared resources]
Packet tails stored on-chip; ~100 bytes of packet context sent to the PPEs
Run-to-completion (RTC): simple SW model, efficient heterogeneous feature processing
RTC and non-flow-based packet distribution mean a scalable architecture
Costs: high instruction-BW supply; need RMW and flow-ordering solutions
-
Metro Architecture Basics: Packet Gather
[Figure: the same 188-PPE, packet-buffer, and resource-fabric diagram]
Gather of packets involves:
Assembly of final packets (at 100 Gb/s)
Packet ordering after variable-length processing
Gathering without new packet distribution
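The two gather tasks above, restoring arrival order after variable-length processing and splicing the processed head back onto the tail that never left the on-chip buffer, can be sketched together. The data layout is an assumption for the example, not the Metro gather engine.

```python
def gather(ppe_results, tail_buffer):
    """ppe_results: (arrival_seq, processed_head) pairs in completion
    order; tail_buffer: arrival_seq -> tail bytes held on-chip.
    Returns final packets in arrival order."""
    final = []
    for seq, head in sorted(ppe_results):      # restore arrival order
        final.append(head + tail_buffer.pop(seq, b""))
    return final
```

Splicing from the on-chip buffer is what lets gather assemble final packets without redistributing them through the PPE pool.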
-
Metro Architecture Basics: Packet Buffer Accessible as a Resource
[Figure: the same 188-PPE, packet-buffer, and resource-fabric diagram]
The resource fabric is a set of parallel, wide, multi-drop busses
Resources consist of: memories, read-modify-write operations, performance-heavy mechanisms
-
Metro Resources
[Figure: resources attached to the fabric: statistics (512k counters), TCAM interface tables, policing (100k+), lookup engine (2M prefixes), table DRAM (10s of MB), and queue-depth state; a TreeBitmap trie with a root node, child arrays, and child pointers]
Lookup engine uses the TreeBitmap algorithm
FCRAM and on-chip memory
High update rates; configurable performance vs density
Tree Bitmap: Hardware/Software IP Lookups with Incremental Updates, Will Eatherton et al., CCR April 2004 (vol. 34, no. 2), pp. 97-123
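TreeBitmap compresses a multibit trie with per-node bitmaps; the real structure is in the Eatherton et al. paper cited above. As a much simpler stand-in, here is a plain binary trie performing the same job, longest-prefix match, so the lookup behavior is concrete.

```python
class PrefixTrie:
    def __init__(self):
        self.root = {}

    def insert(self, prefix_bits, next_hop):
        """prefix_bits: string of '0'/'1' characters."""
        node = self.root
        for b in prefix_bits:
            node = node.setdefault(b, {})
        node["nh"] = next_hop

    def lookup(self, addr_bits):
        """Longest-prefix match: keep the deepest next hop seen."""
        node, best = self.root, None
        for b in addr_bits:
            if "nh" in node:
                best = node["nh"]
            if b not in node:
                return best
            node = node[b]
        return node.get("nh", best)
```

TreeBitmap gets the same answers while walking multiple bits per memory access and supporting incremental updates, which is what makes it suitable for an FCRAM-backed hardware lookup engine.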
-
Packet Processing Element (PPE)
16 PPE clusters
Each cluster of 12 PPEs
0.5 sq mm per PPE
-
Packet Processing Element (PPE)
[Figure: PPE block diagram. A 32-bit RISC processor core with ICACHE, data memory, Cisco DMA, memory-mapped registers, a distribution header, packet header, and scratch pad. Cluster instruction memory and global instruction memory feed 12 PPEs over the instruction bus; a cluster data-mux unit connects packet distribution (from resources) and packet gather (to resources) to the 12 PPEs.]
Tensilica Xtensa core with Cisco enhancements
32-bit, 5-stage pipeline
Code density: 16/24-bit instructions
Small instruction
-
Programming Model and Efficiency
Metro programming model:
Run-to-completion programming model
Queued descriptor interface to resources
Industry-leveraged tool flow
Efficiency data points:
1 ucoder for 6 months: IPv4 with common features (ACL, PBR, QoS, ..
-
-
Summary
Distributed architecture challenges
Fabric Flow Control
CRS cell-based multi-stage Benes
Switch fabric challenges
Metro Architecture Basics
Reassembly Window
Challenges