Posted on 24-Dec-2015
Gigabit Routing on a Software-exposed Tiled-Microprocessor
James W Anderson, Anthony Degangi, Anant Agarwal
Umar Saif
MIT Computer Science and AI Laboratory
Network Routers
Network “Switch”: x Kb/sec, ~5 ports
Network “Processor”: x Gb/sec, ~10^2 ports
Three Challenges
• Performance
– 5-10 Gb/sec (OC-192)
• Architectural Scalability
– Throughput: x2.2/year
– Port count: 10-100 for edge routers
• Programmability
– Network services: NAT, firewalls, VPN, “Layer 7” switches
– Monitoring: loss rate, link utilization, traffic patterns
Network Processors
Conventional Wisdom
Tiled “all-purpose” architectures
MIT RAW Microprocessor
Compute Pipeline
• Registered at input: longest wire = length of tile
• 8 32-bit channels
• 2 DOR (dimension-ordered routing) dynamic networks
– Memory Dynamic Network (MDN)
– General Dynamic Network (GDN)
• 2 static networks
– Streaming tile-multicast
• Tiled architecture
• Low-latency mesh networks
• Software-exposed pins
Each tile:
• 8-stage 32-bit MIPS-style single-issue in-order compute processor
• 4-stage 32-bit pipelined FPU
• 32 KB DCache
• 32 KB IMem
• Routers and wires for three on-chip mesh networks
RAW Microprocessor
RAW                                        Network Routing
Software-exposed tiled architecture        Parallel processing
Software-exposed pins                      Flexible buffering
Software-exposed point-to-point networks   Efficient, scalable switching
However ...
              Network Processors                      RAW Microprocessor
Processing    Special-purpose hardware                Software running on RAW general-purpose tiles
Switching     Special-purpose switching fabric        RAW general-purpose on-chip networks
Buffering     Centrally accessible, specialized       External to the chip; connected to software-exposed
              memory controllers; dedicated           pins; accessed via RAW on-chip networks
              interconnects
IPv4 Router: RFC 1812
• Look-up
– DIR-24-8-BASIC [Gupta98]
• Header verification
• TTL update, header re-compute
– Incremental Checksum [RFC 1141]
• Switch to destination
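The per-packet header work listed above can be sketched in C as follows. This is a minimal illustration, not the RAW router's code. It assumes the IPv4 header sits in a byte buffer in network byte order, and the helper names (`ones_sum`, `ipv4_header_checksum`, `ipv4_header_ok`, `ipv4_decrement_ttl`) are hypothetical. Verification applies the RFC 1812 (section 5.2.2) header checks. The TTL update patches the checksum incrementally in the style of RFC 1141 rather than recomputing it over the whole header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 one's-complement sum of the 16-bit words of an IPv4 header,
 * treating the checksum field (bytes 10-11) as zero when skip_ck is set. */
static uint16_t ones_sum(const uint8_t *hdr, size_t len, int skip_ck)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2) {
        if (skip_ck && i == 10)
            continue;
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    }
    while (sum >> 16)                      /* fold carries (end-around) */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Value that belongs in the checksum field. */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    return (uint16_t)~ones_sum(hdr, len, 1);
}

/* RFC 1812 (section 5.2.2) checks; link_len is the number of bytes the
 * link layer delivered. */
static int ipv4_header_ok(const uint8_t *pkt, size_t link_len)
{
    if (link_len < 20)
        return 0;                          /* shorter than a minimal datagram */
    unsigned version = pkt[0] >> 4;
    size_t hdr_len = (size_t)(pkt[0] & 0x0F) * 4;   /* IHL is in 32-bit words */
    size_t total = (size_t)((pkt[2] << 8) | pkt[3]);
    if (version != 4 || hdr_len < 20 || hdr_len > link_len || total < hdr_len)
        return 0;
    /* A correct header, checksum included, sums to 0xFFFF. */
    return ones_sum(pkt, hdr_len, 0) == 0xFFFF;
}

/* Decrement TTL and patch the checksum incrementally, following the
 * RFC 1141 TTL example: TTL is the high byte of the word at bytes 8-9,
 * so lowering it by one lowers that word by 0x0100 and raises the stored
 * (complemented) checksum by 0x0100, with the end-around carry added back. */
static void ipv4_decrement_ttl(uint8_t *hdr)
{
    assert(hdr[8] > 0);
    uint32_t sum = (uint32_t)((hdr[10] << 8) | hdr[11]) + 0x0100u;
    uint16_t ck = (uint16_t)(sum + (sum >> 16));
    hdr[8]--;
    hdr[10] = (uint8_t)(ck >> 8);
    hdr[11] = (uint8_t)(ck & 0xFF);
}
```

Take a sample 20-byte header whose checksum is 0xb861. Decrementing its TTL from 0x40 to 0x3f yields 0xb961, which is the same value a full recomputation gives. RFC 1624 later corrected an edge case in the RFC 1141 equation. In that edge case, the incremental shortcut can leave 0xffff in the field where a full recomputation would store 0x0000. Summing every header word, as `ipv4_header_ok` does, accepts either form.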
Evaluation Methodology
• Maximum Loss-Free Forwarding Rate (MLFFR)
– Minimum-sized 64-byte packets
• Millions of packets per second (mpps)
– Maximum-sized 1500-byte packets
• Gigabits per second (Gb/sec)
• Captured Internet trace: ~128 bytes
• Packet latency
• RAW clocked at 425 MHz
• Comparison with IXP1200 as a reference point
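The MLFFR is reported in mpps for 64-byte packets but in Gb/sec for 1500-byte packets. It is therefore useful to convert between the two units and to express a packet rate as a cycle budget at the 425 MHz RAW clock. The helpers below are a hypothetical sketch built on two assumptions. First, they count IP packet bytes only, with no link-layer framing, preamble, or inter-frame gap, although real line-rate figures usually include these. Second, they treat the chip as a whole, so the cycle figure is the time between packet arrivals and not the work any single tile may do.

```c
#include <assert.h>

#define RAW_CLOCK_HZ 425e6   /* RAW clock rate from the methodology */

/* Bit rate (Gb/sec) implied by a packet rate (mpps) at a fixed packet
 * size, counting IP packet bytes only (no link-layer framing). */
static double gbps_from_mpps(double mpps, unsigned pkt_bytes)
{
    assert(pkt_bytes > 0);
    return mpps * 1e6 * pkt_bytes * 8.0 / 1e9;
}

/* Packet rate (mpps) implied by a bit rate (Gb/sec) at a fixed size. */
static double mpps_from_gbps(double gbps, unsigned pkt_bytes)
{
    assert(pkt_bytes > 0);
    return gbps * 1e9 / (pkt_bytes * 8.0) / 1e6;
}

/* RAW clock cycles that elapse between packets at a given rate. */
static double cycles_between_packets(double mpps)
{
    assert(mpps > 0.0);
    return RAW_CLOCK_HZ / (mpps * 1e6);
}
```

Some examples under these assumptions:
• 9.79 mpps of 64-byte packets corresponds to roughly 5.0 Gb/sec.
• The same rate leaves about 43 RAW cycles between packets.
• 10 Gb/sec of 64-byte packets would be about 19.5 mpps.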
RAW Router, Take 1: Parallelism
[Figure: tile layout with eight line cards and SRAM/DRAM banks holding the lookup tables and packet buffers]
Tile roles:
• Lookup (2-stage lookup)
• Header verify
• Header recompute
• Interrupt
• Drain tile (drain FIFO)
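The lookup is two-stage because of DIR-24-8-BASIC [Gupta98], the scheme listed for the IPv4 router. It uses two tables:
• A first table indexed by the top 24 address bits.
• A second table of 256-entry blocks, used only for prefixes longer than /24.

Below is a minimal C sketch. It is not the RAW router's implementation, and it rests on three assumptions:
• Next-hop identifiers fit in 15 bits, and 0 means "no route".
• Routes are added in order of increasing prefix length. This is a simplification; the original scheme handles arbitrary insertion order.
• The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TBL24_ENTRIES (1u << 24)
#define LONG_FLAG     0x8000u   /* set: low 15 bits index a TBLlong block */

typedef struct {
    uint16_t *tbl24;     /* one entry per /24, indexed by address >> 8 */
    uint16_t *tbllong;   /* 256-entry blocks for prefixes longer than /24 */
    uint32_t nblocks, maxblocks;
} dir24_8;

/* Next hop 0 means "no route". Returns 0 if allocation fails. */
static int dir24_8_init(dir24_8 *t, uint32_t maxblocks)
{
    assert(maxblocks > 0 && maxblocks <= 0x8000u);
    t->tbl24 = calloc(TBL24_ENTRIES, sizeof *t->tbl24);
    t->tbllong = calloc((size_t)maxblocks * 256, sizeof *t->tbllong);
    t->nblocks = 0;
    t->maxblocks = maxblocks;
    return t->tbl24 != NULL && t->tbllong != NULL;
}

static void dir24_8_free(dir24_8 *t)
{
    free(t->tbl24);
    free(t->tbllong);
}

/* Add prefix/len -> next_hop. Routes must arrive in order of increasing
 * prefix length so longer prefixes overwrite shorter ones.
 * Returns 0 if TBLlong is full. */
static int dir24_8_add(dir24_8 *t, uint32_t prefix, unsigned len, uint16_t next_hop)
{
    assert(len <= 32 && next_hop < LONG_FLAG);
    if (len <= 24) {
        uint32_t count = 1u << (24 - len);              /* /24 slots covered */
        uint32_t first = (prefix >> 8) & ~(count - 1);
        for (uint32_t i = 0; i < count; i++)
            t->tbl24[first + i] = next_hop;
        return 1;
    }
    uint32_t idx = prefix >> 8;
    uint16_t e = t->tbl24[idx];
    uint32_t block;
    if (e & LONG_FLAG) {
        block = e & 0x7FFFu;
    } else {
        if (t->nblocks == t->maxblocks)
            return 0;
        block = t->nblocks++;
        for (uint32_t i = 0; i < 256; i++)              /* inherit covering route */
            t->tbllong[block * 256 + i] = e;
        t->tbl24[idx] = (uint16_t)(LONG_FLAG | block);
    }
    uint32_t count = 1u << (32 - len);                  /* addresses covered */
    uint32_t first = (prefix & 0xFFu) & ~(count - 1);
    for (uint32_t i = 0; i < count; i++)
        t->tbllong[block * 256 + first + i] = next_hop;
    return 1;
}

/* At most two memory reads: TBL24 first, TBLlong only for long prefixes. */
static uint16_t dir24_8_lookup(const dir24_8 *t, uint32_t addr)
{
    uint16_t e = t->tbl24[addr >> 8];
    if (!(e & LONG_FLAG))
        return e;
    return t->tbllong[(uint32_t)(e & 0x7FFFu) * 256 + (addr & 0xFFu)];
}
```

The 2^24-entry first table alone takes 32 MB of 16-bit entries, so the tables cannot fit in a tile's 32 KB data cache and have to live in off-chip memory.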
Flow of Packets
[Figure: packet flow from the eight line cards through the pipeline tiles, with DRAM next to the lookup tiles]
L: Lookup, V: Verify, U: Update, D: Drain
RAW Router, Take 1
• Static network for streaming packets
– Feed the pipeline
– Stream the payload to DRAM
• General Dynamic Network
– Header forwarding 3 -> 4
• Memory Dynamic Network
– From memory to line card
[Figure: Take 1 layout with eight line cards and SRAM/DRAM banks; legend: Static, MDN, GDN]
Version I Performance
• 1.8 Gb/sec --> 6.17 Gb/sec
• 2.9 mpps --> 6.23 mpps
RAW Router Version 1
[Figure: Version 1 layout with eight line cards and shared SRAM/DRAM banks]
• Bus contention
• Shared buffering
• Memory Dynamic Network
– DOR: x --> y
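The MDN uses dimension-ordered routing (DOR), which moves a message along x first and then along y. The short C sketch below applies that rule to hypothetical tile coordinates. It shows how routes from different source tiles to the same destination end up sharing mesh links, which is one plausible reading of the bus contention noted above. The mesh coordinates and helper names (`dor_route`, `share_link`) are assumptions for illustration. The sketch does not model RAW's actual memory-port placement or flow control.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x, y; } tile;

/* Dimension-ordered (x-then-y) route on a 2-D mesh: move along x until
 * the column matches, then along y. Writes the visited tiles (source and
 * destination included) to path and returns how many were written. */
static size_t dor_route(tile src, tile dst, tile *path, size_t max)
{
    size_t n = 0;
    tile cur = src;
    assert(max > 0);
    path[n++] = cur;
    while (cur.x != dst.x) {
        cur.x += (dst.x > cur.x) ? 1 : -1;
        assert(n < max);
        path[n++] = cur;
    }
    while (cur.y != dst.y) {
        cur.y += (dst.y > cur.y) ? 1 : -1;
        assert(n < max);
        path[n++] = cur;
    }
    return n;
}

/* Nonzero if two routes traverse the same directed link. */
static int share_link(const tile *p, size_t np, const tile *q, size_t nq)
{
    for (size_t i = 0; i + 1 < np; i++)
        for (size_t j = 0; j + 1 < nq; j++)
            if (p[i].x == q[j].x && p[i].y == q[j].y &&
                p[i + 1].x == q[j + 1].x && p[i + 1].y == q[j + 1].y)
                return 1;
    return 0;
}
```

For example, the routes from (0,1) and from (1,1) to (3,0) share the x-direction links of row 1. A route confined to column 0 shares no links with either of them.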
RAW Router, Take 2: Buffering and Switching
[Figure: Take 2 layout with four SDRAM banks and eight line cards around the chip]
Tile roles:
• Lookup tiles (2-stage lookup)
• Header verify
• Header recompute
• Interrupt
• Drain tile (drain FIFO)
RAW Router, Take 2
[Figure: Take 2 layout with four SDRAM banks, eight line cards, and two lookup tiles; legend: Static, MDN, GDN]
• Respects DOR
• No “bus contention” for DMAs (the bottleneck is the shared SDRAMs)
• 2x memory bandwidth
• No need to look at packet length
• Dynamic networks for “out-of-band” communication
Optimized buffering and switching
• 6.17 Gb/sec --> 8.68 Gb/sec
• 6.23 mpps --> 6.77 mpps
RAW Router, Take 3: Reducing Memory Transactions
[Figure: Take 3 layout with DRAM and SDRAM banks and eight line cards]
• Streaming DDR
• No fragmentation of frames
• Pipelined memory requests
Streaming packet buffers + 64-byte minimum buffering
• 8.68 Gb/sec --> 9.57 Gb/sec
• 6.77 mpps --> 9.79 mpps
Buffering on Line-cards
• 9.57 Gb/sec --> 15.03 Gb/sec
• 9.79 mpps --> 9.79 mpps
All dynamic networks
• 9.57 Gb/sec --> 8.50 Gb/sec
• 9.79 mpps --> 6.94 mpps
Evaluation with Captured Trace
Packet Latency
Router     Packet size (bytes)   Cycles   Time (ns)
RAW null   64                    416      177
RAW IPv4   64                    690      293
RAW null   1500                  3490     1483
RAW IPv4   1500                  5394     2292
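At the 425 MHz clock stated in the methodology, one cycle lasts 1000/425 ≈ 2.35 ns. The hypothetical helpers below convert the cycle counts above into time. They also compute how many extra cycles the IPv4 router spends over the null router at the same packet size. By this conversion, 416 cycles is about 979 ns, whereas the Time (ns) column lists 177 ns. The listed times match cycles × 0.425 (for example, 416 × 0.425 ≈ 177) rather than cycles divided by 0.425 GHz, so read them with care.

```c
#include <assert.h>

#define RAW_CLOCK_MHZ 425.0   /* clock rate stated in the methodology */

/* Convert a cycle count to nanoseconds: one cycle lasts
 * 1000 / 425 ns, about 2.35 ns. */
static double cycles_to_ns(unsigned cycles)
{
    return cycles * 1000.0 / RAW_CLOCK_MHZ;
}

/* Extra cycles the IPv4 router spends over the null router
 * for the same packet size. */
static unsigned ipv4_extra_cycles(unsigned ipv4_cycles, unsigned null_cycles)
{
    assert(ipv4_cycles >= null_cycles);
    return ipv4_cycles - null_cycles;
}
```

From the table, IPv4 processing adds 274 cycles per 64-byte packet and 1904 cycles per 1500-byte packet over the null router.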
Conclusions
• Tiled architectures = NPU performance + enhanced programmability
• RAW’s low-level software control was vital for deriving performance:
– Layout of routing functions
• 30% improvement by altering the layout
– Role and behavior of the on-chip networks
• 15% improvement by using the GDN and static networks in place of the MDN
Conclusions
Network oblivious: 30-35% degradation
No Static networks: 10-30% degradation
Buffering on line-cards: 35% improvement
Thank you!
Questions: umar@mit.edu