Scaling routers: Where do we go from here?

High Performance Switching and Routing (HPSR), Kobe, Japan, May 28th, 2002

Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
[email protected]/~nickm
[Chart: relative performance increase (log scale, 1x to 1000x), 1990-2002: router capacity x2.2 per 18 months vs. Moore's law x2 per 18 months.]
[Chart: relative performance increase (log scale, 1x to 1000x), 1990-2002: router capacity x2.2 per 18 months, Moore's law x2 per 18 months, and DRAM access rate x1.1 per 18 months.]
Router vital statistics

- Cisco GSR 12416: capacity 160 Gb/s, power 4.2 kW, 6 ft x 2 ft x 19 in.
- Juniper M160: capacity 80 Gb/s, power 2.6 kW, 3 ft x 2.5 ft x 19 in.
[Chart: projected relative performance increase, 2002-2012: Internet traffic x2 per year vs. router capacity x2.2 per 18 months, opening a roughly 5x gap.]
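The 5x figure follows from compounding the two growth rates; a quick sanity check (Python, rates taken from the chart labels):

```python
# Internet traffic doubles every year; router capacity grows x2.2
# every 18 months.  Compounded over the chart's 10-year span
# (2002-2012), the two diverge by roughly 5x.
YEARS = 10

traffic = 2.0 ** YEARS                  # x2 per 12 months
capacity = 2.2 ** (YEARS * 12 / 18)     # x2.2 per 18 months

gap = traffic / capacity
print(f"traffic: {traffic:.0f}x, capacity: {capacity:.0f}x, gap: {gap:.1f}x")
# traffic: 1024x, capacity: 192x, gap: 5.3x
```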
Fast (large) routers: big POPs need big routers

[Diagram: a POP built from many smaller routers vs. a POP built around large routers.]

Interfaces: price > $200k, power > 400 W. About 50-60% of interfaces are used for interconnection within the POP. The industry trend is towards a large, single router per POP.
Job of router architect

For a given set of features, maximize capacity C (Gb/s) within the power P (kW) and volume V (m^3) budget.
Mind the gap

Operators are unlikely to deploy 5 times as many POPs, or make them 5 times bigger, with 5 times the power consumption.

Our options:
1. Make routers simpler
2. Use more parallelism
3. Use more optics
Make routers simple
We tell our students that Internet routers are simple: all a router does is make a forwarding decision, update a header, then forward the packet to the correct outgoing interface.

But I don't understand them anymore:
- The list of required features is huge and still growing,
- Software is complex and unreliable,
- Hardware is complex and power-hungry.
Router linecard

[Diagram: OC192c linecard; optics, physical layer, framing & maintenance, packet processing, lookup tables, buffer management & scheduling, scheduler, buffer & state memory.]

An OC192c linecard: 30 M gates, 2.5 Gbits of memory, 1 m^2, $25k cost, $200k price.
Things that slow routers down

- 250 ms of buffering: requires off-chip memory, more board space, pins and power.
- Multicast: affects everything! Complicates design, slows deployment.
- Latency bounds: limit pipelining.
- Packet sequence: limits parallelism.
- Small internal cell size: complicates arbitration.
- DiffServ, IntServ, priorities, WFQ, etc.
- Others: IPv6, drop policies, VPNs, ACLs, DoS traceback, measurement, statistics, …
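The buffering item is the dominant memory cost: the usual rule of thumb sizes the buffer at the round-trip time multiplied by the line rate. A quick check (Python; the 10 Gb/s OC192c rate comes from the linecard slide):

```python
rtt = 0.25      # 250 ms of buffering (round-trip-time rule of thumb)
rate = 10e9     # OC192c line rate, ~10 Gb/s

buffer_bits = rtt * rate
print(f"{buffer_bits / 1e9:.1f} Gbit of buffer memory")
# 2.5 Gbit of buffer memory
```

which matches the 2.5 Gbits of memory on the OC192c linecard, and is why the buffer must live in off-chip DRAM.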
An example: packet processing

[Chart (log scale, 1-1000): CPU instructions available per minimum-length packet, 1996-2001.]
Reducing complexity: conclusion

- Need aggressive reduction in the complexity of routers.
- Get rid of irrelevant requirements and irrational tests.
- It is not clear who has the right incentive to make this happen.
- Else, be prepared for core routers to be replaced by optical circuit switches.
Use more parallelism

- Parallel packet buffers
- Parallel lookups
- Parallel packet switches

Things that make parallelism hard: maintaining packet order, making throughput guarantees, making delay guarantees, latency requirements, multicast.
Parallel packet switches

[Diagram: N external ports at rate R, spread by a bufferless first stage across k slower packet switches and recombined at the outputs.]
Characteristics

Advantages: k× memory bandwidth, k× lookup/classification rate, k× routing/classification table size.

With appropriate algorithms: packets remain in order, 100% throughput, delay guarantees (at least in theory).
All-optical routers don't make sense

A router is a packet switch, and so requires: a switch fabric, per-packet address lookup, and large buffers for times of congestion.

Packet processing and buffering are infeasible with optics: a typical 10 Gb/s router linecard has 30 Mgates and 2.5 Gbits of memory.

Research problem: how to optimize the architecture of a router that uses an optical switch fabric?
100 Tb/s optical router: Stanford University research project

Collaboration: 4 professors at Stanford (Mark Horowitz, Nick McKeown, David Miller and Olav Solgaard), and our groups.

Objective: determine the best way to incorporate optics (photonics, electronics, system design) into routers, pushing technology hard to expose new issues.

Motivating example: the design of a 100 Tb/s Internet router.
- Challenging but not impossible (~100x current commercial systems)
- It identifies some interesting research problems
100 Tb/s optical router (100 Tb/s = 625 x 160 Gb/s)

[Diagram: electronic linecards #1 ... #625, each handling line termination, IP packet processing and packet buffering at 160 Gb/s (shown as 4 x 40 Gb/s channels), connected by 160-320 Gb/s links to a central optical switch; an arbiter exchanges request/grant messages with the linecards.]
Research problems

- Linecard: memory bottleneck (address lookup and packet buffering).
- Architecture: arbitration (computation complexity).
- Switch fabric: optics (fabric scalability and speed), electronics (switch control and link electronics), packaging (the three-surface problem).
160 Gb/s linecard: packet buffering

Problem: the packet buffer needs the density of DRAM (40 Gbits) and the speed of SRAM (2 ns per packet).

Solution: a hybrid that uses on-chip SRAM and off-chip DRAM. We identified optimal algorithms that minimize the size of the SRAM (12 Mbits) and precisely emulate the behavior of a 40 Gbit, 2 ns SRAM.

[Diagram: a queue manager with a small SRAM feeding multiple DRAMs, 160 Gb/s in and out.]

klamath.stanford.edu/~nickm/papers/ieeehpsr2001.pdf
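A minimal sketch of the hybrid idea (Python; the class, the batch size and the bypass logic are illustrative assumptions, not the algorithm from the paper): every packet is written to and read from fast SRAM, while the slow, dense DRAM is only touched in wide batches.

```python
from collections import deque

BATCH = 4  # packets per DRAM access; wide accesses amortize slow DRAM

class HybridQueue:
    """FIFO whose head and tail live in SRAM; the bulk sits in DRAM."""
    def __init__(self):
        self.tail_sram = deque()  # small, fast: absorbs per-packet writes
        self.dram = deque()       # large, slow: batch access only
        self.head_sram = deque()  # small, fast: serves per-packet reads

    def enqueue(self, pkt):
        self.tail_sram.append(pkt)
        if len(self.tail_sram) == BATCH:       # one wide DRAM write
            self.dram.extend(self.tail_sram)
            self.tail_sram.clear()

    def dequeue(self):
        if not self.head_sram:
            if self.dram:                      # one wide DRAM read
                for _ in range(min(BATCH, len(self.dram))):
                    self.head_sram.append(self.dram.popleft())
            else:                              # short queue: bypass DRAM
                self.head_sram.extend(self.tail_sram)
                self.tail_sram.clear()
        return self.head_sram.popleft() if self.head_sram else None

q = HybridQueue()
for i in range(10):
    q.enqueue(i)
print([q.dequeue() for _ in range(10)])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The real design bounds the SRAM (12 Mbits against 40 Gbits of DRAM) and proves the combination exactly emulates one large 2 ns SRAM; the sketch only shows the batching principle.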
The Arbitration Problem
A packet switch fabric is reconfigured for every packet transfer.
At 160Gb/s, a new IP packet can arrive every 2ns.
The configuration is picked to maximize throughput and not waste capacity.
Known algorithms are too slow.
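To see the scale of the problem, here is one pass of a deliberately naive greedy matcher over the VOQ occupancy matrix (Python; the matrix values are made up, and this is not one of the algorithms the slide refers to). Even this trivial pass over N^2 entries is hard to finish every 2 ns once N reaches hundreds of ports:

```python
N = 4

# voq[i][j]: packets queued at input i destined for output j (illustrative)
voq = [[0, 3, 0, 1],
       [2, 0, 0, 0],
       [0, 0, 4, 0],
       [1, 0, 0, 2]]

match = {}        # chosen crossbar configuration: input -> output
taken = set()     # outputs already claimed this time slot
for i in range(N):
    for j in range(N):
        if voq[i][j] > 0 and j not in taken:
            match[i] = j      # greedily grab the first free output
            taken.add(j)
            break

print(match)  # {0: 1, 1: 0, 2: 2, 3: 3}
```

Real schedulers need matchings of much higher quality than this greedy pass (to reach 100% throughput and avoid starvation), which is exactly why arbitration is a bottleneck.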
Approach

We know that a crossbar with VOQs and uniform Bernoulli i.i.d. arrivals gives 100% throughput for the following scheduling algorithms:
- Pick a permutation uniformly at random from all permutations.
- Pick a permutation uniformly at random from a set of size N in which each input-output pair (i,j) is connected exactly once.
- From the same set as above, repeatedly cycle through a fixed sequence of N different permutations.

Can we make non-uniform, bursty traffic uniform “enough” for the above to hold?
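The key property of the third algorithm, that each (i,j) pair is served exactly once per cycle, is easy to check with the standard cyclic permutation set (a Python sketch; permutation t connects input i to output (i + t) mod N):

```python
N = 8

# Count how often each input-output pair is connected over one cycle
# of the N cyclic permutations.
served = {}
for t in range(N):                 # permutation t of the fixed sequence
    for i in range(N):
        j = (i + t) % N            # input i -> output (i + t) mod N
        served[(i, j)] = served.get((i, j), 0) + 1

# Every pair (i, j) is connected exactly once per N-slot cycle,
# i.e. each VOQ is guaranteed service at rate 1/N.
assert all(served[(i, j)] == 1 for i in range(N) for j in range(N))
print(f"all {N * N} pairs served exactly once per cycle")
```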
2-stage switch

[Diagram: external inputs 1...N, a spanning set of permutations, internal inputs, a second spanning set of permutations, external outputs 1...N.]

Recently shown to have 100% throughput under mild conditions (weakly mixing arrival processes).
C.-S. Chang et al.: http://www.ee.nthu.edu.tw/~cschang/PartI.pdf
2-stage switch (stability)

[Diagram: arrivals a(t) enter the first stage; b(t) is the traffic offered to an internal queue q(t) ahead of the second stage.]

The first stage spreads each input's arrivals evenly over the N internal inputs, so each internal queue is offered at most 1/N of the load, while the cyclic permutations serve it at rate exactly 1/N. Long-term, service opportunities exceed arrivals, so the queues are stable.
Problem: unbounded mis-sequencing

[Diagram: two packets of the same flow take different paths through the two stages and can leave the switch out of order.]

Side-note: mis-sequencing is maximized when arrivals are uniform.
Preventing mis-sequencing

[Diagram: two-stage switch; small coordination buffers running the ‘FFF’ algorithm at one stage, large congestion buffers at the other.]

The Full Frames First algorithm keeps packets ordered and guarantees a delay bound within the optimum.

Infocom’02: klamath.stanford.edu/~nickm/papers/infocom02_two_stage.pdf
Example: optical 2-stage switch

Idea: use a single stage twice.

[Diagram: linecards 1-3, each with lookup and buffer, pass their traffic through the same optical stage in two phases (phase 1, then phase 2).]
Example: passive optical 2-stage “switch”

[Diagram: ingress linecards 1...n each spread traffic at rate R/N to every midstage linecard 1...n, which in turn connect to egress linecards 1...n.]

It is helpful to think of it as spreading rather than switching.
Passive optical switching

[Diagram: each ingress linecard transmits on wavelengths λ1, ..., λn into an integrated AWGR or diffraction-grating-based wavelength router, which delivers one wavelength from every ingress to each midstage linecard; an identical second stage connects the midstage to the egress linecards.]
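Why a fixed wavelength router amounts to uniform spreading can be seen with a small sketch (Python; the AWGR model, wavelength w from ingress i emerging at port (i + w) mod n, is the usual one, and the rates come from the slides):

```python
n, R = 4, 160.0   # n linecards at R = 160 Gb/s each

# An AWGR passively routes wavelength w from ingress i to midstage
# linecard (i + w) % n.  Each ingress splits its rate R evenly
# across its n wavelengths.
load = [[0.0] * n for _ in range(n)]
for i in range(n):
    for w in range(n):
        load[i][(i + w) % n] += R / n

# Every ingress-midstage pair carries exactly R/n: uniform spreading
# with no moving parts and no per-packet reconfiguration.
assert all(abs(x - R / n) < 1e-9 for row in load for x in row)
print(f"each ingress-midstage pair carries {R / n:.0f} Gb/s")
```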
100 Tb/s router

[Diagram: racks of 160 Gb/s linecards connected by optical links to a central optical switch fabric.]
Racks with 160 Gb/s linecards

[Diagram: each linecard contains a queue manager with SRAM, multiple DRAMs, and a lookup engine.]
Additional technologies

Demonstrated or in development:
- Chip-to-chip optical interconnects with total power dissipation of several mW.
- Wavelength-division-multiplexed chip interconnect (demonstrated).
- Integrated laser modulators.
- 8 Gsample/s serial links.
- Low-power variable-power-supply serial links.
- Integrated arrayed-waveguide routers.

[Chip micrographs: TX/RX serial-link test chip with PLLs/DLLs, testing interface and PRBS generators; a digital sliding controller with buck converter and power transistors; a 40 μm device feature.]
Some predictions about core Internet routers

The need for more capacity within a given power and volume budget will mean:
- Fewer functions in routers: little or no optimization for multicast; continued overprovisioning will lead to little or no support for QoS, DiffServ, …
- Fewer unnecessary requirements: mis-sequencing will be tolerated; latency requirements will be relaxed.
- Less programmability in routers, and hence no network processors.
- Greater use of optics to reduce power in the switch.
What I believe is most likely

The need for capacity and reliability will mean widespread replacement of core routers with transport switching based on circuits: circuit switches have proved simpler, more reliable, lower power, higher capacity and lower cost per Gb/s. Eventually, this is going to matter.

The Internet will evolve to become edge routers interconnected by a rich mesh of WDM circuit switches.