
May 28th, 2002 Nick McKeown 1


Scaling routers: Where do we go from here?

HPSR, Kobe, Japan, May 28th, 2002

Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University

[email protected] · www.stanford.edu/~nickm

May 28th, 2002 Nick McKeown 2

[Chart: relative performance increase (log scale, 1 to 1000) versus year, 1990-2002. Router capacity grows ×2.2 every 18 months, outpacing Moore's law at ×2 every 18 months.]

May 28th, 2002 Nick McKeown 3

[Chart: same axes as the previous slide, adding DRAM access rate. Router capacity (×2.2/18 months) outpaces Moore's law (×2/18 months), which in turn far outpaces DRAM access rate (×1.1/18 months).]

May 28th, 2002 Nick McKeown 4

Router vital statistics

            Cisco GSR 12416        Juniper M160
Capacity:   160 Gb/s               80 Gb/s
Power:      4.2 kW                 2.6 kW
Size:       6 ft × 19 in × 2 ft    3 ft × 19 in × 2.5 ft

May 28th, 2002 Nick McKeown 5

[Chart: relative performance increase (linear scale, 0 to 1200) versus year, 2002-2012. Internet traffic, doubling every year, pulls away from router capacity, growing ×2.2 every 18 months, opening a roughly 5× gap by 2012.]
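The 5× gap follows directly from the two growth rates on the chart; a quick sanity check in code, using only the rates quoted above:

```python
# Gap between Internet traffic growth and router capacity growth, 2002-2012.
years = 10
traffic_growth = 2.0 ** years               # traffic doubles every year
capacity_growth = 2.2 ** (years / 1.5)      # capacity grows x2.2 per 18 months
print(f"traffic:  x{traffic_growth:.0f}")   # x1024
print(f"capacity: x{capacity_growth:.0f}")  # x192
print(f"gap:      x{traffic_growth / capacity_growth:.1f}")  # x5.3
```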

May 28th, 2002 Nick McKeown 6

Fast (large) routers: big POPs need big routers

[Diagram: a POP built from many smaller routers versus a POP built from large routers.]

- Interfaces: price > $200k, power > 400W each.
- About 50-60% of interfaces are used for interconnection within the POP.
- The industry trend is towards a large, single router per POP.
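The interconnection overhead is easy to see with a toy full-mesh POP (the numbers below are hypothetical; the 50-60% above is the observed industry figure):

```python
# Hypothetical POP: m mid-size routers, each with p ports, connected in a
# full mesh (one port per link to every other router in the POP).
m, p = 9, 16
internal_ports = m - 1          # ports burned on intra-POP interconnect
fraction = internal_ports / p
print(f"{fraction:.0%} of each router's ports interconnect the POP")  # 50%
```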

May 28th, 2002 Nick McKeown 7

Job of router architect

For a given set of features: maximize capacity C, subject to a power budget P (kW) and a volume budget V (m³).

[The original slide's figure, showing specific C, P, and V targets, did not survive extraction.]

May 28th, 2002 Nick McKeown 8

Mind the gap

Operators are unlikely to deploy 5 times as many POPs, or make them 5 times bigger, with 5 times the power consumption.

Our options:
1. Make routers simpler
2. Use more parallelism
3. Use more optics

May 28th, 2002 Nick McKeown 9

Mind the gap

Operators are unlikely to deploy 5 times as many POPs, or make them 5 times bigger, with 5 times the power consumption.

Our options:
1. Make routers simpler
2. Use more parallelism
3. Use more optics

May 28th, 2002 Nick McKeown 10

Make routers simple

We tell our students that Internet routers are simple: all a router does is make a forwarding decision, update a header, and then forward packets to the correct outgoing interface.

But I don't understand them anymore:
- The list of required features is huge and still growing,
- Software is complex and unreliable,
- Hardware is complex and power-hungry.

May 28th, 2002 Nick McKeown 11

Router linecard

[Block diagram: a router linecard. Optics feed the physical layer, then framing & maintenance, then packet processing (with lookup tables), then buffer management & scheduling (with buffer & state memory), in each direction, coordinated by a scheduler.]

An OC192c linecard:
- 30M gates,
- 2.5 Gbits of memory,
- 1 m² of board area,
- $25k cost, $200k price.

May 28th, 2002 Nick McKeown 12

Things that slow routers down

- 250 ms of buffering: requires off-chip memory, more board space, pins and power.
- Multicast: affects everything! Complicates design, slows deployment.
- Latency bounds: limit pipelining.
- Packet sequence: limits parallelism.
- Small internal cell size: complicates arbitration.
- DiffServ, IntServ, priorities, WFQ, etc.
- Others: IPv6, drop policies, VPNs, ACLs, DoS traceback, measurement, statistics, …
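The buffering requirement alone explains the memory problem. With the rule-of-thumb buffer = RTT × line rate, and the 160 Gb/s linecard rate from the later slides:

```python
# Rule-of-thumb buffer sizing: buffer = round-trip time x line rate.
rtt = 0.25      # seconds of buffering (250 ms)
rate = 160e9    # line rate in bits/s (the 160 Gb/s linecard target)
buffer_bits = rtt * rate
print(f"{buffer_bits / 1e9:.0f} Gbits of buffer")  # 40 Gbits
```

This is the same 40 Gbits that reappears on the 160Gb/s linecard packet-buffering slide below.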

May 28th, 2002 Nick McKeown 13

An example: Packet processing

[Chart: CPU instructions available per minimum-length packet, 1996-2001 (log scale, 1 to 1000), showing a steady decline since 1996.]

May 28th, 2002 Nick McKeown 14

Reducing complexity: Conclusion

We need an aggressive reduction in the complexity of routers: get rid of irrelevant requirements and irrational tests.

It is not clear who has the right incentive to make this happen.

Otherwise, be prepared for core routers to be replaced by optical circuit switches.

May 28th, 2002 Nick McKeown 15

Mind the gap

Operators are unlikely to deploy 5 times as many POPs, or make them 5 times bigger, with 5 times the power consumption.

Our options:
1. Make routers simpler
2. Use more parallelism
3. Use more optics

May 28th, 2002 Nick McKeown 16

Use more parallelism

- Parallel packet buffers
- Parallel lookups
- Parallel packet switches

Things that make parallelism hard:
- Maintaining packet order,
- Making throughput guarantees,
- Making delay guarantees,
- Latency requirements,
- Multicast.

May 28th, 2002 Nick McKeown 17

Parallel Packet Switches

[Diagram: a parallel packet switch. N external ports, each at rate R, are spread by bufferless demultiplexers over k parallel packet-switch layers, each running at a fraction of the line rate, and recombined at the outputs.]

May 28th, 2002 Nick McKeown 18

Characteristics

Advantages:
- k× the memory bandwidth,
- k× the lookup/classification rate,
- k× the routing/classification table size.

With appropriate algorithms:
- Packets remain in order,
- 100% throughput,
- Delay guarantees (at least in theory).
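A minimal sketch of the demultiplexing side (the per-flow round-robin policy here is a simplification for illustration; the algorithms that actually guarantee order and 100% throughput are more careful):

```python
# Toy parallel-packet-switch demultiplexer: spread one input's packets over
# k slower switch layers, keeping each flow on a deterministic sequence of
# layers so the output side can recombine its packets in order.
class Demux:
    def __init__(self, k):
        self.k = k
        self.next_layer = {}                 # per-flow round-robin pointer

    def assign(self, flow_id):
        layer = self.next_layer.get(flow_id, 0)
        self.next_layer[flow_id] = (layer + 1) % self.k
        return layer

demux = Demux(k=4)
for seq in range(6):
    print(f"flow A packet {seq} -> layer {demux.assign('A')}")
```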

May 28th, 2002 Nick McKeown 19

Mind the gap

Operators are unlikely to deploy 5 times as many POPs, or make them 5 times bigger, with 5 times the power consumption.

Our options:
1. Make routers simpler
2. Use more parallelism
3. Use more optics

May 28th, 2002 Nick McKeown 20

All-optical routers don't make sense

A router is a packet switch, and so requires:
- A switch fabric,
- Per-packet address lookup,
- Large buffers for times of congestion.

Packet processing and buffering are infeasible with optics: a typical 10 Gb/s router linecard has 30 Mgates and 2.5 Gbits of memory.

Research problem: how should we optimize the architecture of a router that uses an optical switch fabric?

May 28th, 2002 Nick McKeown 21

100Tb/s optical router: Stanford University research project

Collaboration: 4 professors at Stanford (Mark Horowitz, Nick McKeown, David Miller and Olav Solgaard), and our groups.

Objective: to determine the best way to incorporate optics into routers, pushing the technology hard, in photonics, electronics, and system design, to expose new issues.

Motivating example: the design of a 100 Tb/s Internet router.
- Challenging but not impossible (~100× current commercial systems),
- It identifies some interesting research problems.

May 28th, 2002 Nick McKeown 22

100Tb/s optical router (100 Tb/s = 625 × 160 Gb/s)

[Diagram: electronic linecards #1-#625, each terminating a 160 Gb/s line (shown as 4 × 40 Gb/s channels) and performing line termination, IP packet processing, and packet buffering. Each linecard connects to a central optical switch at 160-320 Gb/s, and exchanges request/grant messages with an arbitration unit that configures the switch.]

May 28th, 2002 Nick McKeown 23

Research Problems

Linecard:
- Memory bottleneck: address lookup and packet buffering.

Architecture:
- Arbitration: computation complexity.

Switch fabric:
- Optics: fabric scalability and speed,
- Electronics: switch control and link electronics,
- Packaging: the three-surface problem.

May 28th, 2002 Nick McKeown 24

160Gb/s Linecard: Packet Buffering

Problem: the packet buffer needs the density of DRAM (40 Gbits) and the speed of SRAM (2 ns per packet).

Solution: a hybrid that uses on-chip SRAM and off-chip DRAM. We identified optimal algorithms that minimize the size of the SRAM (12 Mbits); the hybrid precisely emulates the behavior of a 40 Gbit, 2 ns SRAM.

[Diagram: a queue manager with a small on-chip SRAM, fed at 160 Gb/s on each side, in front of a row of off-chip DRAMs.]

klamath.stanford.edu/~nickm/papers/ieeehpsr2001.pdf
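A toy sketch of the hybrid idea (the block size and queue structure below are hypothetical; the paper linked above derives the actual optimal SRAM sizing):

```python
# Toy hybrid packet buffer: keep the head and tail of each queue in fast
# SRAM, and move packets to/from slow DRAM in large blocks, so the DRAM
# needs only bulk bandwidth rather than per-packet random access.
from collections import deque

BLOCK = 8  # packets per DRAM transfer (hypothetical)

class HybridQueue:
    def __init__(self):
        self.tail_sram = deque()   # recently arrived packets
        self.dram = deque()        # bulk storage, accessed one block at a time
        self.head_sram = deque()   # packets staged for departure

    def enqueue(self, pkt):
        self.tail_sram.append(pkt)
        if len(self.tail_sram) >= BLOCK:   # write one full block to DRAM
            self.dram.append([self.tail_sram.popleft() for _ in range(BLOCK)])

    def dequeue(self):
        if not self.head_sram:             # refill the head from DRAM
            if self.dram:
                self.head_sram.extend(self.dram.popleft())
            elif self.tail_sram:           # short queue: bypass DRAM entirely
                self.head_sram.append(self.tail_sram.popleft())
        return self.head_sram.popleft() if self.head_sram else None

q = HybridQueue()
for i in range(20):
    q.enqueue(i)
print([q.dequeue() for _ in range(20)])    # packets emerge in FIFO order
```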

May 28th, 2002 Nick McKeown 25

The Arbitration Problem

- A packet switch fabric is reconfigured for every packet transfer.
- At 160 Gb/s, a new IP packet can arrive every 2 ns.
- The configuration is picked to maximize throughput and not waste capacity.
- Known algorithms are too slow.
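The 2 ns figure is just the transmission time of a minimum-length packet at line rate (assuming a 40-byte minimum IP packet):

```python
# Time on the wire for a minimum-length packet at 160 Gb/s.
pkt_bits = 40 * 8    # 40-byte minimum-length IP packet (assumed)
rate = 160e9         # line rate in bits per second
print(f"{pkt_bits / rate * 1e9:.0f} ns per packet")  # 2 ns
```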

May 28th, 2002 Nick McKeown 26

Approach

We know that a crossbar with VOQs and uniform Bernoulli i.i.d. arrivals gives 100% throughput under any of the following scheduling algorithms:
- Pick a permutation uniformly at random from all permutations.
- Pick a permutation uniformly at random from a set of N permutations in which each input-output pair (i, j) is connected exactly once.
- Repeatedly cycle through a fixed sequence of the N permutations in such a set.

Can we make non-uniform, bursty traffic uniform "enough" for the above to hold?
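Cyclic shifts are the canonical example of such a spanning set; a minimal sketch (any set with the stated property works equally well):

```python
# A spanning set of N permutations: in slot t, connect input i to output
# (i + t) mod N. Over N slots, every (i, j) pair meets exactly once.
N = 4
for t in range(N):
    perm = [(i + t) % N for i in range(N)]   # perm[i] is input i's output
    print(f"slot {t}: {perm}")

# Sanity check: each (i, j) pair occurs exactly once across the N slots.
pairs = {(i, (i + t) % N) for t in range(N) for i in range(N)}
assert len(pairs) == N * N
```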

May 28th, 2002 Nick McKeown 27

2-Stage SwitchExternal Outputs

Internal Inputs

1

N

ExternalInputs

Spanning Set of Permutations

Spanning Set of Permutations

1

N

1

N

Recently shown to have 100% throughput Mild conditions: weakly mixing arrival processes

C.S.Chang et al.: http://www.ee.nthu.edu.tw/~cschang/PartI.pdf

May 28th, 2002 Nick McKeown 28

2-Stage Switch

[Diagram: the same 2-stage switch, with external arrivals a(t) entering the first stage, center-stage queues q(t) between the stages, and departures b(t); λ1(t) and λ2(t) denote the traffic entering the first and second stages.]

Long-term, service opportunities exceed arrivals: the first stage spreads each input's traffic evenly over the N center queues, so the center-stage queue for output j receives packets at rate (1/N) Σᵢ λᵢⱼ < 1/N, while the second stage serves it exactly once every N time slots, i.e., at rate 1/N.
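A toy simulation of the spreading argument (round-robin permutations and Bernoulli diagonal traffic are simplifications; Chang et al. prove the general result):

```python
# Toy 2-stage load-balanced switch under non-uniform (diagonal) traffic:
# stage 1 spreads each input's packets round-robin over the N center
# buffers; stage 2 serves each (center, output) pair once every N slots.
import random

N, T, load = 4, 100_000, 0.9
q = [[0] * N for _ in range(N)]      # q[center][output]: center-stage VOQs

for t in range(T):
    for i in range(N):               # stage 1: input i -> center (i+t) % N
        if random.random() < load:   # input i sends only to output i
            q[(i + t) % N][i] += 1
    for c in range(N):               # stage 2: center c -> output (c+t) % N
        j = (c + t) % N
        if q[c][j] > 0:
            q[c][j] -= 1

# Queues stay bounded (small) even though the traffic is highly non-uniform.
print("max center-stage queue length:", max(max(row) for row in q))
```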

May 28th, 2002 Nick McKeown 29

Problem: Unbounded Mis-sequencing

[Diagram: the 2-stage switch as before; two packets of the same flow, labeled 1 and 2, are spread to different center queues and can reach the output in the wrong order.]

Side-note: mis-sequencing is maximized when arrivals are uniform.

May 28th, 2002 Nick McKeown 30

Preventing Mis-sequencing

[Diagram: the 2-stage switch with small coordination buffers running the 'FFF' algorithm at the first stage, and large congestion buffers at the center stage.]

The Full Frames First (FFF) algorithm:
- Keeps packets ordered, and
- Guarantees a delay bound close to the optimum.

Infocom'02: klamath.stanford.edu/~nickm/papers/infocom02_two_stage.pdf

May 28th, 2002 Nick McKeown 31

Example: Optical 2-Stage Switch

Idea: use a single stage twice.

[Diagram: linecards 1-3, each with lookup and buffering, share one optical switch; every packet crosses the switch twice, once in phase 1 (spreading) and once in phase 2 (delivery).]

May 28th, 2002 Nick McKeown 32

Example: Passive Optical 2-Stage "Switch"

[Diagram: ingress linecards 1..n connect to midstage linecards 1..n, which connect to egress linecards 1..n; every ingress-to-midstage and midstage-to-egress link runs at a fixed rate R/N.]

It is helpful to think of this as spreading rather than switching.
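With the deck's own numbers (625 linecards at R = 160 Gb/s), each fixed-rate link is modest:

```python
# Per-link rate in the passive 2-stage fabric: each of the N ingress
# linecards spreads its rate-R stream evenly over N midstage linecards.
R = 160e9   # linecard rate, bits/s
N = 625     # number of linecards (from the 100 Tb/s = 625 x 160 Gb/s slide)
print(f"{R / N / 1e6:.0f} Mb/s per link")  # 256 Mb/s
```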

May 28th, 2002 Nick McKeown 33

2-Stage spreading

[Diagram: N inputs spread over N paths into a buffer stage, then spread again over N paths to N outputs.]

May 28th, 2002 Nick McKeown 34

Passive Optical Switching

[Diagram: each ingress linecard i transmits on wavelengths λ1, …, λn; a passive wavelength router (an integrated AWGR or diffraction-grating-based device) delivers wavelength λj from ingress linecard i to midstage linecard j. The same arrangement connects midstage linecards 1..n to egress linecards 1..n.]
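A sketch of the cyclic routing property that lets an AWGR act as this passive spreader (the (input + wavelength) mod n rule is the standard AWGR behavior; exact port and wavelength numbering varies by device):

```python
# Cyclic wavelength routing in an n x n AWGR: light entering input port i
# on wavelength j exits output port (i + j) mod n. Transmitting one
# wavelength per destination gives every ingress linecard a dedicated
# channel to every midstage linecard, with no active switching.
n = 4

def awgr_output(in_port, wavelength):
    return (in_port + wavelength) % n

for i in range(n):
    print(f"ingress {i}: wavelengths 0..{n-1} -> midstages",
          [awgr_output(i, j) for j in range(n)])
```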

May 28th, 2002 Nick McKeown 35

100Tb/s Router

[Diagram: racks of 160 Gb/s linecards connected by optical links to a central optical switch fabric.]

May 28th, 2002 Nick McKeown 36

Racks with 160Gb/s linecards

[Diagram: each rack holds 160 Gb/s linecards; every linecard contains a queue manager with on-chip SRAM, a row of off-chip DRAMs, and a lookup engine.]

May 28th, 2002 Nick McKeown 37

Additional Technologies

Demonstrated or in development:
- Chip-to-chip optical interconnects with total power dissipation of several mW,
- Demonstration of wavelength-division-multiplexed chip interconnect,
- Integrated laser modulators,
- 8 Gsample/s serial links,
- Low-power, variable-power-supply serial links,
- Integrated arrayed waveguide routers.

[Chip photographs: a TX/RX serial-link test chip (TX-PLL/TX-DLL, RX-PLL/RX-DLL, PRBS data generators, testing interface, TX/RX feedback biasing) and a buck converter with digital sliding controller and power transistors; 40 μm scale bar.]

May 28th, 2002 Nick McKeown 38

Mind the gap

Operators are unlikely to deploy 5 times as many POPs, or make them 5 times bigger, with 5 times the power consumption.

Our options:
1. Make routers simpler
2. Use more parallelism
3. Use more optics

May 28th, 2002 Nick McKeown 39

Some predictions about core Internet routers

The need for more capacity within a given power and volume budget will mean:

Fewer functions in routers:
- Little or no optimization for multicast,
- Continued overprovisioning will lead to little or no support for QoS, DiffServ, ….

Fewer unnecessary requirements:
- Mis-sequencing will be tolerated,
- Latency requirements will be relaxed.

Less programmability in routers, and hence no network processors.

Greater use of optics to reduce power in the switch.

May 28th, 2002 Nick McKeown 40

What I believe is most likely

The need for capacity and reliability will mean widespread replacement of core routers with transport switching based on circuits:
- Circuit switches have proved simpler, more reliable, lower power, higher capacity and lower cost per Gb/s. Eventually, this is going to matter.
- The Internet will evolve towards edge routers interconnected by a rich mesh of WDM circuit switches.