Computer Networks II: Router Architecture

Department of Computer and IT Engineering, University of Kurdistan

By: Dr. Alireza Abdollahpouri


Post on 01-Jan-2016


Page 1: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Department of Computer and IT Engineering, University of Kurdistan

Computer Networks II: Router Architecture

By: Dr. Alireza Abdollahpouri

Page 2: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

What Is Routing and Forwarding?

[Figure: example network with nodes A to F connected through routers R1 to R5]

Page 3: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Introduction

History …

Page 4: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Introduction

History … and future trends!

Page 5: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

What a Router Looks Like

Cisco GSR 12416: 6 ft × 19" × 2 ft; capacity: 160 Gb/s; power: 4.2 kW
Juniper M160: 3 ft × 19" × 2.5 ft; capacity: 80 Gb/s; power: 2.6 kW

Page 6: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Packet Processing Functions

Basic network system functionality:
• Address lookup
• Packet forwarding and routing
• Fragmentation and re-assembly
• Security
• Queuing
• Scheduling
• Packet classification
• Traffic measurement
• …

Page 7: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Per-packet Processing in a Router

1. Accept packet arriving on an ingress line.
2. Look up packet destination address in the forwarding table, to identify outgoing interface(s).
3. Manipulate packet header: e.g., decrement TTL, update header checksum.
4. Send packet to outgoing interface(s).
5. Queue until line is free.
6. Transmit packet onto outgoing line.
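Steps 2 to 4 above can be sketched in a few lines of Python. This is a minimal illustration, not how a hardware data plane is built. It assumes a 20-byte IPv4 header without options, a linear-scan longest-prefix match (real routers use tries or TCAMs), and the hypothetical names `make_header`, `lookup`, `process`, and `TABLE`. Queuing and transmission (steps 5 and 6) are not modeled.

```python
import ipaddress
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, complemented."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def make_header(src: str, dst: str, ttl: int) -> bytes:
    """Build a 20-byte IPv4 header (no options) with a valid checksum."""
    hdr = bytearray(struct.pack(
        "!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, ttl, 6, 0,
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed))
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr)


def lookup(table, addr):
    """Step 2: longest-prefix match by linear scan over (network, port) pairs."""
    best_len, best_port = -1, None
    for net, port in table:
        if addr in net and net.prefixlen > best_len:
            best_len, best_port = net.prefixlen, port
    return best_port


def process(header: bytes, table):
    """Steps 2-4: look up the egress port, decrement TTL, refresh the checksum."""
    hdr = bytearray(header)
    port = lookup(table, ipaddress.IPv4Address(bytes(hdr[16:20])))
    if port is None or hdr[8] <= 1:
        return None, None  # no route, or TTL expired: drop the packet
    hdr[8] -= 1  # byte 8 is the TTL field
    hdr[10:12] = b"\x00\x00"  # zero the checksum field before recomputing
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return port, bytes(hdr)


# Hypothetical forwarding table: (prefix, outgoing port).
TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), 0),
    (ipaddress.ip_network("10.0.0.0/8"), 1),
    (ipaddress.ip_network("10.1.0.0/16"), 2),
]

port, new_header = process(make_header("192.0.2.1", "10.1.2.3", 64), TABLE)
print(port, new_header[8])
```

A header with a correct checksum sums to zero under `ipv4_checksum`, which is how the rewritten header can be verified after the TTL change.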

Page 8: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Basic Architecture of a Router

Control plane: how routing protocols establish routes, etc. May be slow; "typically in software".
• Routing
  - Routing table update (OSPF, RIP, IS-IS)
  - Admission control
  - Congestion control
  - Reservation

Data plane (per-packet processing): how packets get forwarded. Must be fast; "typically in hardware".
• Routing lookup
• Packet classifier
• Switching
• Arbitration
• Scheduling

Page 9: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Generic Router Architecture

[Figure: three parallel packet paths, each with a buffer manager and buffer memory; packets are drawn as a header (Hdr) followed by data]

Page 10: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Functions in a Packet Switch

[Figure: ingress linecard, interconnect, and egress linecard, with blocks for framing, route lookup, TTL processing, buffering, interconnect scheduling, QoS scheduling, and the control plane; the data path, control path, and scheduling path are marked]

Memory is usually used in several places (DRAM for the packet buffer, SRAM for queues and tables).

Page 11: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Line Card Picture

11

Page 12: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Major Components of Routers: Interconnect

The interconnect links input ports to output ports. There are three types:

• Bus: all input ports transfer data over the shared bus.
  Problem: the shared bus often becomes congested.
• Shared memory: input ports write data into the shared memory. After the destination lookup is performed, the output port reads the data from the memory.
  Problem: requires fast memory read/write and memory management technology.
• Crossbar: each of the N input ports has a dedicated data path to each of the N output ports, giving an N × N switching matrix.
  Problem: blocking (input, output, head-of-line (HOL)). With FIFO input queues, the maximum switch load for random traffic is about 59%.

[Figure: bus, shared memory, and crossbar interconnects]

Page 13: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Interconnects: Two Basic Techniques

• Input queueing
• Output queueing

Usually a non-blocking switch fabric (e.g. crossbar)

Page 14: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Output Queued (OQ) Switch

How an OQ Switch Works

14

Page 15: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Input Queueing: Head of Line Blocking

[Figure: delay versus load, with 58.6% and 100% marked on the load axis]
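The 58.6% figure can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not the derivation behind the slide: every input is saturated (its FIFO always has a head-of-line packet), destinations are uniformly random, and each contended output picks one input at random. The function name `hol_throughput` is hypothetical. For large N the analytical limit is 2 − √2 ≈ 0.586.

```python
import random


def hol_throughput(n_ports: int, n_slots: int, seed: int = 0) -> float:
    """Per-port throughput of a saturated FIFO input-queued crossbar."""
    rng = random.Random(seed)
    # hol[i] is the destination of the packet at the head of input i's FIFO.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group inputs by the output their head-of-line packet wants.
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each requested output accepts exactly one packet per slot.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # next packet moves to the head
            delivered += 1
        # Losing inputs keep the same head packet: that is HOL blocking.
    return delivered / (n_ports * n_slots)


print(round(hol_throughput(32, 5000), 3))
```

Packets queued behind a blocked head cannot use idle outputs, which is why throughput stalls well below 100%.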

Page 16: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Head of Line Blocking

16

Page 17: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

17

Page 18: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

18

Page 19: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Virtual Output Queues (VoQ)

Virtual output queues:
• At each input port, there are N queues, each associated with an output port
• Only one packet can go from an input port at a time
• Only one packet can be received by an output port at a time
• It retains the scalability of FIFO input-queued switches
• It eliminates the HoL problem with FIFO input queues
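The VoQ rules above map directly onto a small data structure: N queues per input, plus a per-slot matching in which each input sends at most one packet and each output receives at most one. The sketch below is only an illustration; the class name `VOQSwitch` is hypothetical, and the fixed-order greedy matching is an assumption chosen for brevity (real schedulers rotate priorities for fairness).

```python
from collections import deque


class VOQSwitch:
    """Input-queued switch with one queue per (input, output) pair."""

    def __init__(self, n_ports: int):
        self.n = n_ports
        # voq[i][j] holds packets waiting at input i for output j.
        self.voq = [[deque() for _ in range(n_ports)] for _ in range(n_ports)]

    def enqueue(self, inp: int, out: int, packet) -> None:
        self.voq[inp][out].append(packet)

    def schedule_slot(self):
        """Greedy maximal matching for one time slot."""
        busy_outputs = set()
        sent = []
        for inp in range(self.n):
            for out in range(self.n):
                if out not in busy_outputs and self.voq[inp][out]:
                    busy_outputs.add(out)
                    sent.append((inp, out, self.voq[inp][out].popleft()))
                    break  # this input has used its one transmission
        return sent


sw = VOQSwitch(2)
sw.enqueue(0, 0, "a")
sw.enqueue(1, 0, "b")  # arrived first at input 1, competes for output 0
sw.enqueue(1, 1, "c")  # in a single FIFO this would wait behind "b"
print(sw.schedule_slot())
print(sw.schedule_slot())
```

In the first slot input 1 loses output 0 to input 0 but can still send "c" to output 1, so the packet is not held up behind "b".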

Page 20: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Input Queueing: Virtual output queues

20

Page 21: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Input Queueing: Virtual output queues

[Figure: delay versus load, with 100% marked on the load axis]

Page 22: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

The Evolution of Router Architecture

First Generation Routers

Modern Routers

22

Page 23: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

First Generation Routers

Bus-based router architectures with a single processor

[Figure: a CPU with route table and buffer memory, and several line interfaces (MAC), all attached to a shared backplane]

Page 24: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

First Generation Routers

Based on software implementations on a single CPU.

Limitations:
• Serious processing bottleneck in the central processor
• Memory-intensive operations (e.g. table lookup and data movements) limit the effectiveness of processor power

Page 25: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Second Generation Routers

Bus-based router architectures with multiple processors

[Figure: a CPU with route table and buffer memory, and several line cards, each with MAC, buffer memory, and a forwarding cache]

Page 26: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Second Generation Routers

Architectures with route caching:
• Distribute packet forwarding operations
  - Network interface cards
  - Processors
  - Route caches
• Packets are transmitted once over the shared bus

Limitations:
• The central routing table is a bottleneck at high speeds
• Traffic-dependent throughput (cache)
• Shared bus is still a bottleneck
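Why throughput becomes traffic dependent is easy to see in a toy model of a line-card route cache. The sketch below is an assumption-laden illustration, not the design of any particular router: the cache is an exact-match LRU keyed on destination, `central_lookup` stands in for the slow path to the central routing table, and the class name `LineCardCache` is hypothetical.

```python
from collections import OrderedDict


class LineCardCache:
    """LRU forwarding cache on a line card, backed by a central table."""

    def __init__(self, central_lookup, capacity: int):
        self.central_lookup = central_lookup  # slow path over the shared bus
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, dst):
        if dst in self.cache:
            self.cache.move_to_end(dst)  # mark as most recently used
            self.hits += 1
            return self.cache[dst]
        self.misses += 1
        port = self.central_lookup(dst)  # fall back to the central CPU
        self.cache[dst] = port
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used
        return port


central_table = {"a": 1, "b": 2, "c": 3}
card = LineCardCache(central_table.get, capacity=2)
for dst in ["a", "a", "b", "c", "a"]:
    card.lookup(dst)
print(card.hits, card.misses)
```

Traffic with good locality mostly hits the cache; traffic spread over many destinations keeps missing and pushes the load back onto the central table.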

Page 27: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Third Generation Routers

Switch-based architectures with fully distributed processors

[Figure: line cards, each with MAC, local buffer memory, and a forwarding table, and a CPU card holding the routing table, interconnected by a switched backplane]

Page 28: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Third Generation Routers

To avoid bottlenecks in:
• Processing power
• Memory bandwidth
• Internal bus bandwidth

each network interface is equipped with appropriate processing power and buffer space.

Data vs. control plane:
• Data plane: line cards
• Control plane: processor

Page 29: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Fourth Generation Routers/Switches

Optics inside a router for the first time

[Figure: a switch core connected to linecards by optical links hundreds of metres long]

0.3 - 10 Tb/s routers in development

Page 30: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Demand for More Powerful Routers

Do we still need higher processing power in networking devices?

Of course, YES. But why? And how?

Page 31: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Demands for Faster Routers (why?)

Beyond Moore's law

[Figure: processing complexity, from hundreds of instructions per packet to thousands of instructions per packet: layer 2 switching, IPv4 routing, flow classification, encryption, intrusion detection]

Packet inter-arrival time (for 40 Gbps):
• Big packet: 300 ns
• Small packet: 12 ns
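The inter-arrival times follow from packet size divided by line rate. The sketch below reproduces them under assumed packet sizes, which the slide does not state: 1500 bytes for the big packet and 60 bytes for the small one (a 64-byte minimum Ethernet frame would give 12.8 ns instead). The 1000-instruction workload and the function names are likewise illustrative assumptions.

```python
def interarrival_ns(packet_bytes: int, link_gbps: float) -> float:
    """Time between back-to-back packets: bits / (Gbit/s) gives nanoseconds."""
    return packet_bytes * 8 / link_gbps


def required_instr_per_s(instr_per_packet: int, packet_bytes: int,
                         link_gbps: float) -> float:
    """Instruction rate needed to keep up with back-to-back packets."""
    return instr_per_packet / (interarrival_ns(packet_bytes, link_gbps) * 1e-9)


print(interarrival_ns(1500, 40))  # assumed big packet
print(interarrival_ns(60, 40))    # assumed small packet
print(f"{required_instr_per_s(1000, 60, 40):.2e}")
```

With only about 12 ns per small packet, even a workload of a thousand instructions per packet needs on the order of 10^11 instructions per second on a single 40 Gbps link.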

Page 32: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Future applications will demand TIPS (tera-instructions per second)

Demands for Faster Routers (why?)

32

Page 33: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Future applications will demand TIPS Power? Heat?

Demands for Faster Routers (why?)

33

Page 34: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Demands for Faster Routers (summary)

Technology push:
- Link bandwidth is scaling much faster than CPU and memory technology
- Transistor scaling and VLSI technology help, but not enough

Application pull:
- More complex applications are required
- Processing complexity is defined as the number of instructions and the number of memory accesses needed to process one packet

Page 35: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

“Future applications will demand TIPS”

“Think platform beyond a single processor”

“Exploit concurrency at multiple levels”

“Power will be the limiter due to complexity and leakage”

Distribute workload on multiple cores

Demands for faster routers (How?)

35

Page 36: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Symmetric multi-processors allow multi-threaded applications to achieve higher performance at less die area and power consumption than single-core processors

Asymmetric multi-processors consume power and provide increased computational power only on demand

Multi-Core Processors

36

Page 37: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Performance Bottlenecks

• Memory
  - Bandwidth available, but access time too slow
  - Increasing delay for off-chip memory
• I/O
  - High-speed interfaces available
  - Cost problem with optical interfaces
• Internal bus
  - Can be solved with an effective switch, allowing simultaneous transfers between network interfaces
• Processing power
  - Individual cores are getting more complex
  - Problems with access to shared resources
  - Control processor can become a bottleneck

Page 38: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

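Different Solutions

• ASIC (application-specific integrated circuit)
• FPGA (field-programmable gate array)
• NP (network processor)
• GPP (general-purpose processor)

[Figure: flexibility versus performance for ASIC, FPGA, NP, and GPP]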

Page 39: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

By: Niraj Shah

Different Solutions

39

Page 40: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

“It is always something (corollary). Good, Fast, Cheap: Pick any two (you can’t have all three).”

RFC 1925, “The Twelve Networking Truths”

Page 41: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Why not ASIC?

• High cost to develop
  - Network processing: moderate-quantity market
• Long time to market
  - Network processing: quickly changing services
• Difficult to simulate
  - Complex protocols
• Expensive and time-consuming to change
  - Little reuse across products
  - Limited reuse across versions
• No consensus on framework or supporting chips
• Requires expertise

Page 42: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Network Processors

• Introduced several years ago (1999+)
• A way to introduce flexibility and programmability in network processing
• Many players were there (Intel, Motorola, IBM)
• Only a few players still there

Page 43: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Intel IXP 2800

Initial release: August 2003

Page 44: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

What Was Correct With NPs?

CPU-level flexibility
– A giant step forward compared to ASICs

How?
– Hardware coprocessors
– Memory hierarchies
– Multiple hardware threads (zero context-switching overhead)
– Narrow (and multiple) memory buses
– Some other ad-hoc solutions for network processing, e.g., fast switching fabric, memory accesses, etc.

Page 45: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

What Was Wrong With NPs?

Programmability issues
– Completely new programming paradigm
– Developers are not familiar with the unprecedented parallelism of the NPU; they do not know how best to exploit it
– New (proprietary) languages
– Portability among different network processor families

Page 46: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

What Happened in NP Market?

Intel went out of the market in 2007

Many other small players disappeared

High risk when selecting an NP maker that may disappear

46

Page 47: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

“Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works.”

RFC 1925, “The Twelve Networking Truths”

Page 48: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Software Routers

Processing in general-purpose CPUs

CPUs optimized for few threads, high performance per thread
– High CPU frequencies
– Maximize instruction-level parallelism
  • Pipeline
  • Superscalar
  • Out-of-order execution
  • Branch prediction
  • Speculative loads

Page 49: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Software Routers

Aim: low cost, flexibility, and extensibility

Linux on a PC with a bunch of NICs

Changing functionality is as simple as a software upgrade

Page 50: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Software Routers (examples)

• RouteBricks [SOSP’09]: uses the Intel Nehalem architecture
• PacketShader [SIGCOMM’10]: GPU-accelerated; developed at KAIST, Korea

Page 51: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Intel Nehalem Architecture

[Figure: four cores (C0 to C3) sharing a common L3 cache]

Page 52: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Intel Nehalem Architecture

NUMA architecture:
• The latency to access local memory is approximately 65 ns
• The latency to access remote memory is approximately 105 ns
• The bandwidth of the QPI link is 12.8 GB/s
• Three DDR3 channels to local DRAM support a bandwidth of 31.992 GB/s
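The bandwidth figures can be reproduced with simple arithmetic. The sketch below assumes DDR3-1333 channels (1333 mega-transfers per second on a 64-bit bus) and a 6.4 GT/s QPI link carrying 2 bytes of payload per transfer in each direction; the slide gives only the totals, so these parameters and the function names are assumptions that happen to match them. The average-latency helper is a simple weighted mean of the two NUMA latencies.

```python
def ddr3_bandwidth_gb_s(channels: int, mega_transfers_per_s: int,
                        bus_bytes: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s: channels x MT/s x bytes per transfer."""
    return channels * mega_transfers_per_s * bus_bytes / 1000


def qpi_bandwidth_gb_s(giga_transfers_per_s: float,
                       payload_bytes: int = 2) -> float:
    """Peak QPI bandwidth per direction in GB/s."""
    return giga_transfers_per_s * payload_bytes


def avg_memory_latency_ns(local_fraction: float, local_ns: float = 65.0,
                          remote_ns: float = 105.0) -> float:
    """Expected latency when a fraction of accesses go to local memory."""
    return local_fraction * local_ns + (1 - local_fraction) * remote_ns


print(ddr3_bandwidth_gb_s(3, 1333))
print(qpi_bandwidth_gb_s(6.4))
print(avg_memory_latency_ns(0.5))
```

The latency gap is why packet buffers and the cores that process them are best kept on the same socket.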

Page 53: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Intel Nehalem Architecture

[Figure: Nehalem quad-core: four cores (0 to 3), each with L1-I, L1-D, and L2 caches, and a shared L3 cache; an integrated memory controller (IMC) with 3 channels to DRAM; two QPI links (QPI1, QPI2); an I/O controller hub with PCI slots; power and clock. A second diagram shows a network card and disk on the PCI bus, with the file system, communication system, and application above them.]

Page 54: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Other Possible Platforms

• Intel Westmere-EP
• Intel Jasper Forest

Page 55: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Workload Partitioning (parallelization)

[Figure: three partitioning schemes: pipeline, parallel, and hybrid]

Page 56: Department of Computer and IT Engineering University of Kurdistan Computer Networks II

Questions!