dd2490 p4 2010 routing architecture and forwarding · •bridging - forwarding on layer 2 –a mac...


Post on 08-Oct-2020

Routing architecture and forwarding

(Intro to Homework 4)

Olof Hagsand KTH /CSC

DD2490 p4 2010

Connecting devices

•Networking devices
–Hub/Repeater (L1)
–Bridge/Switch (L2)
•Internetworking devices
–Router (L3)
–Application gateway (L4-L7)

IEEE 802 vs IPv4 addresses

•IEEE 802 (MAC) address, e.g. 00:0E:35:64:E9:E7
–48 bits: a 24-bit vendor code followed by a 24-bit vendor-assigned part
–The first octet carries the Group/Individual bit and the Global/Local bit
•IPv4 address, e.g. 192.36.125.18
–32 bits: netid followed by hostid
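The following C sketch shows how each address type can be taken apart. The bit positions follow the IEEE 802 convention: the Group/Individual bit is the least significant bit of the first octet, and the Global/Local bit is the next one. The slide does not give a mask for 192.36.125.18, so the /24 prefix length used in the usage note is an assumption. The function names are illustrative only.

```c
#include <stdint.h>

/* Group/Individual bit: least significant bit of the first octet; 1 = group address. */
static int mac_is_group(const uint8_t mac[6]) { return mac[0] & 0x01; }

/* Global/Local bit: second least significant bit of the first octet; 1 = locally administered. */
static int mac_is_local(const uint8_t mac[6]) { return (mac[0] >> 1) & 0x01; }

/* 24-bit vendor code: the first three octets. */
static uint32_t mac_vendor_code(const uint8_t mac[6]) {
    return (uint32_t)mac[0] << 16 | (uint32_t)mac[1] << 8 | mac[2];
}

/* Split a host-order IPv4 address into netid and hostid for prefix length plen (0..32). */
static void ipv4_split(uint32_t addr, unsigned plen, uint32_t *netid, uint32_t *hostid) {
    uint32_t mask = plen == 0 ? 0 : 0xFFFFFFFFu << (32 - plen);
    *netid = addr & mask;
    *hostid = addr & ~mask;
}
```

For 00:0E:35:64:E9:E7, both bits are 0, which means an individual, globally assigned address, and the vendor code is 00:0E:35. Splitting 192.36.125.18 at the assumed /24 gives netid 192.36.125.0 and hostid 18. The MAC address contains nothing comparable to a netid that a forwarding table could aggregate on.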

Routing vs bridging

•Bridging - forwarding on layer 2

–A MAC address/ID has a flat structure

• many nodes -> large forwarding tables

• broadcast reaches all nodes

–Simple to configure and manage, cheaper

–Loops prevented by the spanning tree protocol

•Routing – forwarding on layer 3

–The netid of the IP addresses can be aggregated

• many nodes -> smaller forwarding tables than bridging

• routers partition broadcast domains

–Routing is more difficult to configure

–Loops avoided by routing protocols; TTL decrement discards packets caught in a loop

What does a router do?

•Packet forwarding
–Not only IPv4: IPv6, MPLS, bridging/VLAN, tunneling, ...
•Filter packets
–Access lists
–Metering/shaping/policing
•Compute routes: build the forwarding table
•In the background: routing
•In real time: forwarding

(Figure: packet pipeline Classifier → Lookup → Metering → Shaping.)
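The slide lists metering, shaping, and policing but does not say how they work. A token bucket is one common way to meter a flow. Below is a minimal C sketch under that assumption. The names (struct tbucket, tb_conform) are hypothetical, and the caller supplies the time in microseconds.

```c
#include <stdint.h>

/* Token-bucket meter. Credit is stored in byte-microseconds (bytes * 1e6), so short gaps
 * between packets do not lose fractional bytes to integer truncation. */
struct tbucket {
    uint64_t rate;     /* fill rate, bytes per second */
    uint64_t depth;    /* bucket size = largest allowed burst, bytes */
    uint64_t credit;   /* current fill, bytes * 1e6 */
    uint64_t last_us;  /* time of the previous update, microseconds */
};

/* Returns 1 if a packet of len bytes arriving at now_us is within the profile; its size is
 * then charged to the bucket. Returns 0 if it exceeds the profile. A policer would drop or
 * re-mark such a packet; a shaper would delay it until enough credit has built up. */
static int tb_conform(struct tbucket *tb, uint64_t now_us, uint64_t len) {
    uint64_t cap = tb->depth * 1000000u;
    tb->credit += (now_us - tb->last_us) * tb->rate;   /* microseconds * bytes/s */
    if (tb->credit > cap)
        tb->credit = cap;
    tb->last_us = now_us;
    if (tb->credit >= len * 1000000u) {
        tb->credit -= len * 1000000u;
        return 1;
    }
    return 0;
}
```

As an example, take a 1000 B/s bucket that starts full with a depth of 1500 bytes. It accepts one 1500-byte burst at once and rejects the next packet. After 100 ms it has refilled enough for another 100 bytes.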

Router Components

CPU module ("Control Processor", "Routing Engine")
•CPU, memory, routing table
•Executes routing protocols, computes the routing table, configures line cards...

Line cards (connected to the external links)
•MAC, memory, packet processing
•Examine headers, make the routing decision...
•Input buffering, waiting for access to the output port...
•Output buffering, waiting for transmission...
•QoS scheduling...

Interconnect
•Connects the CPU module and the line cards

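Fast path, slow path

•Fast path
–Taken if the line card can determine the outgoing port
•Slow path
–Taken if the control processor must determine the outgoing port

(Figure: four line cards and a control processor with CPU, memory, and routing table. Fast-path packets go directly from the input line card to the output line card. Slow-path packets pass through the control processor.)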

Inside a router, 1st Generation

•Every packet goes twice over the shared bus

•Constrained by bus and memory bandwidth (per-byte cost)

•And by CPU cycles (per-packet cost)

(Figure: line cards, buffer memory, and the CPU with the RIB all attached to one shared bus backplane.)

Inside a hardware-based router

•Multiple simultaneous transfers over the backplane
•Specialized hardware: ASICs (Application-Specific Integrated Circuits)
•Wire speed at 100 Gb/s and beyond

(Figure: four line cards, each with buffer memory and a forwarder, plus a CPU card holding the CPU and the RIB, all connected by a switched backplane.)

Crossbar Architecture

•Space division approach

•Switched interconnection between input and output

•Centralized controller

–coordinates input-output ports

–activates paths between ports

•Multiple transfers can proceed simultaneously

•Crossbar is non-blocking

(Figure: a switching fabric connecting input ports 1..N to output ports 1..M, with interface logic at each port and a central controller.)
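Below is a minimal C sketch of one scheduling round of the centralized controller. It assumes requests arrive as an input-by-output matrix. Each input is greedily connected to the first free output it wants. Since no output is used twice, all resulting connections can run at the same time, and this is what non-blocking means here. The fixed priority (input 0 first) is a simplification: it is unfair, and it does not always find the largest possible matching. Practical schedulers such as iSLIP rotate priorities over several iterations. The names and port counts are hypothetical.

```c
#define NIN  4   /* input ports (N on the slide) */
#define NOUT 4   /* output ports (M on the slide) */

/* req[i][j] != 0: input i has a packet waiting for output j.
 * On return, match[i] is the output connected to input i, or -1.
 * Each output goes to at most one input and each input gets at most one output,
 * so every matched pair can transfer through the crossbar in the same time slot.
 * Returns the number of simultaneous transfers. */
static int crossbar_schedule(int req[NIN][NOUT], int match[NIN]) {
    int out_busy[NOUT] = {0};
    int n = 0;
    for (int i = 0; i < NIN; i++) {
        match[i] = -1;
        for (int j = 0; j < NOUT; j++) {
            if (req[i][j] && !out_busy[j]) {
                out_busy[j] = 1;     /* controller activates the path i -> j */
                match[i] = j;
                n++;
                break;
            }
        }
    }
    return n;
}
```

In this example, input 0 wants output 1, input 1 wants outputs 1 or 2, and input 2 wants output 3. All three transfers proceed together (0→1, 1→2, 2→3), so they do not have to take turns as they would on a shared bus.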

Shared Bus Architecture

•Relies on time division – internal data path is shared

•Address, control, and data lines and a bus protocol

•Granularity

–Packet granularity: simple, but may result in delay problems

–Block granularity: more overhead, but avoids long delays

(Figure: input ports 1..N and output ports 1..M all attached to one shared bus.)

Routing table lookup

•Longest prefix first
•Divide the table into 33 "buckets", one for each netmask length (0-32)
•Match the destination against the longest prefixes first
•SW algorithms: trees, binary trees, tries (different data structures)
•HW support: TCAMs (Ternary Content-Addressable Memory)

(Figure: an array of buckets indexed by mask length 0, 1, ..., 31, 32, each holding the netids of that length. The destination IP address is matched against them.)
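The bucket scheme can be written as a small C sketch. There is one array of routes per mask length, searched from /32 down to /0, so the first hit is automatically the longest match. Addresses are host-order integers. The fixed bucket capacity (MAXPER) and the struct layout are hypothetical simplifications. A faster variant could hash the netid inside each bucket instead of scanning it.

```c
#include <stddef.h>
#include <stdint.h>

#define MAXPER 16                  /* hypothetical capacity of each bucket */

struct route { uint32_t netid; const char *ifname; };

/* Bucket b[len] holds the routes whose mask length is len (0..32). */
struct fib {
    struct route b[33][MAXPER];
    int n[33];
};

static uint32_t mask_of(int len) { return len == 0 ? 0 : 0xFFFFFFFFu << (32 - len); }

static int fib_add(struct fib *f, uint32_t prefix, int len, const char *ifname) {
    if (len < 0 || len > 32 || f->n[len] == MAXPER)
        return -1;
    f->b[len][f->n[len]++] = (struct route){ prefix & mask_of(len), ifname };
    return 0;
}

/* Try mask lengths from 32 down to 0; the first bucket with a hit gives the longest match. */
static const char *fib_lookup(const struct fib *f, uint32_t dst) {
    for (int len = 32; len >= 0; len--) {
        uint32_t net = dst & mask_of(len);
        for (int k = 0; k < f->n[len]; k++)
            if (f->b[len][k].netid == net)
                return f->b[len][k].ifname;
    }
    return NULL;               /* no route, not even a default */
}
```

With the routes 10.1.0.0/24 → e1, 10.2.0.0/24 → e2, 10.3.0.0/24 → e3, and 0.0.0.0/0 → e1, a lookup of 10.2.0.2 stops in bucket 24 and returns e2. Any other address reaches bucket 0 and gets e1.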

Using a Trie for lookup

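•Binary tree
–Nodes are prefixes
–Left branch represents '0' in the string
–Right branch represents '1'

(Figure: the binary trie for the table below. The root holds a (*); c sits at 01*, b at 10*, d at 110*, and e, f, g at the leaves 0010, 0110, 0111. The other nodes on these paths hold no prefix.)

a *
b 10*
c 01*
d 110*
e 0010
f 0110
g 0111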
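Below is a minimal C sketch of the trie lookup, using the prefix table a-g. For readability, prefixes and addresses are strings of '0'/'1' characters rather than packed bits; this is an assumption of the sketch. The lookup walks down from the root and remembers the last node that holds a prefix, which is the longest match. The names are illustrative.

```c
#include <stdlib.h>

/* Binary trie node: child[0] is the '0' branch (left), child[1] the '1' branch (right). */
struct tnode {
    struct tnode *child[2];
    char label;            /* name of the prefix ending here, or 0 if none */
};

static struct tnode *tnode_new(void) {
    struct tnode *n = calloc(1, sizeof *n);
    if (!n)
        abort();           /* out of memory: give up in this sketch */
    return n;
}

/* Insert a prefix given as a '0'/'1' string; "" is the default prefix *. */
static void trie_insert(struct tnode *root, const char *bits, char label) {
    struct tnode *n = root;
    for (; *bits; bits++) {
        int b = *bits == '1';
        if (!n->child[b])
            n->child[b] = tnode_new();
        n = n->child[b];
    }
    n->label = label;
}

/* Follow the address bits from the root; the last labelled node passed is the longest match. */
static char trie_lookup(const struct tnode *root, const char *addr) {
    const struct tnode *n = root;
    char best = root->label;
    for (; *addr && n; addr++) {
        n = n->child[*addr == '1'];
        if (n && n->label)
            best = n->label;
    }
    return best;
}
```

A lookup of 1101 passes a at the root and ends at d (110*). A lookup of 1110 follows 1 and 11, which hold no prefix, and falls back to a.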

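Elimination of Internal Prefixes

•No overlapping prefixes
•Prefix expansion with "leaf pushing"
•Simplifies lookup at the expense of more memory

Expanded table (2 bits per level: the row is the first two address bits, the column the next two):

        00  01  10  11
00*     a   a   e   a
01*     c   c   f   g
10*     b   b   b   b
11*     d   d   a   a

Original prefixes:

a *
b 10*
c 01*
d 110*
e 0010
f 0110
g 0111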
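The expanded table can be generated mechanically. This toy C sketch handles the 4-bit address space of the example as a single 16-entry array, which is the slide's two 2-bit levels laid side by side. It paints each prefix over every address the prefix covers, shortest prefixes first, so longer prefixes overwrite shorter ones. The function name expand4 and the string form of prefixes are illustrative assumptions.

```c
#include <string.h>

/* A prefix as a string of '0'/'1' characters; "" is the default prefix *. */
struct pfx { const char *bits; char label; };

/* Push every prefix down to all 4-bit addresses it covers. Lengths are processed in
 * increasing order, so each slot ends up holding its longest matching prefix.
 * Prefixes longer than 4 bits are skipped in this toy version. */
static void expand4(const struct pfx *p, int n, char table[16]) {
    memset(table, 0, 16);
    for (int len = 0; len <= 4; len++) {
        for (int k = 0; k < n; k++) {
            if ((int)strlen(p[k].bits) != len)
                continue;
            unsigned val = 0;
            for (int i = 0; i < len; i++)
                val = (val << 1) | (p[k].bits[i] == '1');
            unsigned first = val << (4 - len);   /* lowest covered address */
            unsigned count = 1u << (4 - len);    /* number of covered addresses */
            for (unsigned a = first; a < first + count; a++)
                table[a] = p[k].label;
        }
    }
}
```

With the prefixes a-g, the 16 entries read aaea ccfg bbbb ddaa, which are the same four rows as the expanded table on this slide. A lookup is then a single array index, and the cost is one slot per address instead of one per prefix.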

TCAM

Linear Search on Values: TCAM

•Ternary Content-Addressable Memory
–Fully associative memory
•Three values for each bit: '0', '1', and 'x' (don't care)
•Compare the input with all words in parallel
–The first match gives the result
•Up to 100 million searches per second

(Figure: the prefixes below stored as ternary words, longest first: 0010, 0110, 0111, then 110x, 01xx, 10xx, and finally xxxx. The input is compared with all seven words at once.)

a *
b 10*
c 01*
d 110*
e 0010
f 0110
g 0111
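The ternary match can be modeled in software, as in this C sketch. Each word is a value plus a mask, and the mask's 0 bits play the role of 'x'. Hardware compares all words in one cycle and a priority encoder returns the first hit; the loop below stands in for that by scanning in order. The 4-bit words in the test encode the prefixes a-g from this slide, longest first. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* One TCAM word: mask bits set to 1 must match; mask bits 0 are 'x' (don't care). */
struct tcam_entry { uint32_t value, mask; char result; };

/* Software model of a TCAM lookup. The hardware compares the key with all words at once
 * and a priority encoder reports the first (lowest-index) hit; this loop scans in order. */
static char tcam_search(const struct tcam_entry *t, int n, uint32_t key) {
    for (int i = 0; i < n; i++)
        if ((key & t[i].mask) == (t[i].value & t[i].mask))
            return t[i].result;
    return 0;   /* no match */
}
```

For example, 110x is stored as value 1100 with mask 1110. The key 1101 then matches d, while 1110 matches only the all-'x' word a.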

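TCAM layout

•Route lookup in one memory access
•Prefixes ordered by length
•First match first: the first match is the longest match
•Contents need to be sorted

(Figure: TCAM contents grouped by prefix length: 32-bit prefixes at the top, then 31-bit, ..., 24-bit, ..., down to 8-bit prefixes at the bottom.)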
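This C sketch shows why the contents must be sorted. Routes are ordered by decreasing prefix length before they are written into the TCAM, so the first match is the longest match. Only the order between different lengths matters. Two different prefixes of the same length can never both match one address, which is why the layout can group entries into per-length blocks. The struct and function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

struct tcam_route { uint32_t prefix; int len; int port; };

/* qsort comparator: longer prefixes first. */
static int longer_first(const void *a, const void *b) {
    const struct tcam_route *x = a, *y = b;
    return (y->len > x->len) - (y->len < x->len);
}

/* Order routes the way they must be written into the TCAM, so that the first
 * (highest-priority) matching entry is always the longest matching prefix. */
static void tcam_sort(struct tcam_route *r, size_t n) {
    qsort(r, n, sizeof r[0], longer_first);
}
```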

Packet classification

•Map a packet to a class

•Class defined by filters, usually on a 5-tuple:
–<source IP, destination IP, source port, destination port, protocol>
•For example, all packets:
–From subnet N
–To TCP port 80 on web server S
–From subnet N to port 666 on subnet M
•Applications:
–Firewall & NAT
–Blocking
–Accounting
–Policy routing
–QoS: metering, policing, DiffServ marking, ...
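Below is a minimal C sketch of a linear classifier over the 5-tuple. Each filter gives address prefixes (value and mask), port ranges, and a protocol, where 0 means any. The first matching filter in priority order decides the class. The field layout and the first-match-wins rule are assumptions for illustration. The test encodes two of the example filters, using a hypothetical web server S = 10.2.0.80 and subnet N = 10.1.0.0/24.

```c
#include <stdint.h>

/* 5-tuple of a packet; addresses in host byte order. */
struct pkt5 { uint32_t src, dst; uint16_t sport, dport; uint8_t proto; };

/* A filter: address value/mask pairs (values pre-masked), inclusive port ranges,
 * protocol (0 = any), and the class to assign on a match. */
struct filter {
    uint32_t src, src_mask, dst, dst_mask;
    uint16_t sport_lo, sport_hi, dport_lo, dport_hi;
    uint8_t proto;
    int cls;
};

/* Linear search: the first matching filter gives the class; -1 if none matches. */
static int classify(const struct filter *f, int n, const struct pkt5 *p) {
    for (int i = 0; i < n; i++) {
        if ((p->src & f[i].src_mask) != f[i].src) continue;
        if ((p->dst & f[i].dst_mask) != f[i].dst) continue;
        if (p->sport < f[i].sport_lo || p->sport > f[i].sport_hi) continue;
        if (p->dport < f[i].dport_lo || p->dport > f[i].dport_hi) continue;
        if (f[i].proto && f[i].proto != p->proto) continue;
        return f[i].cls;
    }
    return -1;
}
```

A linear scan is the simplest approach but costs one comparison per filter. This is one reason routers use TCAMs for classification as well as for route lookup.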

Cisco 12816

(Figure: chassis photo; dimensions 6 ft × 2 ft, 19" rack width.)

Capacity: 1.28 Tb/s, power: 4.7 kW

Port density examples

•30xOC-192 (10 Gb/s) ports

•120xOC-48 (2.5 Gb/s) ports

•15x10 Gigabit Ethernet ports

•60x1 Gigabit Ethernet ports

Cisco CRS-1

Cisco's current flagship: the Carrier Routing System

•3-stage switching plane: more than 50% of the cost
•Trie-based prefix lookup
•7.5 kW
•40 Gb/s per slot
•32 Tb/s raw bandwidth
•Distributed RP (route processor)
•Several logical routers
•Optical-electrical transitions: O-E-O-E-O-E-O

Juniper Routers

•M-series

–Shipping started 1998

–M5, M10, M20, M40e, M160, M320

–8xOC-192 or 32xOC-48 ports in an M160

•T-series

–Shipping started 2002

–T320, T640

–32xOC-192 or 128xOC-48 ports in a T640

Juniper M160

(Figure: chassis photo; dimensions 3 ft × 2.5 ft, 19" rack width.)

Capacity: 80 Gb/s, power: 2.6 kW

Juniper J-series

•J-series

–Routers used in labs

–Emulates M/T series

–Full routing software

Open source routing
•Linux and BSD platforms
•Most routing protocols exist as open source projects (e.g. Quagga)
•But PC hardware has traditionally been a limiting factor
•Now, up to 2×12-core CPUs, inter-processor buses (HT, QPI), non-uniform memory access (NUMA), multiple I/O buses (PCI-E), and 10 Gb/s NICs enable forwarding speeds of tens of gigabits per second
•Example: the Bifrost open source router (UU/KTH)

Example: PC routing architecture

(Figure: two quad-core CPUs, CPU 0-3 and CPU 4-7, each with three DDR3 memory channels. The CPUs are linked to each other and to two I/O handlers (north bridges) by QPI. Each I/O handler has PCI-E x16 and x4 slots holding 10 Gb/s network interfaces.)

•Multi-core CPUs: (Intel Nehalem) 8 cores, 16 with Hyper-threading
•Multi-channel: each network card has 8 DMA queues
•NUMA: non-local memory (many memory banks)
•Inter-processor bus: QPI 2.4 GHz, ~76 Gb/s
•Memory: DDR3-1066, 68 Gb/s × 3 channels
•I/O bus: PCI-E gen2, x1 ~4 Gb/s, x16 ~64 Gb/s
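The bus figures on this slide follow from transfer rate times width. This small C sketch does the arithmetic under stated assumptions: a 64-bit DDR3 channel, PCI-E gen2 lanes at 5 GT/s with 8b/10b line coding, and QPI moving 16 data bits per transfer at two transfers per clock, per direction. With those assumptions the results are in gigabits per second: about 68 Gb/s per DDR3-1066 channel, 4 Gb/s per PCI-E lane (64 Gb/s for x16), and about 76.8 Gb/s per QPI direction. The function names are illustrative.

```c
/* Peak bandwidth estimates, all in Gb/s. */

/* DDR3: mtps million transfers per second, 64 data bits per transfer. */
static double ddr3_channel_gbps(double mtps) { return mtps * 64.0 / 1000.0; }

/* PCI-E gen2: 5 GT/s per lane; 8b/10b coding leaves 8 data bits of every 10. */
static double pcie2_gbps(int lanes) { return lanes * 5.0 * 8.0 / 10.0; }

/* QPI: two transfers per clock, 16 data bits per transfer, one direction. */
static double qpi_gbps(double clock_ghz) { return clock_ghz * 2.0 * 16.0; }
```

A 10 Gb/s port forwarding at line rate moves 10 Gb/s in and 10 Gb/s out. This is why the memory and I/O bus budgets, rather than the raw CPU count, tend to decide how many ports a PC router can serve.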

Homework 4

•4a) Write a report on how forwarding works
•4b) Do a programming assignment in C
–Part 1: Print out the IPv4 destination address
–Part 2: Make an IPv4 forwarding lookup

Homework 4a) Report

•The assignment is a report about forwarding, intended for students with little programming experience.

•The report should, in a terse (not wordy) format, describe the forwarding performed by a router as an algorithm description, that is, a specification for an implementation.

•The report should list the necessary steps a router performs to forward a packet from an input Ethernet interface card to an output Ethernet interface card.

•The following steps should be covered:

MAC address lookup

IPv4 and IPv6 forwarding

Header sanity checks

Header modifications

Limited ICMP handling (at least one error case)

L2 header decapsulation and encapsulation

ARP lookup

Statistics: Interface packet and error counters.

Local delivery.

•The following steps need not be covered:

Full ICMP handling

Other protocols

IP options

Transport protocols/Socket handling

Homework 4b: Part 1

You should read an Ethernet frame, identify it as an IPv4 packet, and print the IPv4 destination address.

Input: an Ethernet packet. Example: 0200 0000 0011 0200 000c 0001 0800 4500 0026 17d4 0000 ff01 8ffc 0a01 0002 0a02 0002 0000 e802 c04b 0004 3e89 339a 0786 d0ff 0009

Output: the IPv4 address. Example: 10.2.0.2

Errors:
Error: packet too short: <length of frame in bytes>
Error: Not ipv4 payload: <payload type>
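Below is a sketch of Part 1 in C. It assumes the frame arrives as a string of hex digits as in the example, with whitespace ignored. It checks that the frame is long enough for an Ethernet header plus a 20-byte IPv4 header, checks the payload type, and formats the destination address from IP header bytes 17-20. The error thresholds and return codes are assumptions. The exact input and output format expected by Kattis should be taken from the problem statement, not from this sketch. A complete solution still needs a main that reads stdin and prints the messages.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

#define ETH_HLEN 14   /* 6 + 6 address bytes + 2-byte payload type */
#define IP_HLEN  20   /* IPv4 header without options */

/* Convert hex text such as "0200 0000 ..." into bytes, ignoring whitespace.
 * Returns the number of bytes, or -1 on a non-hex character, an odd digit count,
 * or more than max bytes. */
static int hex_to_bytes(const char *s, uint8_t *buf, int max) {
    int n = 0, hi = -1;
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (isspace(c))
            continue;
        if (!isxdigit(c))
            return -1;
        int v = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
        if (hi < 0) {
            hi = v;                          /* first digit of a byte */
        } else {
            if (n == max)
                return -1;
            buf[n++] = (uint8_t)(hi << 4 | v);
            hi = -1;
        }
    }
    return hi < 0 ? n : -1;
}

/* Write the IPv4 destination of an Ethernet frame as dotted decimal into out.
 * Returns 0 on success, -1 if the frame is too short, -2 if the payload type
 * (bytes 13-14, big-endian) is not 0x0800. */
static int eth_ipv4_dst(const uint8_t *f, int len, char out[16]) {
    if (len < ETH_HLEN + IP_HLEN)
        return -1;
    unsigned type = (unsigned)f[12] << 8 | f[13];
    if (type != 0x0800)
        return -2;
    const uint8_t *d = f + ETH_HLEN + 16;    /* IP header bytes 17-20 */
    snprintf(out, 16, "%u.%u.%u.%u", (unsigned)d[0], (unsigned)d[1],
             (unsigned)d[2], (unsigned)d[3]);
    return 0;
}
```

For the example frame, hex_to_bytes yields 52 bytes, and eth_ipv4_dst writes 10.2.0.2.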

Homework 4b: Part 2

•The program should read a forwarding table and an Ethernet packet, extract the destination IP address, make a lookup in the forwarding table, and write the outgoing interface name.

•The assignment is a step towards full forwarding but lacks several sanity checks, MAC address lookups, and ARP. It is intended to illustrate how to inspect packet headers, the use of pointers and buffers, and IP longest prefix match.

•The program should do the following:

Read a routing table from stdin. The routing table consists of a list of triples: prefix, prefix length, and nexthop interface.

Read a single Ethernet (RFC 894) packet from stdin.

Verify that the packet is long enough to contain an Ethernet header and an IPv4 header.

Verify that the Ethernet payload type is 0x0800 (IPv4).

Verify that the IP version field is 4.

Extract the destination address from the IPv4 header, make a longest prefix match lookup, and return the outgoing interface name.

•Example: Input

fib 10.1.0.0/24 e1

fib 10.2.0.0/24 e2

fib 10.3.0.0/24 e3

fib 0.0.0.0/0 e1

input 0200 0000 0001 0200 0000 0011 0800 4500 0026 17d4 0000 ff01 8ffc 0a01 0002 0a02 0002 0000 e802 c04b 0004 3e89 339a 0786 d0ff 0009

•Example Output:

e2
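This C sketch covers the two pieces Part 2 adds. It parses a "fib A.B.C.D/LEN IFNAME" line with sscanf, and it does a longest prefix match by linear scan, keeping the matching entry with the longest prefix. The line format is inferred from the example input. struct fib_entry, the 15-character interface-name limit, and the function names are assumptions.

```c
#include <stdint.h>
#include <stdio.h>

struct fib_entry { uint32_t prefix; int len; char ifname[16]; };

/* Parse "fib A.B.C.D/LEN IFNAME" into e. Returns 0 on success, -1 otherwise. */
static int parse_fib_line(const char *line, struct fib_entry *e) {
    unsigned a, b, c, d;
    int len;
    if (sscanf(line, "fib %u.%u.%u.%u/%d %15s", &a, &b, &c, &d, &len, e->ifname) != 6)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255 || len < 0 || len > 32)
        return -1;
    e->prefix = (uint32_t)a << 24 | b << 16 | c << 8 | d;   /* host byte order */
    e->len = len;
    return 0;
}

/* Longest prefix match by linear scan: among matching entries, keep the longest one. */
static const char *lpm(const struct fib_entry *t, int n, uint32_t dst) {
    const struct fib_entry *best = NULL;
    for (int i = 0; i < n; i++) {
        uint32_t mask = t[i].len ? 0xFFFFFFFFu << (32 - t[i].len) : 0;
        if ((dst & mask) == (t[i].prefix & mask) && (!best || t[i].len > best->len))
            best = &t[i];
    }
    return best ? best->ifname : NULL;
}
```

With the four fib lines of the example, lpm(t, 4, 0x0A020002), which is 10.2.0.2, returns "e2". Any address outside the three /24 routes falls through to the default route on e1. The destination must first be converted to host byte order, for instance with ntohl, before it is compared with the parsed prefixes.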

Homework 4b: Kattis

•If you have registered, you should get a Kattis account
•Use the link on the homework page and log in
•Submit by selecting:
–Language: C
–Problem: forwarding (part 1), forwarding2 (part 2)
–Upload the file
–Submit
•You can see the status on the web page: Compile error, Runtime error, Wrong output, OK
•You will also get a mail
•Submit the solution electronically, or on paper to the lab assistants or the course leader, before the deadline
•Append a receipt showing that you passed both the forwarding and forwarding2 tests on Kattis

Extracting correct info

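•The Ethernet header is 14 bytes; the payload type is in bytes 13-14; IPv4 is 0x0800

•The IP header is 20 bytes (without options); the destination IP address is in bytes 17-20

#include <stdint.h>

/* Multi-byte fields are in network byte order (big-endian), as on the wire. */
struct ethhdr {
    uint8_t da[6], sa[6];     /* destination and source MAC address */
    uint16_t pt;              /* payload type (0x0800 = IPv4) */
};

struct iphdr {
    uint8_t ip_vhl;           /* version (high 4 bits), header length in 32-bit words (low 4 bits) */
    uint8_t ip_tos;           /* type of service */
    uint16_t ip_len;          /* total length */
    uint16_t ip_id;           /* identification */
    uint16_t ip_off;          /* fragment offset field */
    uint8_t ip_ttl;           /* time to live */
    uint8_t ip_p;             /* protocol */
    uint16_t ip_sum;          /* checksum */
    uint32_t ip_src, ip_dst;  /* source and dest address */
};

/* Version: ip_vhl >> 4; header length in bytes: (ip_vhl & 0x0f) * 4.
 * One byte replaces 4-bit bit-fields, whose bit order and storage unit are
 * implementation-defined and would not portably match the wire layout. */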

Forwarding: Details

Ethernet decoding
•Check the Ethernet header length
•Check the Ethernet destination address
•Dispatch on payload type for IPv4

IP header sanity checks
•IP header length checks (check the buffer length against the header-length field and the total-length field)
•IP packets containing IP options should be relayed without processing the options
•Check the IP header version
•Check the checksum

Forwarding
•FIB lookup for the outgoing interface and the nexthop or directly connected host
•TTL check, decrement, and checksum recalculation
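The last forwarding step can be sketched in C. The sketch computes a full RFC 1071 checksum for verification. It also decrements the TTL and patches the checksum incrementally, following RFC 1624 (eqn. 3), instead of recomputing it. Byte offsets refer to the 20-byte IPv4 header. The function names are hypothetical, and generating the ICMP Time Exceeded message is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over hlen bytes (hlen is even for an IPv4 header).
 * Over a header with a correct checksum field, the result is 0. */
static uint16_t ip_checksum(const uint8_t *h, size_t hlen) {
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < hlen; i += 2)
        sum += (uint32_t)h[i] << 8 | h[i + 1];
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);       /* fold carries (one's complement add) */
    return (uint16_t)~sum;
}

/* TTL is header byte 9 (index 8); the checksum is bytes 11-12 (index 10-11).
 * Returns -1 if the packet must be dropped because the TTL would reach 0, else 0.
 * Checksum update per RFC 1624: HC' = ~(~HC + ~m + m'), where m/m' is the 16-bit
 * word holding TTL and protocol before/after the change. */
static int ip_decrement_ttl(uint8_t *h) {
    if (h[8] <= 1)
        return -1;
    uint16_t old_word = (uint16_t)(h[8] << 8 | h[9]);
    h[8]--;
    uint16_t new_word = (uint16_t)(h[8] << 8 | h[9]);
    uint32_t sum = (uint16_t)~(h[10] << 8 | h[11]);
    sum += (uint16_t)~old_word;
    sum += new_word;
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    uint16_t ck = (uint16_t)~sum;
    h[10] = (uint8_t)(ck >> 8);
    h[11] = (uint8_t)(ck & 0xFF);
    return 0;
}
```

The homework example header has checksum 0x8ffc and TTL 0xff. On that header, ip_checksum returns 0, meaning the checksum is valid. After ip_decrement_ttl, the TTL is 0xfe and the checksum field becomes 0x90fc, which again verifies to 0.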

Forwarding: Details (cont)

Encoding of Ethernet header
•Get the correct Ethernet destination address
•Get the correct Ethernet source address

Transmission on outgoing interface

Statistics
•A limited set of statistics can be gathered, as follows (these are a subset of the IP SNMP MIB):
–ipInReceives: total number of IP packets received
–ipInHdrErrors: packets with errors in the IP header (length, checksum, version, etc.)
–ipForwDatagrams: number of successfully forwarded datagrams
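Below is a C sketch of the output side. It rewrites the Ethernet addresses in place and updates the three counters. The nexthop MAC address is assumed to come from an ARP lookup done elsewhere. enum ip_verdict and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The three IP MIB counters from the statistics list. */
struct ip_stats { uint64_t ipInReceives, ipInHdrErrors, ipForwDatagrams; };

/* Outcome of handling one received IP packet. */
enum ip_verdict { IP_HDR_ERROR, IP_FORWARDED, IP_OTHER };

static void ip_account(struct ip_stats *s, enum ip_verdict v) {
    s->ipInReceives++;                     /* every received IP packet counts */
    if (v == IP_HDR_ERROR)
        s->ipInHdrErrors++;
    else if (v == IP_FORWARDED)
        s->ipForwDatagrams++;
}

/* Rewrite the Ethernet header in place before transmission:
 * destination = MAC of the nexthop (assumed already resolved by ARP),
 * source = MAC of the outgoing interface. The payload type (bytes 13-14) is kept. */
static void eth_rewrite(uint8_t *frame, const uint8_t nexthop_mac[6], const uint8_t if_mac[6]) {
    memcpy(frame, nexthop_mac, 6);
    memcpy(frame + 6, if_mac, 6);
}
```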

Byte ordering / Endianness

CPUs differ in how they store multi-byte numbers in memory (load/store):
–Most significant byte at the first (lowest) address: big-endian ("big end first")
–Most significant byte at the last address: little-endian
–There are also middle-endian and bi-endian CPUs

Register value 0x0A0B0C0D stored at memory address n:

Address         n    n+1  n+2  n+3
Big-endian      0A   0B   0C   0D
Little-endian   0D   0C   0B   0A
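A quick way to see the host byte order from C is to mirror the figure. Store 0x0A0B0C0D in a 32-bit variable and inspect the byte at the lowest address, using memcpy to copy out the bytes. The function name is illustrative, and middle-endian layouts are not distinguished.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian host, 0 on a little-endian host. */
static int host_is_big_endian(void) {
    uint32_t v = 0x0A0B0C0D;          /* the register value from the figure */
    uint8_t mem[4];
    memcpy(mem, &v, sizeof v);        /* the bytes as stored at addresses n..n+3 */
    return mem[0] == 0x0A;            /* 0x0A first: big-endian; 0x0D first: little-endian */
}
```

On an x86 PC this returns 0, because x86 is little-endian.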

Network byte order
•The way the CPU stores/loads numbers in memory is called host byte order
•But in communication systems we sometimes have to transfer numbers in binary format (character arrays are never a problem)
•We have to agree on a format to encode numbers
•This is called network byte order
–In IP, network byte order is big-endian
•Therefore, in portable code, if you transfer binary numbers between nodes, always translate between host byte order and network byte order
•BSD has the following helper functions:
–htonl, ntohl (4-byte numbers)
–htons, ntohs (2-byte numbers)
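Below is a minimal C sketch of these conversions in practice, using the POSIX <arpa/inet.h> versions of ntohl and htons. The first helper turns the destination address in a received IPv4 header (bytes 17-20) into a host-order integer that can be compared with prefixes. The second writes a 16-bit value, such as the payload type 0x0800, into a buffer in network byte order. The helper names are hypothetical.

```c
#include <arpa/inet.h>   /* htonl, ntohl, htons, ntohs */
#include <stdint.h>
#include <string.h>

/* Destination address from an IPv4 header, converted to host byte order. */
static uint32_t ip_dst_host_order(const uint8_t *iphdr) {
    uint32_t net;
    memcpy(&net, iphdr + 16, sizeof net);   /* bytes 17-20, still in network order */
    return ntohl(net);
}

/* Store a 16-bit host-order value into a packet buffer in network byte order. */
static void put_be16(uint8_t *p, uint16_t host) {
    uint16_t net = htons(host);
    memcpy(p, &net, sizeof net);
}
```

On a big-endian host ntohl and htons do nothing, and on a little-endian host they swap bytes. The same source is therefore correct on both.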

Alignment

•Data structures must be aligned in memory when they are accessed as several bytes
•In particular, 2-byte, 4-byte, and 8-byte numbers must be aligned on word boundaries
–Otherwise a bus error occurs (in serious cases, e.g. SPARC)
–Or performance degrades (as on x86)
•Typically:
–2-byte numbers must be 2-byte aligned
–4-byte numbers must be 4-byte aligned
–Etc.
•In Eth+IP, the Ethernet header is 14 bytes, which misaligns the IP header: if the frame starts on a 4-byte boundary, the IP header starts at offset 14, so its 4-byte fields (the addresses) are not 4-byte aligned

Alignment example

(Figure: a 4-byte value 0x0A0B0C0D is loaded from memory into a register. Stored at address 100, which is 4-byte aligned: OK. Stored at a misaligned address (not a multiple of 4): BUS ERROR!)
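One way to avoid this misalignment is sketched in C below, under stated assumptions. The received frame is placed 2 bytes into a 4-byte-aligned buffer, so the 14-byte Ethernet header ends on a 4-byte boundary and the IP addresses become aligned. RX_HEADROOM, union rxbuf, and the helper names are hypothetical. Copying fields out with memcpy is an alternative that works at any offset.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN 14
#define RX_HEADROOM 2   /* assumed headroom: 2 + 14 = 16, a multiple of 4 */

/* Receive buffer; the uint32_t member makes its start at least 4-byte aligned. */
union rxbuf { uint32_t align; uint8_t bytes[2048]; };

/* 1 if p is aligned for an object of 'size' bytes (size must be a power of two). */
static int is_aligned(const void *p, size_t size) {
    return ((uintptr_t)p & (size - 1)) == 0;
}

/* Start of the IP header when the frame is placed 'headroom' bytes into the buffer. */
static uint8_t *ip_header(union rxbuf *b, size_t headroom) {
    return b->bytes + headroom + ETH_HLEN;
}
```

With no headroom, ip_dst sits at frame offset 30, which is 2 modulo 4, so it is misaligned. With 2 bytes of headroom, it sits at buffer offset 32.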