CNS 2640 Lecture 6/7
Assembled by M. Ryan Byrd
Internet Structure
[Figure, "Recent Past": the NSFNET backbone linking regional networks (Westnet, BARNET, MidNet, ...) that serve sites such as UA, UNM, Stanford, Berkeley, PARC, NCAR, UNL, ISU, and KU.]
[Figure, "Today": the same regional networks and sites, linked by a service-provider backbone.]
Numbers
www.icann.org: Internet Corporation for Assigned Names and Numbers (ICANN)
www.arin.net is our authority and has more details
Names and numbers have been privatized. The US government used to allocate them.
The big picture
Current
Destinations
Top Level Domains
Popular Interior Gateway Protocols
RIP: Routing Information Protocol
– developed at Berkeley
– distributed with Unix
– distance-vector algorithm (routes are exchanged with neighbors)
– based on hop count
OSPF: Open Shortest Path First
– recent Internet standard
– uses a link-state algorithm (link state is broadcast to all routers)
– supports load balancing
– supports authentication
IP Service Model
Packet Delivery Model
– Best Effort
Global Addressing Scheme
– IP Addresses
Packet Delivery Model
Connectionless (datagram-based)
Best-effort delivery (unreliable service)
– packets are lost
– packets are delivered out of order
– duplicate copies of a packet are delivered
– packets can be delayed for a long time
NSFNET
A Partial Internet Trunk Map Showing ISP Access Lines
US Main Internet connections
Visual Traceroute
IP Datagram format
– Version (4): currently 4
– Hlen (4): number of 32-bit words in header
– TOS (8): type of service (QoS; not widely used)
– Length (16): number of bytes in this datagram, header included
– Ident (16): different for each datagram
– Flags/Offset (16): used by fragmentation
– TTL (8): time to live; the number of hops this datagram may still travel, decremented by each router
– Protocol (8): demux key (TCP=6, UDP=17)
– Checksum (16): of the header only
– DestAddr & SrcAddr (32 each)
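The field list above can be turned into a small decoder. This is only a sketch: it assumes a header with no options (Hlen = 5), and the function names and the hand-built sample header are illustrative, with addresses borrowed from the ARP slides later in the deck.

```python
import struct
from ipaddress import IPv4Address

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (ver_hlen, tos, length, ident, flags_off,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_hlen >> 4,          # upper 4 bits
        "hlen_words": ver_hlen & 0x0F,     # lower 4 bits, in 32-bit words
        "tos": tos,
        "length": length,
        "ident": ident,
        "flags": flags_off >> 13,          # top 3 bits
        "offset_bytes": (flags_off & 0x1FFF) * 8,  # offset is stored in 8-byte units
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "src": str(IPv4Address(src)),
        "dst": str(IPv4Address(dst)),
    }

def header_checksum(header: bytes) -> int:
    """One's-complement sum of the 16-bit words of the header only."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                     # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Hypothetical sample header: version 4, Hlen 5, length 40, TTL 64, protocol 6 (TCP).
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                     IPv4Address("128.187.171.2").packed,
                     IPv4Address("128.187.174.10").packed)
fields = parse_ipv4_header(sample)
```

A receiver can verify a header by running `header_checksum` over all 20 bytes with the checksum field filled in; a valid header yields 0.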
Fragmentation and Reassembly
Each network has some MTU (maximum transmission unit)
Strategy
– fragment when necessary (MTU < datagram size)
– try to avoid fragmentation at source host
– refragmentation is possible
– fragments are self-contained datagrams
– use CS-PDU (not cells) for ATM
– delay reassembly until destination host
– do not recover from lost fragments
– fragment on 8-byte boundaries
– the offset field counts 8-byte units, so the low 3 bits of the byte offset are dropped
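The 8-byte rule is easiest to see with numbers. The sketch below assumes a 20-byte header with no options; the function name and the 1400-byte / MTU 532 figures are illustrative and not taken from the slide.

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20):
    """Return (offset_field, data_bytes, more_fragments) for each fragment."""
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the offset field counts 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    sent = 0
    while sent < payload_len:
        size = min(max_data, payload_len - sent)
        more = sent + size < payload_len
        fragments.append((sent // 8, size, more))
        sent += size
    return fragments

# 1400 data bytes over a link with MTU 532: 512 + 512 + 376 bytes.
example = fragment(1400, 532)
```

Here `example` holds three fragments with offset fields 0, 64, and 128; only the last one has the more-fragments flag cleared.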
Example
Datagram Forwarding
Strategy
– every datagram contains destination's address
– if directly connected to destination network, then forward to host
– if not directly connected to destination network, then forward to some router
– forwarding table maps network number into next hop
– each host has a default router
– each router maintains a forwarding table
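The strategy above amounts to three checks in order. A minimal sketch, assuming non-overlapping prefixes (so no longest-prefix matching is needed); the function name, router names, and network numbers are hypothetical.

```python
from ipaddress import ip_address, ip_network

def next_hop(dest, directly_connected, forwarding_table, default_router):
    """Pick where to send a datagram addressed to dest."""
    addr = ip_address(dest)
    # 1. Destination network is directly attached: deliver to the host itself.
    for net in directly_connected:
        if addr in ip_network(net):
            return "deliver directly"
    # 2. Otherwise look the network number up in the forwarding table.
    for net, hop in forwarding_table.items():
        if addr in ip_network(net):
            return hop
    # 3. No entry: fall back to the default router.
    return default_router

connected = ["128.187.171.0/24"]                 # assumed local network
table = {"128.187.174.0/24": "router-R1"}        # assumed table entry
```

For example, `next_hop("128.187.174.10", connected, table, "router-R0")` selects `router-R1`, while an unknown destination falls through to the default router.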
Error Detection
Error Control Overview
Errors occur due to
– Noise or interference in the communication channel
– Congestion in the network, where packets must be dropped
Error Control Strategies
– Error-correcting codes (Forward Error Correction, FEC)
– Error detection and retransmission (Automatic Repeat Request, ARQ)
Cyclic Redundancy Check
Add k bits of redundant data to an n-bit message.
Represent the n-bit message as a polynomial of degree n-1; e.g., MSG=10011010 corresponds to M(x) = x^7 + x^4 + x^3 + x^1.
Let k be the degree of some divisor polynomial C(x); e.g., C(x) = x^3 + x^2 + 1.
CRC
Transmit polynomial P(x) that is evenly divisible by C(x), and receive polynomial P(x) + E(x); E(x)=0 implies no errors.
Recipient divides (P(x) + E(x)) by C(x); the remainder will be zero in only two cases: E(x) was zero (i.e. there was no error), or E(x) is exactly divisible by C(x). Choose C(x) to make the second case extremely rare.
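The division can be carried out on bit strings with XOR. The sketch below uses the slide's own MSG=10011010 and C(x) = x^3 + x^2 + 1 (bits 1101); the function names are illustrative, and the sender is assumed to append the remainder of M(x)·x^k, which is the usual way to make P(x) divisible by C(x).

```python
def mod2_remainder(bits: str, divisor: str) -> str:
    """Remainder of polynomial division over GF(2), as a k-bit string."""
    k = len(divisor) - 1
    value = int(bits, 2)
    div = int(divisor, 2)
    # Walk from the highest bit down; wherever a 1 remains, XOR in the divisor.
    for shift in range(len(bits) - len(divisor), -1, -1):
        if value & (1 << (shift + k)):
            value ^= div << shift
    return format(value, f"0{k}b")

def crc_encode(message: str, divisor: str) -> str:
    """Append k check bits so the result is evenly divisible by the divisor."""
    k = len(divisor) - 1
    return message + mod2_remainder(message + "0" * k, divisor)

codeword = crc_encode("10011010", "1101")          # message plus 3 check bits
clean = mod2_remainder(codeword, "1101")           # zero remainder: no error seen
one_bit_error = mod2_remainder("00011010101", "1101")
# An error pattern equal to C(x) itself (low bits XOR 1101) goes undetected.
undetected = mod2_remainder("10011011000", "1101")
```

The last line illustrates the second case on the slide: an E(x) that is exactly divisible by C(x) leaves a zero remainder even though the data changed.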
Example
Make all legal messages divisible by 3.
If you want to send 10:
– First multiply by 4 to get 40
– Now add 2 to make it divisible by 3: 40 + 2 = 42
When the data is received:
– Divide by 3; if there is no remainder, no error is detected
– If no error, divide by 4 (discarding the remainder) to get the sent message
If we receive 43, 44, 41, or 40, an error is detected. A received 45 would not be recognized as an error.
TCP Congestion Control
Idea
– assumes best-effort network
– each source determines network capacity for itself
– uses implicit feedback
– ACKs pace transmission (self-clocking)
Algorithm:
– increment CongestionWindow by one packet per RTT (linear increase)
– divide CongestionWindow by two whenever a timeout occurs (multiplicative decrease)
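The two rules produce the familiar sawtooth. A minimal sketch, assuming the window is measured in packets, never drops below one packet, and that each event is either one loss-free RTT or a timeout; the function and event names are made up for illustration.

```python
def aimd(events, cwnd=1.0):
    """Track CongestionWindow over a sequence of 'rtt' and 'timeout' events."""
    history = [cwnd]
    for event in events:
        if event == "timeout":
            cwnd = max(1.0, cwnd / 2)   # multiplicative decrease
        else:
            cwnd += 1.0                 # additive increase: one packet per RTT
        history.append(cwnd)
    return history

trace = aimd(["rtt", "rtt", "rtt", "timeout", "rtt"])
```

Starting from one packet, the window grows 1, 2, 3, 4, is halved to 2 by the timeout, and then resumes linear growth.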
Underlying best-effort network
– drops messages
– re-orders messages
– delivers duplicate copies of a given message
– limits messages to some finite size
– delivers messages after an arbitrarily long delay
Common end-to-end services
– guarantee message delivery
– deliver messages in the same order they are sent
– deliver at most one copy of each message
– support arbitrarily large messages
– support synchronization
– allow the receiver to apply flow control to the sender
– support multiple application processes on each host
Simple Demultiplexor (User Datagram Protocol, UDP)
Unreliable and unordered datagram service
Adds multiplexing
No flow control
Endpoints identified by ports
– servers have well-known ports
– see /etc/services on Unix
Optional checksum
– pseudo header + UDP header + data
Header format
[Figure: UDP header, 32 bits wide (bit positions 0, 16, 31): SrcPort and DstPort in the first word, Length and Checksum in the second, followed by Data.]
UDP Header
Bits 0-15 | Bits 16-31
Source Port | Destination Port
Length | Checksum
Data :::
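The four 16-bit fields map directly onto a `struct` format string. This sketch leaves the optional checksum at 0 (meaning "not computed" over IPv4) and adds a toy demultiplexer keyed on destination port; the helper names are hypothetical, and the port numbers 2000, 3000, and 3100 come from the demux figure that follows.

```python
import struct
from collections import deque

def build_udp(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Header (source port, destination port, length, checksum) plus data."""
    length = 8 + len(payload)            # length covers header and data
    return struct.pack("!HHHH", src_port, dst_port, length, 0) + payload

def parse_udp(datagram: bytes):
    src, dst, length, checksum = struct.unpack("!HHHH", datagram[:8])
    return src, dst, length, checksum, datagram[8:length]

# One queue per port that an application process is listening on.
queues = {2000: deque(), 3000: deque(), 3100: deque()}

def demultiplex(datagram: bytes) -> bool:
    """Append the payload to the queue for its destination port, if any."""
    _src, dst, _length, _checksum, data = parse_udp(datagram)
    queue = queues.get(dst)
    if queue is None:
        return False                     # no process is bound to this port
    queue.append(data)
    return True

datagram = build_udp(5000, 3000, b"hello")
```

Calling `demultiplex(datagram)` places `b"hello"` on the queue for port 3000; a datagram for an unbound port is simply dropped in this sketch.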
TCP Header (for contrast)
Bits 0-15 | Bits 16-31
Source Port | Destination Port
Sequence Number (32 bits)
Acknowledgment Number (32 bits)
Data Offset, reserved, ECN, Control Bits | Window
Checksum | Urgent Pointer
Options and padding :::
Data :::
Demux Process (value of “port”)
[Figure: packets arrive at UDP and are demultiplexed by port (Port 2000, Port 3000, Port 3100) into per-port queues, each read by an application process.]
Reliable Byte-Stream (TCP)
Overview
Connection-oriented
Byte-stream
– sending process writes some number of bytes
– TCP breaks into segments and sends via IP
– receiving process reads some number of bytes
– Full duplex
Flow control: keep sender from overrunning receiver
Congestion control: keep sender from overrunning network
Read p 272-287 in Cisco book
TCP Segment Format
Each connection identified with 4-tuple:
– <SrcPort, SrcIPAddr, DstPort, DstIPAddr>
Sliding window + flow control
– Acknowledgment, SequenceNum, AdvertisedWindow
Flags: SYN, FIN, RESET, PUSH, URG, ACK
Checksum: pseudo header + TCP header + data
[Figure: TCP segment layout: SrcPort, DstPort; SequenceNum; Acknowledgement; HdrLen (4 bits), 0 (6 bits), Flags (6 bits), AdvertisedWindow; CheckSum, UrgPtr; options (variable); data.]
TCP Connection Establishment and Termination
Three-Way Handshake
– the initial sequence number is random so that packets from consecutive sessions are unique
[Figure: timeline between the Active Participant and the Passive Participant]
1. Active to passive: SYN, SequenceNum = x
2. Passive to active: SYN + ACK, SequenceNum = y, Acknowledgement = x + 1
3. Active to passive: ACK, Acknowledgement = y + 1
Sliding Window Revisited
Each byte has a sequence number
ACKs are cumulative
Sliding Window (details)
Sending side
– LastByteAcked ≤ LastByteSent
– LastByteSent ≤ LastByteWritten
– bytes between LastByteAcked and LastByteWritten must be buffered
Receiving side
– LastByteRead < NextByteExpected
– bytes between LastByteRead and LastByteRcvd must be buffered
ARP (again!)
ARP across a switch
[Figure: hosts H1-H12 attached to a switch; segments labeled 171, 172, 173, 174. H10 = IP 128.187.174.10, Ethernet 44.fe.34.56.32.d5. The addresses 56.47.ef.c6.34.78 and 55.7e.c6.11.78.99 also appear as labels in the figure.]
Field labels: D = destination Ethernet address of the frame, SP = sender protocol (IP) address, SH = sender hardware address, DP = destination protocol (IP) address, DH = destination hardware address.
1. ARP request, broadcast and flooded by the switch (the frame appears four times in the figure): D ff.ff.ff.ff.ff.ff, SP 128.187.171.2, SH fe.34.56.32.d5.29, DP 128.187.174.10, DH 0.0.0.0.0.0
2. ARP reply from H10: D fe.34.56.32.d5.29, SP 128.187.174.10, SH 44.fe.34.56.32.d5, DP 128.187.171.2, DH fe.34.56.32.d5.29
3. Data frame: destination IP 128.187.174.10, destination Ethernet 44.fe.34.56.32.d5; source IP 128.187.171.2, source Ethernet fe.34.56.32.d5.29
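ARP across a router
[Figure: the same hosts H1-H12, now attached to a router; segments labeled 171, 172, 173, 174. H10 = IP 128.187.174.10, Ethernet 44.fe.34.56.32.d5. Router interface addresses: 56.47.ef.c6.34.78 and 55.7e.c6.11.78.99. The step order below is inferred from the addresses in each frame.]
1. ARP request from the sender, broadcast on its own segment: D ff.ff.ff.ff.ff.ff, SP 128.187.171.2, SH fe.34.56.32.d5.29, DP 128.187.174.10, DH 0.0.0.0.0.0
2. ARP reply carrying the router's hardware address: D fe.34.56.32.d5.29, SP 128.187.174.10, SH 56.47.ef.c6.34.78, DP 128.187.171.2, DH fe.34.56.32.d5.29
3. Data frame to the router: destination IP 128.187.174.10, destination Ethernet 56.47.ef.c6.34.78; source IP 128.187.171.2, source Ethernet fe.34.56.32.d5.29
4. ARP request from the router on H10's segment: D ff.ff.ff.ff.ff.ff, SP 128.187.171.2, SH 55.7e.c6.11.78.99, DP 128.187.174.10, DH 0.0.0.0.0.0
5. ARP reply from H10: D 55.7e.c6.11.78.99, SP 128.187.174.10, SH 44.fe.34.56.32.d5, DP 128.187.171.2, DH 55.7e.c6.11.78.99
6. Data frame forwarded by the router: destination IP 128.187.174.10, destination Ethernet 44.fe.34.56.32.d5; source IP 128.187.171.2, source Ethernet 55.7e.c6.11.78.99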
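For reference, the ARP request from the figures can be packed into the 28-byte ARP payload used on Ethernet/IPv4. This is a sketch: the helper names are hypothetical, the addresses are the slide's, and the dotted hardware-address notation is parsed as the slide writes it.

```python
import socket
import struct

def mac_bytes(text: str) -> bytes:
    """Parse the slide's dotted notation, e.g. 'fe.34.56.32.d5.29'."""
    return bytes(int(part, 16) for part in text.split("."))

def arp_packet(oper: int, sha: str, spa: str, tha: str, tpa: str) -> bytes:
    """Build an ARP payload: oper 1 = request, 2 = reply."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,          # hardware type: Ethernet
        0x0800,     # protocol type: IPv4
        6, 4,       # hardware / protocol address lengths in bytes
        oper,
        mac_bytes(sha), socket.inet_aton(spa),   # sender hardware / protocol address
        mac_bytes(tha), socket.inet_aton(tpa),   # target hardware / protocol address
    )

request = arp_packet(1, "fe.34.56.32.d5.29", "128.187.171.2",
                     "0.0.0.0.0.0", "128.187.174.10")
reply = arp_packet(2, "44.fe.34.56.32.d5", "128.187.174.10",
                   "fe.34.56.32.d5.29", "128.187.171.2")
```

The request leaves the target hardware address as zeros, matching DH 0.0.0.0.0.0 in the figure; the reply fills in the answer as its sender hardware address.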
IP Routing Definitions and Terminology
Routers are Layer 3 (Network Layer) devices
Traditionally routers were called gateways
Routers are used for information exchange within a group of networks under the same administrative authority and control (Autonomous Systems)
Routing can be both dynamic and static
Routing involves the determination of routing paths and the transport of information groups (packets) through an internetwork
RIP Routing Table
RIP Packet Format
RIP Packet Fields Description
Command:
– Indicates that the packet is a request or a response. The request command requests the responding system to send all or part of its routing table. Destinations for which a response is requested are listed later in the packet. The response command represents a reply to a request or, more frequently, an unsolicited regular routing update. In the response packet, a responding system includes all or part of its routing table. Regular routing update messages include the entire routing table.
Version number:
– Specifies the RIP version being implemented. With the potential for many RIP implementations in the Internet, this field can be used to signal different, potentially incompatible, implementations.
Address family identifier:
– Follows a 16-bit field of all zeros and specifies the particular address family being used. On the Internet, this address family is typically IP (value = 2), but other network types may also be represented
Address:
– Follows another 16-bit field of zeros. In Internet RIP implementations, this field typically contains an IP address
Metric:
– Follows two more 32-bit fields of zeros and specifies the hop count. The hop count indicates how many internetwork hops (routers) must be traversed before the destination can be reached
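The field order described above (RIP version 1) can be checked with a short parser. A sketch only: the function name and the sample route are hypothetical, and the command values 1 (request) and 2 (response) are the RIP-standard ones rather than something stated on the slide.

```python
import socket
import struct

def parse_rip(packet: bytes):
    """Decode a RIP version 1 packet into (command, version, routes)."""
    command, version, _zero = struct.unpack("!BBH", packet[:4])
    routes = []
    for offset in range(4, len(packet), 20):        # each route entry is 20 bytes
        afi, _z16, addr, _z32a, _z32b, metric = struct.unpack(
            "!HH4sIII", packet[offset:offset + 20])
        routes.append((afi, socket.inet_ntoa(addr), metric))
    return command, version, routes

# Hypothetical response advertising one network at 3 hops.
sample = struct.pack("!BBH", 2, 1, 0) + struct.pack(
    "!HH4sIII", 2, 0, socket.inet_aton("128.187.174.0"), 0, 0, 3)
parsed = parse_rip(sample)
```

The zero fields are read and discarded, which mirrors how the description places each real field "after" a run of zeros.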
OSPF - Open Shortest Path First
OSPF is a relatively recent intra-domain, link state, hierarchical routing protocol developed for IP networks by the Internet Engineering Task Force (IETF)
OSPF was derived from an early version of OSI's IS-IS routing protocol
IGRP
IGRP is an intra-domain distance vector routing protocol developed in the mid-1980s by Cisco Systems, Inc. It is designed for use in large, complex IP networks.
IGRP uses a combination (vector) of metrics. Internetwork delay, bandwidth, reliability, MTU, and load are all factored into the routing decision.
What is a routing protocol?
Delivers information about networks this router knows to other routers
Receives and records information about other networks from other routers
Used to construct a virtual path through a series of routers
Provide end to end connectivity for a set of nodes or hosts
What is a distance vector routing protocol?
Based on the Bellman-Ford algorithm
Routes are advertised as a vector in the form (distance, direction)
– Distance is a metric (usually hop count)
– Direction is the next-hop router
Relies upon information learned from neighbors
Common distance vector protocols include RIP, IPX RIP (Novell), IGRP, RTMP (AppleTalk)
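The Bellman-Ford step a router performs when a neighbor's advertisement arrives can be sketched as follows. Assumptions: hop-count metric, a link cost of 1, and RIP's infinity of 16 borrowed from later slides; the function and table layout are illustrative only.

```python
INFINITY = 16   # assumed "unreachable" value, as RIP uses

def merge_advertisement(table, neighbor, advertised, link_cost=1):
    """Update table (dest -> (distance, next_hop)); return True if it changed."""
    changed = False
    for dest, dist in advertised.items():
        candidate = min(INFINITY, dist + link_cost)
        current_dist, current_hop = table.get(dest, (INFINITY, None))
        # Accept a strictly better route, or any change reported by the
        # neighbor that is already the next hop for this destination.
        if candidate < current_dist or (current_hop == neighbor
                                        and candidate != current_dist):
            table[dest] = (candidate, neighbor)
            changed = True
    return changed

table_a = {"A": (0, None)}
first = merge_advertisement(table_a, "B", {"B": 0, "C": 1})
second = merge_advertisement(table_a, "B", {"B": 0, "C": 1})
```

After the first advertisement router A holds (distance, direction) pairs B: (1, B) and C: (2, B); repeating the same advertisement changes nothing.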
Review of RIP
RIP is an example of a DV routing protocol
Widely deployed in 1982, RFC’d in 1988
RIP uses hop count for its metric
Poor load balancing support
Not designed for unequally balanced bandwidth links
Let's look at an example…
Improved DV Protocols
IGRP, EIGRP
Cisco proprietary
Developed in the mid-1980s and mid-1990s, respectively
Features an advanced metric system
Unequal cost load sharing
(E)IGRP Metrics
Bandwidth – How big the channel is
Delay – Knob to adjust channel use
Load – Percent utilization on the channel
Reliability – Percent time the channel has been up
Provides constants which can be modified to make one variable more important than another
Load Balancing
Balances over equal cost paths by default
Balances over unequal cost paths with minor configuration
You can specify the variance of unequal cost paths to distribute over
Routing Protocol Types
Distance Vector
– Passes routes by next hop and path cost or metric
– Takes up little memory
– Ease of implementation
Link State
– Each router meets all others and passes information about the links that attach it to the network
– Each router contains complete information about the topology and from this information uses Dijkstra’s Algorithm to calculate forwarding decisions
– Faster convergence time
– Uses more memory, complex to implement
DHCP
Dynamic Host Configuration Protocol uses the same frame format and transport mechanism as BOOTP. It is supposed to provide a complete set of parameters to a host that queries the server. The neat new capability that DHCP adds is that it can assign addresses and reuse them.
BOOTP: assigns a host an address it can use “forever”
DHCP: loans an address to a host; the address becomes available again if the host does not renew.
NAT
A Network Address Translator sits on a network and translates IP addresses of multiple stations into one address viewable by the outside world.
[Figure: four workstations connected through a NAT to the WAN.]
The Translator
The NAT has a set of one or more globally unique IP addresses that it can assign to nodes in the masked network.
If the NAT has a pool of globally unique IP addresses that is smaller than the number of nodes in the masked network, it can do Network Address Port Translation (NAPT). This translates between address and port pairs, allowing thousands of connections through the translator.
NAT and NAPT have helped delay the deployment of IPv6. These techniques can get ugly when encryption is used.
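The address-and-port mapping at the heart of NAPT is a pair of lookup tables. A minimal sketch with a single public address; the class name, the starting port 40000, and the example addresses (private 10.0.0.x, documentation address 192.0.2.1) are all assumptions made for illustration.

```python
class Napt:
    """Toy NAPT table: many private (ip, port) pairs share one public IP."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound_map = {}   # (private_ip, private_port) -> public_port
        self.inbound_map = {}    # public_port -> (private_ip, private_port)

    def outbound(self, private_ip: str, private_port: int):
        """Translate the source of an outgoing packet."""
        key = (private_ip, private_port)
        if key not in self.outbound_map:
            self.outbound_map[key] = self.next_port
            self.inbound_map[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound_map[key]

    def inbound(self, public_port: int):
        """Translate the destination of a returning packet (None if unknown)."""
        return self.inbound_map.get(public_port)

nat = Napt("192.0.2.1")
first = nat.outbound("10.0.0.5", 1234)
second = nat.outbound("10.0.0.6", 1234)
```

Two workstations using the same private port end up on different public ports, which is what lets thousands of connections share one address.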
Autonomous Systems
An Autonomous System (or AS) is a set of routers under a single technical administration, like an internet service provider. The administration of an AS appears to other ASs to have a single coherent interior routing plan and presents a consistent picture of what destinations are reachable through its network.
Examples of an AS are Sprint, Qwest, and MCIWorldCom
Gateway Protocols
There are 2 types of Gateway Protocols
1. Interior Gateway Protocols are used within Autonomous Systems
2. Exterior Gateway Protocols are used between Autonomous Systems
A Problem with Distance Vector Protocols
Counting to infinity: the LONG convergence time.
[Figure: three routers in a line, A - B - C]
Assume each link has a metric of one. A knows that its cost to get to C is 2, and B knows that its cost to get to C is 1. C crashes! B discards its distance vector from C and recalculates, using the advertisement of 2 from router A and incrementing it by one to get 3. Router A receives a metric of 3 to C from router B and changes its own distance to 4. This process continues until routers A and B determine that the metric to C is infinity. This problem is called counting to infinity.
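The exchange between A and B can be replayed in a few lines. The sketch assumes the routers take strict turns and that "infinity" is 16, the value RIP uses; the function name is made up.

```python
def count_to_infinity(infinity: int = 16):
    """Replay A and B trading stale routes to C after C crashes."""
    a, b = 2, infinity          # A still believes cost 2; B has lost its route
    trace = []
    while a < infinity or b < infinity:
        b = min(infinity, a + 1)    # B trusts A's advertisement and adds one hop
        trace.append(("B", b))
        a = min(infinity, b + 1)    # A hears B's new metric and adds one hop
        trace.append(("A", a))
    return trace

steps = count_to_infinity()
```

The trace starts with B at 3 and A at 4, as in the narrative above, and only stops once both routers have reached 16.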
Solutions
Hold Down: When a route that is in use goes down, the connected router advertises that path as infinity to the rest of the network. Once a period of time passes, the connected router finds an alternative path. The issue here is that it slows down convergence time and doesn’t always work.
Report the entire path: this is expensive and you might as well use a link state algorithm
Split Horizon: A router does not advertise a route back to the neighbor it learned that route from. When Router C crashes, Router A therefore never offers Router B its stale route to C. Problem solved…somewhat (next slide)
Split Horizon
Split Horizon still does not solve the count to infinity problem in the topology below if the link to router D goes down. Router B will stop advertising the route to D, but routers A and C will continue to advertise routes to D, counting to infinity.
Counting to infinity doesn’t break the protocol, it just slows network convergence time.
[Figure: topology of routers A, B, C, and D.]
Poison Reversal
Poison Reversal is used with Split Horizon. Instead of not advertising the route, the router advertises the route with a metric of infinity. This breaks two-router loops immediately; loops through three or more routers can still count to infinity.
“Classful” Inter Domain Routing
In the early days of the internet, network masks were inferred from the class of the destination network's IP address. They were not passed around by the routing protocols. In that era routers did not need to be told network masks, as they could implicitly determine the mask by looking at the first byte of the IP address in question.
It was assumed that Class A netmasks were 255.0.0.0, Class B netmasks were 255.255.0.0 and Class C netmasks were 255.255.255.0.
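Inferring the mask from the first byte takes only a few comparisons. A sketch, with an illustrative function name; the class boundaries (first byte below 128, 192, and 224) are the standard ones for classes A, B, and C.

```python
def classful_netmask(address: str) -> str:
    """Infer the netmask from the address class, as pre-CIDR routers did."""
    first_byte = int(address.split(".")[0])
    if first_byte < 128:        # class A: leading bit 0
        return "255.0.0.0"
    if first_byte < 192:        # class B: leading bits 10
        return "255.255.0.0"
    if first_byte < 224:        # class C: leading bits 110
        return "255.255.255.0"
    raise ValueError("class D and E addresses have no classful network mask")

mask = classful_netmask("128.187.171.2")
```

The address from the ARP slides, 128.187.171.2, falls in class B, so the inferred mask is 255.255.0.0.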
Classless Inter Domain Routing
CIDR routing protocols pass subnet (prefix length) information with their routing information, eliminating the need to infer an address class.
Subnets are thus no longer limited to the classes. They could now be any arbitrary length that the network manager configured (within reason).
More on CIDR
The concept of address “classes” goes away. A block with a 16-bit network prefix is the same as an old class B network.
Modern networks can be arbitrarily grouped (multiple class C sized networks, into a class B sized network), or divided (one class B sized network into any set of arbitrary smaller sized networks).
A single routing advertisement can cover a block of old style addresses. This makes routing tables smaller.
Larger blocks of addresses can be divided and allocated, increasing the lifetime of IPv4.
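Both directions, grouping and dividing, can be tried with Python's standard `ipaddress` module. The prefixes below are arbitrary examples chosen for illustration, not addresses from the lecture.

```python
from ipaddress import collapse_addresses, ip_network

def aggregate(prefixes):
    """Group adjacent networks into the fewest covering prefixes."""
    return [str(net) for net in collapse_addresses(ip_network(p) for p in prefixes)]

def divide(prefix, new_prefix_length):
    """Split one block into equal smaller blocks."""
    return [str(net) for net in ip_network(prefix).subnets(new_prefix=new_prefix_length)]

# Four class C sized networks become a single advertisement.
grouped = aggregate(["192.168.0.0/24", "192.168.1.0/24",
                     "192.168.2.0/24", "192.168.3.0/24"])
# One class B sized network becomes four /18 blocks.
divided = divide("128.187.0.0/16", 18)
```

The grouped result is one /22 route in place of four /24 routes, which is exactly how CIDR shrinks routing tables.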
Routing Information Protocol
RIP is an Interior Gateway Protocol that uses the Distance Vector approach to routing. The most primitive version (1) was a class oriented routing protocol. RIP Version 2 adds support for subnet masks and authentication.
RIP works just as I have explained for Distance Vector routing protocols.
RIP uses split horizon with poison reversal.
More on RIP
RIP is designed for smaller simple networks, as the infinity metric is 16 hops. Thus the protocol is limited to networks whose longest path (the network's diameter) is 15 hops when each hop has a metric of 1. RIP should not be used in larger networks.
Like other Distance Vector routing protocols, RIP counts to infinity to resolve unusual situations.
RIP is not appropriate for situations where routes need to be chosen based on real-time parameters such as measured delay, reliability, or load.
Why Use RIP?
Because of the simplicity of the protocol:
– There are many good, interoperable implementations.
– These implementations have a minimal number of bugs.
– There is minimal configuration.
Open Shortest Path First
OSPF is an Interior Gateway Protocol that uses the Link State approach to routing.
In OSPF, each router maintains a database describing the Autonomous System's topology.
This database is referred to as the link-state database.
Each participating router has an identical database.
Each individual piece of this database is a particular router's local state (e.g., the router's usable interfaces and reachable neighbors).
The router distributes its local state throughout the Autonomous System by flooding.
Hello Protocol
Routers discover other OSPF capable routers through the Hello Protocol.
Once 2 routers have detected each other, a partial adjacency has been formed. They can now share link state information through the exchange of database description packets.
During and after the Database Exchange Process, each router has a list of those LSAs (Link State Advertisements) for which the neighbor has more up-to-date instances. Requests are sent until the database is updated; at that point the routers are fully adjacent.
What happens when there is more than one router on the subnet?
In this situation, the Hello Protocol has the capability to elect a designated router.
The router that is first to initialize becomes designated router
Or if 2 routers initialize at the same time, DR is determined by priority or router ID.
The designated router is the only router on that specific network that shares the database with other routers on that network.
The Calculation
OSPF uses Dijkstra’s Algorithm to construct a tree of shortest path routes across an autonomous system.
This is performed by all routers on the network in parallel.
The route costs or metrics are configured by the network administrator.
The tree that is calculated determines the entire path, but the router only uses this to determine forwarding of data packets to the next hop router.
Multicast Routing
Multicast hosts register with their local router through a protocol called Internet Group Management Protocol (IGMP).
There are several routing protocols that can route IP multicast packets.
Distance Vector Multicast Routing Protocol (DVMRP) creates source-rooted tree structures.
Link-State Routing Algorithms
Net topology, link costs known to all nodes
Link state distribution accomplished via “link state broadcast”
– all nodes have same info
Compute least cost paths from a node to all other nodes
– use Dijkstra’s algorithm
Dijkstra’s Algorithm
Initialization:
  N = {A}
  for all nodes v
    if v adjacent to A
      then D(v) = c(A,v)
      else D(v) = infty

Loop
  find w not in N such that D(w) is a minimum
  add w to N
  update D(v) for all v adjacent to w and not in N:
    D(v) = min( D(v), D(w) + c(w,v) )
  /* new cost to v is either old cost to v or known
     shortest path cost to w plus cost from w to v */
until all nodes in N
Notation:
- c(i,j): link cost from node i to j; cost infinite if not direct neighbors
- D(v): current value of cost of path from source to destination v
- N: set of nodes whose least cost path definitively known
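Dijkstra’s Algorithm: Example
Step | N | D(B),p(B) | D(C),p(C) | D(D),p(D) | D(E),p(E) | D(F),p(F)
0 | A | 2,A | 5,A | 1,A | infinity | infinity
1 | AD | 2,A | 4,D | | 2,D | infinity
2 | ADE | 2,A | 3,E | | | 4,E
3 | ADEB | | 3,E | | | 4,E
4 | ADEBC | | | | | 4,E
5 | ADEBCF | | | | |
[Figure: six-node graph with nodes A, B, C, D, E, and F and link costs between 1 and 5.]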
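A runnable version of the algorithm is sketched below using a priority queue. The slide's graph did not survive extraction, so the edge costs in `GRAPH` are an assumption: they are the costs of the common textbook version of this six-node example, chosen because they reproduce the D(v) and p(v) values in the example table.

```python
import heapq

# Assumed link costs (undirected); not recoverable from the slide itself.
EDGES = [("A", "B", 2), ("A", "C", 5), ("A", "D", 1), ("B", "C", 3),
         ("B", "D", 2), ("C", "D", 3), ("C", "E", 1), ("C", "F", 5),
         ("D", "E", 1), ("E", "F", 2)]
GRAPH = {}
for u, v, cost in EDGES:
    GRAPH.setdefault(u, {})[v] = cost
    GRAPH.setdefault(v, {})[u] = cost

def dijkstra(graph, source):
    """Return (D, p): least cost to each node and its predecessor on that path."""
    dist = {source: 0}
    prev = {}
    done = set()                       # the set N of finished nodes
    heap = [(0, source)]
    while heap:
        d, w = heapq.heappop(heap)     # w not in N with minimum D(w)
        if w in done:
            continue                   # stale queue entry
        done.add(w)
        for v, cost in graph[w].items():
            if v not in done and d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost     # D(v) = min(D(v), D(w) + c(w,v))
                prev[v] = w
                heapq.heappush(heap, (d + cost, v))
    return dist, prev

D, p = dijkstra(GRAPH, "A")
```

With ties broken by node name, this version may finalize B before E, unlike the table, but the final costs and predecessors are the same.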
Link State Broadcast
[Figure, repeated over three slides: a network with source S and nodes A through N. Legend: lines represent links; marked nodes have received the update. S sends the update to its neighbors, and each node that receives it forwards it on.]
To avoid forwarding the same update multiple times, each update has a sequence number. If an arriving update does not have a higher sequence number, discard it!
- The packet received by E from C is discarded
- The packet received by C from E is discarded as well
- Node H receives packet from two neighbors, and will discard one of them
Summary of Link State Broadcast
Link updates are given sequence numbers
Each router maintains the highest seq. seen for each router
If the seq. of an arrived update is not higher than the stored seq., discard the update; otherwise, update seq. of the src, and forward the update to all the links except the incoming link
To recover from a corrupted seq. (or a router reboot) that would otherwise block every later update, the state at each router has an age field
Updates are sent periodically
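The sequence-number rule can be simulated on a small graph. This sketch ignores the age field and periodic refresh, processes deliveries in FIFO order, and uses a hypothetical three-node topology; none of the names come from the slides.

```python
from collections import deque

def flood(links, origin, seq, highest_seen):
    """Flood one update from origin; return how many copies were discarded.

    links: node -> list of neighbors
    highest_seen: node -> {source: highest sequence number seen}
    """
    highest_seen.setdefault(origin, {})[origin] = seq
    queue = deque((nbr, origin) for nbr in links[origin])
    discarded = 0
    while queue:
        node, came_from = queue.popleft()
        seen = highest_seen.setdefault(node, {})
        if seq <= seen.get(origin, -1):
            discarded += 1               # not a higher seq.: discard
            continue
        seen[origin] = seq               # record the new highest seq.
        for nbr in links[node]:
            if nbr != came_from:         # forward on all links except the incoming one
                queue.append((nbr, node))
    return discarded

# Hypothetical triangle: S, A, and B are all connected to each other.
links = {"S": ["A", "B"], "A": ["S", "B"], "B": ["S", "A"]}
state = {}
dropped = flood(links, "S", 1, state)
```

A and B each accept the copy from S and forward it to each other; those two second copies are discarded, just as the copies between C and E are in the slides.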
Routing in the Internet
The Global Internet consists of Autonomous Systems (AS) interconnected with each other
– An AS is identified by an AS Number (ASN), e.g. Yale ASN is 29
– Try %whois or
– http://www.cs-ipv6.lancs.ac.uk/ftp-archive/6Bone/Whois/internic-asn/asn.txt
Different Types of AS
Stub AS: single service provider, e.g. small corporation
– does not participate in inter-AS protocol
– has one default route and sends non-local traffic to service provider
Multihomed AS: large corporation (no transit)
– does not participate in inter-AS routing protocol
– has more than one service provider
Transit AS: provider
[Figure: Yale (132.130.0.0/16) connected to Qwest; default routes 0.0.0.0/0 pointing to provider.]
Routing with AS
Intra-AS
– Routers in the same AS run the same routing protocol
– Routers in different AS’s can run different intra-AS routing protocols
– Such protocols are called Interior Gateway Protocols (IGP)
  RIP: Routing Information Protocol
  OSPF: Open Shortest Path First
  IS-IS: very similar to OSPF (or should we say OSPF is very similar to IS-IS?)
  IGRP: Interior Gateway Routing Protocol (Cisco)
Inter-AS
– A protocol that runs among AS’s is also called an Exterior Gateway Protocol (EGP)
  Unique standard in the current Internet: Border Gateway Protocol (BGP)
Intra-AS and Inter-AS Routing
[Figure: three autonomous systems A, B, and C, with Host h1 and Host h2 in different ASs. Interior (gateway) routers run intra-AS routing within AS A and within AS B; border (exterior gateway) routers such as A.a, A.c, B.a, and C.b run inter-AS routing, for example between A and B.]
Why different Intra- and Inter-AS Routing?
Policy:
– Inter-AS: admin wants control over how its traffic is routed and who routes through its net
– Intra-AS: single admin, so no policy decisions needed
Scale:
– hierarchical routing saves table size and reduces update traffic
Performance:
– Intra-AS: can focus on performance
– Inter-AS: policy may dominate over performance
Many Routing Processes Can Run on a Single Router
[Figure: a router attached to an OSPF domain and a RIP domain and running BGP. The RIP, OSPF, and BGP processes each keep their own routing table; a Forwarding Table Manager combines them into the Forwarding Table in the OS kernel.]
RIP (Routing Information Protocol)
Distance vector algorithm
Included in BSD-UNIX Distribution in 1982
Link cost: 1
Distance metric: # of hops (max = 15 hops)
– why?
Distance vectors
– exchanged every 30 sec via Response Message (also called advertisement) using UDP
– Each advertisement: route to up to 25 destination nets
RIP (Routing Information Protocol)
Routing table in D
Destination Network | Next Router | Num. of hops to dest.
w | A | 2
y | B | 2
z | B | 7
x | -- | 1
…. | …. | ....
[Figure: routers A, B, C, and D connected to networks w, x, y, and z.]
RIP: Link Failure and Recovery
If no advertisement heard after 180 sec --> neighbor/link declared dead
– routes via neighbor invalidated
– new advertisements sent to neighbors
– neighbors in turn send out new advertisements (if tables changed)
– link failure info quickly propagates to entire net
– Poison reverse used to prevent ping-pong loops (infinite distance = 16 hops)
OSPF (Open Shortest Path First)
“Open”: publicly available
Uses Link State algorithm
– Link state (LS) packet dissemination
– Topology map at each node
– Route computation using Dijkstra’s algorithm
OSPF “Advanced” Features (not in RIP)
Multiple same-cost paths allowed (only one path in RIP)
For each link, multiple cost metrics for different Type Of Service (e.g., satellite link cost set “low” for best effort; high for real time)
Security: all OSPF messages authenticated (to prevent malicious intrusion); OSPF messages are carried directly over IP (protocol 89) rather than over TCP or UDP
Hierarchical OSPF in large domains
Hierarchical OSPF
Two-level hierarchy: local area, backbone.
- Link-state advertisements only in area
- each node has detailed area topology; only knows direction (shortest path) to nets in other areas.
Area border routers: “summarize” distances to nets in own area, advertise to other Area Border routers.
Backbone routers: run OSPF routing limited to backbone.
Internet Inter-AS Routing: BGP
BGP (Border Gateway Protocol): the de facto standard
Path Vector protocol:
– Similar to Distance Vector protocol
– Each Border Gateway broadcasts to neighbors (peers) entire path (i.e., sequence of AS’s) to destination
– e.g., Gateway X may send its path to dest. Z:
  Path (X,Z) = X,Y1,Y2,Y3,…,Z
BGP: Policy Routing
Suppose: gateway X sends its path to peer gateway W
W may or may not select path offered by X
– cost, policy (e.g., don’t route via competitor’s AS), loop prevention reasons
If W selects path advertised by X, then:
  Path (W,Z) = W, Path (X,Z)
Note: X can control incoming traffic by controlling its route advertisements to peers:
– e.g., don’t want to route traffic to Z -> don’t advertise any routes to Z
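W's decision can be sketched as a small function over AS paths. This is illustrative only: it models just the loop-prevention and policy checks named above (not cost), and the function name and the `banned` parameter are made up.

```python
def consider_path(my_as, offered_path, banned=()):
    """Return the path W would adopt, or None if W rejects the offer."""
    if my_as in offered_path:
        return None                       # loop prevention: W is already on the path
    if any(asn in banned for asn in offered_path):
        return None                       # policy: e.g. don't route via a competitor's AS
    return [my_as] + list(offered_path)   # Path(W,Z) = W, Path(X,Z)

adopted = consider_path("W", ["X", "Y1", "Y2", "Z"])
looped = consider_path("W", ["X", "W", "Z"])
refused = consider_path("W", ["X", "Y1", "Z"], banned={"Y1"})
```

Because the whole AS sequence travels with the route, a loop is visible immediately, which is the key difference from plain distance vector.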
Selective Transit
[Figure: NET A connected to NET B, NET C, and NET D, with IP traffic flowing through NET A.]
NET A provides transit between NET B and NET C and between NET D and NET C
NET A DOES NOT provide transit between NET D and NET B
Advertisements sent by NET A:
– to NET B: advertise path to C, but not D
– to NET C: advertise path to B and D
– to NET D: advertise path to C, but not B
Suppose Net C is a paying customer of Net A
BGP: Policy Interactions Could Lead to Oscillations
[Figure: destination 0 surrounded by routers 1, 2, and 3. Path preferences, preferred first and less preferred second: router 1: 1 3 0, then 1 0; router 2: 2 1 0, then 2 0; router 3: 3 2 0, then 3 0.]
Each router has a choice among two paths; the policy is to prefer its counter-clockwise neighbor
• If each one chooses the first choice, not consistent;
• If one chooses the second choice, say 1 chooses 1 0, then 2 will choose 2 1 0, and the only valid path for 3 is 3 0; however, the choice of 3 forces 1 to change to 1 3 0
• Have not seen oscillations in practice, but this is a hidden threat!
• Solution: check for dependency!
BGP Operations (Simplified)
[Figure: BGP session between AS1 and AS2]
Establish session on TCP port 179
Exchange all active routes
Exchange incremental updates
While connection is ALIVE exchange route UPDATE messages
IGRP (Interior Gateway Routing Protocol)
Cisco proprietary; successor of RIP (mid 80s)
Distance Vector, like RIP
Several cost metrics (delay, bandwidth, reliability, load etc)
Routing updates are carried directly over IP (protocol 9), not over TCP
Loop-free routing via the Diffusing Update Algorithm (DUAL), based on diffused computation, arrived with the enhanced version, EIGRP
BGP Messages
Four types of messages
– OPEN: opens TCP connection to peer and authenticates sender
– UPDATE: advertises new path (or withdraws old)
– KEEPALIVE: keeps connection alive in absence of UPDATES; also ACKs OPEN request
– NOTIFICATION: reports errors in previous msg; also used to close connection
Why is a routing protocol needed?
Early networks needed to exchange data between computers over interconnected networks.
Routing entities had to make a judgement on which path to use to route traffic to its destination.
Background to RIP
RIP dates back to 1969, the early networking days of ARPANET, when Xerox and Berkeley’s Unix implemented broadly similar protocols.
RIP was distributed through the ‘routed’ application, included in early Unix O.S.
RIP uses a single class of routing algorithm known as distance vector, based on a simple hop count algorithm (derived from Bellman’s equation).
Although superseded by more complex algorithms, its simplicity means it is still found widely in smaller autonomous systems.
Purpose of Routing Protocol
The purpose of routing protocols is to supply the information needed to route datagrams from router to router.
RIP is intended for use in IP-based network environments.
Operating at layer 3 of OSI (Network).
RIP makes no formal distinction between networks and hosts.
Routers typically provide a gateway for datagrams to leave one network or AS and be forwarded onward to another network.
Routers therefore have to make decisions if there is a choice of forwarding path on offer.
Routing metrics
Routing entities keep a database (lookup table) of basic information, based on the numeric results (metrics) of an algorithm, used to forward a datagram onward to its next destination.
Each entity participating in routing decisions sends update messages to its neighbours.
In order to provide complete network routing information, every router within the AS must participate in the protocol.
Each router has a lookup table which contains one entry for every destination that is reachable.
How does a metric work?
Metrics are the result of a formula based on a choice of measurement criteria.
Example, travel cost by taxi:
– £10 to go by taxi from Edinburgh to Livingston (P1)
– £25 to go from Livingston to Glasgow (P2)
– £15 to go from Edinburgh to Falkirk (P3)
– £30 to go from Falkirk to Glasgow (P4)
Cost (Edinburgh, Glasgow) = [P1+P2] = £35
also/or [P3+P4] = £45
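The taxi example can be written out as a tiny calculation: a route's metric is the sum of its per-leg costs, and the route with the smallest metric wins. The names `FARES` and `route_cost` are made up for this sketch; the fares are the ones on the slide.

```python
# Per-leg taxi fares in pounds, taken from the example (P1 to P4).
FARES = {
    ("Edinburgh", "Livingston"): 10,  # P1
    ("Livingston", "Glasgow"): 25,    # P2
    ("Edinburgh", "Falkirk"): 15,     # P3
    ("Falkirk", "Glasgow"): 30,       # P4
}


def route_cost(stops):
    """Metric of a route = sum of the fares of its consecutive legs."""
    return sum(FARES[leg] for leg in zip(stops, stops[1:]))


routes = [
    ("Edinburgh", "Livingston", "Glasgow"),
    ("Edinburgh", "Falkirk", "Glasgow"),
]
for route in routes:
    print(" -> ".join(route), "costs", route_cost(route))

cheapest = min(routes, key=route_cost)
print("cheapest:", " -> ".join(cheapest))
```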
What is in a RIP routing table?
Address - IP address (IPv4) of host or network destination.
Router - First router along the route to destination.
Interface - The physical network which must be used to reach the next router.
Metric - A number indicating the distance to the destination. This number is the sum of the ‘costs’ that have to be traversed to get to the destination.
Timers - Time since entry was last updated and others.
Flags - Various flags to indicate status of various adjacent routers (for example).
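One way to picture a routing table entry is as a small record with one field per item listed above. This is a hypothetical sketch, not the layout of any real RIP implementation: the class name `RouteEntry`, the field types, and the example addresses (documentation ranges) are all assumptions.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RouteEntry:
    address: str      # IPv4 address of the destination host or network
    router: str       # first router along the route to the destination
    interface: str    # physical network used to reach that router
    metric: int       # sum of the costs to the destination (hop count in RIP)
    last_updated: float = field(default_factory=time.monotonic)  # timer
    flags: set = field(default_factory=set)                      # status flags

    def age(self):
        """Seconds since this entry was last refreshed."""
        return time.monotonic() - self.last_updated


# The routing table itself: one entry per reachable destination.
table = {}
entry = RouteEntry("192.0.2.0", "198.51.100.1", "eth0", metric=3)
table[entry.address] = entry
print(table["192.0.2.0"].metric, table["192.0.2.0"].age() >= 0)
```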
Other entries in the routing table
The entries for directly connected networks typically have a metric of 1 (a simple hop count).
Initially subnet masks were not included in RIP protocol implementations, but were included later to support feature extensions and to identify different subnets within local and distant networks.
Administrators may also add static routes, for example, which are outside the scope of the routing system.
The RIP datagram
RIP is a UDP-based protocol.
Small regular messages, no need for windowing, handshaking or re-transmission.
Messages are received and transmitted on UDP port number 520 (RIP 1 and 2).
Each message carries 1 - 25 RIP routing entries (RTEs).
Gateway Hierarchy
[Figure: the Internet core connected to two autonomous systems (AS).]
Two levels of Routing Protocols
[Figure: three routing domains, each running an IGP (intra-domain routing protocol) internally, linked to one another by an EGP (exterior routing protocol).]
Routing Protocols
Intra-domain Gateway Protocols
– RIP
– RIP V2
– OSPF - open shortest path first
– IS-IS (similar to OSPF)
Exterior Gateway Protocols
– EGP
– BGP
RIP
Distance vector routing algorithm based on hops that communicates between routers using UDP
On initialization, router determines all available interfaces and sends a REQUEST packet out each interface. Special request for “send everything”
On receipt of request,
– Either return everything
– Or, for each requested destination, return distance + 1
On response
– Update routing tables
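The "update routing tables" step might look roughly like the sketch below. It rests on several assumptions: the table is a plain dict mapping a destination to `(metric, next_hop)`, the one-hop increment is applied by the receiver here (the slide applies it when answering), 16 is treated as unreachable, and the names `process_response` and `INFINITY` are hypothetical.

```python
INFINITY = 16  # RIP treats a metric of 16 as "unreachable"


def process_response(table, sender, advertised):
    """Merge a neighbor's advertised distances into our table.

    table:      destination -> (metric, next_hop); next_hop None = directly connected
    advertised: destination -> metric as reported by `sender`
    Returns True if any route changed.
    """
    changed = False
    for dest, metric in advertised.items():
        new_metric = min(metric + 1, INFINITY)  # one more hop to go via the sender
        current = table.get(dest)
        if current is None:
            if new_metric < INFINITY:
                table[dest] = (new_metric, sender)
                changed = True
        elif new_metric < current[0] or (
            current[1] == sender and new_metric != current[0]
        ):
            # Better route, or news (good or bad) from the next hop we already use.
            table[dest] = (new_metric, sender)
            changed = True
    return changed


# R1 is directly attached to N1 and N2; R2 reports N2 and N3 at one hop.
r1_table = {"N1": (1, None), "N2": (1, None)}
changed = process_response(r1_table, "R2", {"N2": 1, "N3": 1})
print(changed, r1_table)
```

With these inputs R1 keeps its direct route to N2 and learns N3 via R2 at a hop count of 2.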
RIP V1 Protocol
Packet format, one 32-bit word per row (MBZ = must be zero):
Command | Version | MBZ
Address Family | MBZ
32-bit IP address
MBZ
MBZ
Metric (value of 1..16)
Up to 24 more routes in same format...
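The RIP V1 layout above maps directly onto fixed-size binary records. The sketch below packs and unpacks a response using Python's `struct` module; the command value 2 (response) and address family 2 (IP) are taken from the RIP specification rather than the slide, and the function names are illustrative.

```python
import socket
import struct

HEADER = struct.Struct("!BBH")     # command, version, must-be-zero
ENTRY = struct.Struct("!HH4sIII")  # address family, MBZ, IP address, MBZ, MBZ, metric
RESPONSE = 2                       # command code for a response
AF_IP = 2                          # address family identifier for IP
MAX_ENTRIES = 25


def pack_response(routes):
    """Build a RIP V1 response from (dotted-quad address, metric) pairs."""
    if not 1 <= len(routes) <= MAX_ENTRIES:
        raise ValueError("a RIP message carries 1 to 25 entries")
    packet = HEADER.pack(RESPONSE, 1, 0)
    for address, metric in routes:
        packet += ENTRY.pack(AF_IP, 0, socket.inet_aton(address), 0, 0, metric)
    return packet


def unpack_response(packet):
    """Return (command, version, [(address, metric), ...])."""
    command, version, _zero = HEADER.unpack_from(packet)
    routes = []
    for offset in range(HEADER.size, len(packet), ENTRY.size):
        _family, _, raw, _, _, metric = ENTRY.unpack_from(packet, offset)
        routes.append((socket.inet_ntoa(raw), metric))
    return command, version, routes


packet = pack_response([("192.0.2.0", 1), ("198.51.100.0", 2)])
print(len(packet), unpack_response(packet))
```

Each entry is 20 bytes after a 4-byte header, so a full message with 25 entries is 504 bytes.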
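Metrics
[Figure: networks N1, N2 and N3 joined by routers R1 and R2. R1 advertises "N1 is 1 hop" and "N2 is 1 hop"; R2 advertises "N2 is 1 hop" and "N3 is 1 hop". R1 therefore holds a route to N3 via R2 with a hop count of 2.]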
Problems
Hop count limited to 15
– Can only be used within an AS where the maximum network diameter is 15
It’s based on HOPS, not e.g., latency or bandwidth
No notion of subnet addressing in RIP V1
RIP V2 Protocol
Packet format, one 32-bit word per row:
Command | Version | Routing domain
Address Family | Route tag
32-bit IP address
32-bit subnet mask
32-bit next-hop IP address
Metric (value of 1..16)
Up to 24 more routes in same format...
RIP V2
Routing domain is an identifier of the routing daemon
– Process ID in UNIX
– …So you can run multiple instances of RIP
Route tag carries an autonomous system number for EGP and BGP
Next hop address is where packets corresponding to that (sub)network should be sent. A value of zero means send to the system sending RIP info.
Simple authentication scheme with clear-text password
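To show how the extra RIP V2 fields are used, here is a hedged sketch that decodes a single 20-byte route entry and applies the "next hop of zero means the sender" rule. The function name `parse_entry`, the sample addresses and the sample route tag are assumptions for illustration only.

```python
import ipaddress
import struct

# Address family, route tag, IP address, subnet mask, next hop, metric.
ENTRY_V2 = struct.Struct("!HH4s4s4sI")


def parse_entry(raw, sender):
    """Decode one RIP V2 route entry received from `sender`."""
    _family, route_tag, addr, mask, next_hop, metric = ENTRY_V2.unpack(raw)
    network = ipaddress.ip_network(
        f"{ipaddress.IPv4Address(addr)}/{ipaddress.IPv4Address(mask)}"
    )
    hop = ipaddress.IPv4Address(next_hop)
    if int(hop) == 0:
        # A zero next hop means "send via whoever sent this RIP message".
        hop = ipaddress.IPv4Address(sender)
    return network, route_tag, hop, metric


raw = ENTRY_V2.pack(
    2,                                             # address family: IP
    65001,                                         # route tag (sample AS number)
    ipaddress.IPv4Address("192.0.2.0").packed,
    ipaddress.IPv4Address("255.255.255.0").packed,
    bytes(4),                                      # next hop 0.0.0.0
    3,
)
print(parse_entry(raw, "198.51.100.7"))
```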
Distance Vector Routing
Also called Bellman-Ford or Ford-Fulkerson algorithms
Used by RIP
Each router is responsible for keeping track of, and informing its neighbors of, its distance to each destination
The router computes its distance to a destination based on its neighbors’ distances to the destination
Router must know its own ID and the cost of its links to each neighbor
Distance Vector Routing For Address “D”
[Figure: router R with five links, numbered 1 to 5, with link costs 17, 2, 35, 5 and 41.]
Distance Vector Routing For Address “D”
[Figure: the same router R; the neighbor on each link reports its cost to destination D: 81, 97, 62, 118 and 29 respectively.]
Distance Vector Routing For Address “D”
[Figure: cost for R to get to D via each link (link cost plus the neighbor's cost): 98, 99, 97, 123 and 70. The minimum cost route is 70.]
Distance Vector Routing For Address “D”
[Figure: R now advertises its cost from R to D, 70, on all five links.]
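The calculation walked through on these slides is a single Bellman-Ford step: add each link cost to the cost reported by the neighbor on that link, then keep the minimum. In the sketch below the pairing of link costs with neighbor costs follows from the totals on the slide, but which pair belongs to which link number is an assumption, as are the names `LINKS` and `best_route`.

```python
# link number -> (cost of the link from R, neighbor's advertised cost to D)
# NOTE: the assignment of pairs to link numbers 1..5 is assumed.
LINKS = {
    1: (17, 81),
    2: (2, 97),
    3: (35, 62),
    4: (5, 118),
    5: (41, 29),
}


def best_route(links):
    """Return (best link, per-link total costs) for reaching D from R."""
    totals = {link: cost + advertised for link, (cost, advertised) in links.items()}
    best = min(totals, key=totals.get)
    return best, totals


best_link, totals = best_route(LINKS)
print(totals)
print("R reaches D via link", best_link, "at cost", totals[best_link])
```

R then advertises the winning total, 70, to its own neighbors.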
Problems With Distance Vector
Slow convergence to the lowest cost route
Slow recovery time
Slow recovery leads to routing problems during recovery
– Routing loops
– Count to infinity
Routing Loops
[Figure: two diagrams of routers A, B, C and D illustrating a routing loop.]
Count To Infinity (worst case loop)
[Figure: routers A, B and C; in successive steps the hop counts shown at A and B rise together (1 and 2, then 2 and 3, then 3 and 4, and so on).]
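A minimal simulation of counting to infinity is sketched below. The setup is an assumption and need not match the figure exactly: routers A and B sit in a line with destination C, B has just lost its direct link to C, neither router uses split horizon, and RIP's value of 16 stands in for infinity.

```python
INFINITY = 16  # RIP's "unreachable" metric


def count_to_infinity():
    """Trace (A's metric, B's metric) to C after the B-C link fails."""
    dist_a = 2         # A still believes C is 2 hops away, via B
    dist_b = INFINITY  # B has just lost its direct link to C
    history = [(dist_a, dist_b)]
    while dist_a < INFINITY or dist_b < INFINITY:
        # B hears A's stale advertisement and routes via A.
        dist_b = min(INFINITY, dist_a + 1)
        # A's route goes through B, so it must follow B's new metric.
        dist_a = min(INFINITY, dist_b + 1)
        history.append((dist_a, dist_b))
    return history


history = count_to_infinity()
for step, (a, b) in enumerate(history):
    print(f"step {step}: A={a} B={b}")
```

The metrics climb a little with every exchange and only stop when both routers reach 16, which is why a small value for infinity keeps the damage bounded.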
OSPF - Open Shortest Path First
OSPF uses IP directly (i.e., like ICMP)
Routes calculated based on TOS
Each interface is assigned a dimensionless cost, for each TOS
If several equal-cost routes are available, traffic is load-balanced
Subnets are associated with each advertised route
Supports authentication
Uses multicast to distribute information
Link State Routing
Used by OSPF and IS-IS
Construct a Link State Packet that lists neighbors and the costs to reach those neighbors
Use Dijkstra’s algorithm to compute global routes as a tree from the current router
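For reference, a compact version of Dijkstra's algorithm over a link-state database could look like this. The topology in `GRAPH` is invented for the example, and `shortest_paths` is a hypothetical helper, not part of any OSPF implementation.

```python
import heapq


def shortest_paths(graph, source):
    """Dijkstra: return (distance, parent) maps forming a tree rooted at source."""
    dist = {source: 0}
    parent = {source: None}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, cost in graph[node].items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    return dist, parent


# Link-state database: router -> {neighbor: link cost} (made-up topology).
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}

dist, parent = shortest_paths(GRAPH, "A")
print(dist)
print(parent)
```

The `parent` map is the shortest-path tree from the current router; following it back from any destination gives the first hop to install in the forwarding table.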
BGP
Uses TCP
Distance vector protocol, but BGP enumerates the route to each destination (using a sequence of AS numbers)
Each AS is identified by a 16-bit number
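Because each route carries the full sequence of AS numbers, a router can spot a loop simply by looking for its own AS in the path. The sketch below illustrates that idea only; `accept_route` and `advertise` are hypothetical names, the AS numbers are sample values, and real BGP route selection involves far more than this check.

```python
MAX_AS = 0xFFFF  # 16-bit AS numbers, as stated on the slide


def accept_route(local_as, as_path):
    """Reject any route whose AS path already contains our own AS (a loop)."""
    return local_as not in as_path


def advertise(local_as, as_path):
    """Prepend our AS number before passing the route on to a neighbor AS."""
    if not 0 <= local_as <= MAX_AS:
        raise ValueError("AS number does not fit in 16 bits")
    return (local_as,) + tuple(as_path)


path = (65002, 65003)               # route learned from AS 65002
print(accept_route(65001, path))    # no loop: usable
print(advertise(65001, path))       # path as re-advertised by AS 65001
print(accept_route(65003, advertise(65001, path)))  # AS 65003 sees itself
```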