
Upload: ledat

Post on 24-Jun-2018


1 | © by Xantaro

LARGE SCALE IP ROUTING LECTURE BY SEBASTIAN GRAF

MODULE 3 – BORDER GATEWAY PROTOCOL


Interdomain Routing

§ The Internet is a collection of autonomous systems
§ An autonomous system (AS) is a collection of networks under a single technical administration

§ An interior gateway protocol (IGP) is run inside an autonomous system resulting in optimal intra-AS routing

► either IS-IS or OSPF

§ An exterior gateway protocol (EGP) is run between autonomous systems to enable routing policies, improve scalability and provide security

► BGP is the only choice today


To use or not to use?

§ An interdomain routing protocol is used …
  § when an AS is a transit AS
  § when an AS is multi-homed to several internet service providers (using public AS numbers and provider-independent address space)
  § when an AS is multi-homed to the same internet service provider for fault detection and traffic optimization (possibly using private AS numbers)
  § when you need extensive filtering and manipulation possibilities that IGPs like OSPF and IS-IS do not offer
  § when your network needs to carry too many prefixes for an IGP

§ When you are the administrator of a single-homed AS, consider static routing for simplicity


In the Beginning ... Three Napkins Protocol (TNP)

§ Designed by Yakov Rekhter and Kirk Lougheed during the IETF-12 meeting in 1989
§ BGP-4 currently defined in RFC 4271
§ Ongoing discussion and development within the IDR WG of the IETF


BGP Details

§ BGP is an exterior gateway protocol or interdomain routing protocol used between routing domains (aka autonomous systems)
§ BGP supports classless interdomain routing (CIDR)
§ BGP is a path vector protocol using incremental updates
§ BGP runs on top of TCP, port 179

§ BGP was built on a few fairly simple ideas:
  § Provide loop-free routing by carrying information about the path the routing information traverses
  § Minimize volume of routing information by using incremental updates
  § Use TCP as reliable transport
  § Encode information as collection of attributes using <type,length,value> style


Autonomous System Numbers

§ Autonomous System numbers identify a network within the public internet that is under a common administration

§ AS numbers are attached to every Prefix routed in the Internet and are one of the most important BGP attributes

§ AS numbers help to identify the origin of a prefix, as well as the transit networks to reach it

§ Originally a 16 bit value
  § AS 1 to AS 64511 for public use in the Internet
  § AS 64512 to AS 65534 for private use within autonomous systems
    ► like private IP addresses (RFC 1918), they should never appear in the Internet

§ Today extended to 32 bit, as we ran out of 16 bit AS numbers
  § Values up to 4,294,967,295
  § RFC 6996 also reserved a private range for 32 bit AS numbers
    ► AS 4200000000 to AS 4294967294 can be used in the same way as the range from 64512 to 65534
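The ranges above can be checked with a few comparisons. The following is a minimal sketch; the function name `classify_asn` and the returned labels are made up for illustration, and smaller reserved blocks (documentation ranges, AS 0, AS 65535 and similar) are deliberately not distinguished.

```python
def classify_asn(asn: int) -> str:
    """Classify an AS number using only the ranges listed on the slide."""
    if not 0 <= asn <= 4_294_967_295:
        raise ValueError("AS numbers are 32-bit unsigned values")
    if 64512 <= asn <= 65534:
        return "private (16-bit range)"
    if 4_200_000_000 <= asn <= 4_294_967_294:
        return "private (32-bit range, RFC 6996)"
    if 1 <= asn <= 64511:
        return "public (16-bit range)"
    if 65536 <= asn < 4_200_000_000:
        # Public or reserved 32-bit values; not distinguished in this sketch.
        return "32-bit range (public or reserved)"
    return "reserved"


for example in (13237, 64512, 203520, 4_200_000_000):
    print(example, "->", classify_asn(example))
```

A filter on an Internet-facing eBGP session could use such a check to reject paths that contain private AS numbers.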


Managing Autonomous System numbers

§ AS numbers are registered to their operators like public IP addresses

sgraf@gollum:~$ gwhois AS203520

aut-num: AS203520
as-name: XANTARO
org: ORG-XHG1-RIPE
import: from AS13237 accept ANY
export: to AS13237 announce AS203520
import: from AS20773 accept ANY
export: to AS20773 announce AS203520
admin-c: DUMY-RIPE
tech-c: DUMY-RIPE
remarks: For information on "status:" attribute read https://www.ripe.net/data-tools/db/faq/faq-status-values-legacy-resources
status: ASSIGNED
mnt-by: RIPE-NCC-END-MNT
mnt-by: MNT-XAN
mnt-routes: MNT-XAN
created: 2015-12-18T16:07:01Z
last-modified: 2016-04-14T10:54:35Z
source: RIPE
remarks: ****************************
remarks: * THIS OBJECT IS MODIFIED
remarks: * Please note that all data that is generally regarded as personal
remarks: * data has been removed from this object.
remarks: * To view the original object, please query the RIPE Database at:
remarks: * http://www.ripe.net/whois
remarks: ****************************


BGP Standards

§ Selection of BGP standards (highly incomplete list):
  § RFC 4271, A Border Gateway Protocol 4 (BGP-4)
  § RFC 1997, BGP Communities Attribute
  § RFC 2385, Protection of BGP Sessions via the TCP MD5 Signature Option
  § RFC 2439, BGP Route Flap Damping
  § RFC 2545, Use of BGP-4 Multiprotocol Extensions for IPv6 Inter-Domain Routing
  § RFC 2796, BGP Route Reflection
  § RFC 2858, Multiprotocol Extensions for BGP-4
  § RFC 2918, Route Refresh Capability for BGP-4
  § RFC 3065, Autonomous System Confederations for BGP
  § RFC 3107, Carrying Label Information in BGP-4
  § RFC 3392, Capabilities Advertisement with BGP-4
  § RFC 4724, Graceful Restart Mechanism for BGP
  § RFC 6793, BGP Support for Four-Octet Autonomous System (AS) Number Space
  § RFC 6811, BGP Prefix Origin Validation
  § RFC 7454, BGP Operations and Security


BGP Peering

§ BGP exchanges routing information between two routers called peers or neighbors or BGP speakers

§ BGP Session
  § A BGP session runs on top of a TCP session (port 179), i.e. BGP neighbors do not have to be directly connected to each other
  § Normally, BGP neighbors must be configured explicitly
  § An external BGP (eBGP) session runs between routers in different autonomous systems.
    ► usually directly connected
  § An internal BGP (iBGP) session runs between BGP neighbors within the same autonomous system.
    ► directly or indirectly connected


BGP Finite State Model

§ Idle
§ Connect
§ Active
§ OpenSent
§ OpenConfirm
§ Established


BGP / IGP relation

§ Typically you would use an IGP like ISIS or OSPF as well as iBGP inside your network

§ use IGP to propagate internal prefixes like
  ► loopback addresses
  ► interface addresses

§ IGP will find the most efficient path between two iBGP speakers

§ use BGP to carry internet prefixes
§ BGP next hops will point to addresses resolved by the IGP, hence a recursive lookup is used to construct the FIB
§ iBGP sessions are usually set up between loopbacks as this increases stability
  ► do not go down as long as there is a working path between two routers
  ► can use load sharing for BGP prefixes, if multiple equal cost paths are available between two iBGP peers


BGP / IGP relation – Step 1 IGP convergence

§ Before iBGP can operate, the IGP needs to converge
§ Afterwards
  § R3 will know that 10.0.0.1/32 is reachable via R2 using next-hop 20.20.20.1
  § R2 knows that it can reach 10.0.0.1/32 via its directly connected interface using next-hop 10.10.10.1
  § similar operations will happen on R1 and R2 for the other IP addresses

[Figure: R1 (Loopback 10.0.0.1/32, 10.10.10.1/30) - IGP - R2 (10.10.10.2/30, Loopback 10.0.0.2/32, 20.20.20.1/30) - IGP - R3 (20.20.20.2/30, Loopback 10.0.0.3/32)]


BGP / IGP relation – Step 2 iBGP Session

§ After IGP convergence R3 and R1 can establish an iBGP session
§ Once the iBGP session is created, R1 sends a prefix towards R3 using its loopback address as next-hop
§ R3 will do a recursive routing lookup by
  § checking its routing table for the BGP next-hop 10.0.0.1
  § then using the resolved next-hop (20.20.20.1) for the prefix 185.16.196.0/24

§ As traffic from R3 towards 185.16.196.0/24 will hit R2, R2 also needs to know the next hop
  § R2 needs to have an iBGP session with R1 as well (not shown here)
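The recursive lookup on R3 can be sketched in a few lines. This is an illustration only: the table layout, the marker string `"connected"` and the helper names `longest_match` and `resolve` are assumptions, and a real router builds its FIB very differently.

```python
import ipaddress

# R3's routes from the example: one BGP route, one IGP route, one connected link.
R3_TABLE = {
    "185.16.196.0/24": "10.0.0.1",    # learned via iBGP, next hop is R1's loopback
    "10.0.0.1/32": "20.20.20.1",      # learned via IGP, next hop is R2
    "20.20.20.0/30": "connected",     # link between R2 and R3
}


def longest_match(table, address):
    """Return (network, next_hop) of the most specific matching route, or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best


def resolve(table, address, max_depth=8):
    """Follow next hops until an address on a connected network is reached."""
    current = address
    for _ in range(max_depth):
        match = longest_match(table, current)
        if match is None:
            return None               # next hop is not resolvable
        _, next_hop = match
        if next_hop == "connected":
            return current            # this is the forwarding next hop
        current = next_hop
    return None                       # give up on suspiciously deep recursion


print(resolve(R3_TABLE, "185.16.196.1"))
```

The lookup first yields the BGP next hop 10.0.0.1, then resolves it through the IGP route to 20.20.20.1, which sits on a directly connected link.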

[Figure: R1 (Loopback 10.0.0.1/32, 10.10.10.1/30) - IGP - R2 (10.10.10.2/30, Loopback 10.0.0.2/32, 20.20.20.1/30) - IGP - R3 (20.20.20.2/30, Loopback 10.0.0.3/32); iBGP session between R1 and R3 carrying Prefix 185.16.196.0/24 with Next-Hop 10.0.0.1]


BGP Messages Overview

§ Open (1)
§ Update (2)
§ Notification (3)
§ Keepalive (4)
§ Route Refresh (5)
  § Defined in RFC 2918
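The type codes above appear in the fixed message header. As a rough sketch, the header can be decoded as follows; the 19-byte layout (16-byte marker of all ones, 2-byte length, 1-byte type) is taken from RFC 4271 rather than from the slides, and the names `MESSAGE_TYPES` and `parse_header` are hypothetical.

```python
import struct

MESSAGE_TYPES = {
    1: "OPEN",
    2: "UPDATE",
    3: "NOTIFICATION",
    4: "KEEPALIVE",
    5: "ROUTE-REFRESH",
}


def parse_header(data: bytes):
    """Decode the BGP message header and return (length, type name)."""
    if len(data) < 19:
        raise ValueError("a BGP header is 19 bytes long")
    marker, length, msg_type = struct.unpack("!16sHB", data[:19])
    if marker != b"\xff" * 16:
        raise ValueError("marker must be all ones")
    if not 19 <= length <= 4096:
        raise ValueError("length outside the range allowed by RFC 4271")
    return length, MESSAGE_TYPES.get(msg_type, f"unknown ({msg_type})")


# A Keepalive consists of the header only: length 19, type 4.
KEEPALIVE = b"\xff" * 16 + struct.pack("!HB", 19, 4)
print(parse_header(KEEPALIVE))
```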


BGP Open Message

§ Allows BGP peers to negotiate session parameters
  § Hold time
  § Authentication data
  § Capabilities
  § Support for Network Layer Reachability Information (NLRI)


BGP Update Message

§ Update messages contain a single set of BGP attributes and one or more prefixes sharing those attributes
  § BGP attributes are encoded using TLV syntax

§ Update messages may contain prefixes which are no longer valid (withdrawn routes)


BGP Notification Messages

§ When BGP peer detects an error, it sends a notification message and immediately closes both the BGP and TCP session

§ Notification message contains the reason for closing the session


Notification Error Codes and Subcodes

Code  Subcode  Description
1              Message Header Error
      1        Connection not synchronized
      2        Bad message length
      3        Bad message type
2              Open Message Error
      1        Unsupported version
      2        Bad peer AS
      3        Bad BGP Identifier
      4        Unsupported optional parameter
      5        Authentication failure
      6        Unacceptable hold timer
      7        Unsupported capability
3              Update Message Error
      1        Malformed attribute
      2        Unrecognized well-known attribute
      3        Missing well-known attribute
      4        Attribute flag error
      5        Attribute length error
      6        Invalid ORIGIN attribute
      8        Invalid NEXT_HOP attribute
      9        Optional attribute error
      10       Invalid network field
      11       Malformed AS_PATH
4              Hold timer expired
5              Finite State Machine Error
6              Cease


BGP Keepalive Messages

§ BGP does not use any TCP-based keepalive messages to determine if peer is reachable

§ BGP Keepalive messages are sent at intervals of one third of the hold timer
§ If the hold timer is set to zero, no BGP Keepalive messages are exchanged


BGP Route Refresh Messages

§ Route refresh messages are sent to a peer to request retransmission of previously sent BGP updates

§ When willing to accept route refresh messages from its peer, BGP speaker should advertise the Route Refresh capability

§ This is useful if routing filters were updated to re-evaluate the routing policy
  § some prefixes might have been dropped with the old policy and the new policy might allow them


BGP Capability Advertisement

§ BGP uses Capability Advertisement instead of mapping a set of supported features to a particular version of BGP
  § Defined in RFC 3392
  § Advertise support for each such feature at the BGP session establishment
  § BGP Capability Advertisement provides a more flexible (and direct) way of introducing new features
  § BGP Multiprotocol extensions were the first application of BGP Capability Advertisement, not the last one
  § Thanks to BGP Capability Advertisement, today we still have BGPv4


BGP Attributes

§ Four basic types of BGP attributes
  § Well-known mandatory attributes: these attributes must be recognized by all BGP speakers and must be included in all update messages.
  § Well-known discretionary attributes: these attributes must be recognized by all BGP speakers and may be carried in updates but are not required in every update.
  § Optional transitive attributes: these attributes may be recognized by some BGP speakers, but not all. They should be preserved and advertised to all peers whether or not they are recognized.
  § Optional non-transitive attributes: these attributes may be recognized by some BGP speakers, but not all. If an update containing an unrecognized optional non-transitive attribute is received, the update should be advertised to peers without the unrecognized attributes.
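On the wire, the optional and transitive properties are carried in the attribute flags octet. The sketch below assumes the bit positions from RFC 4271 (high-order bit for optional, next bit for transitive), which the slides do not show; the function names are invented. Whether a well-known attribute is mandatory or discretionary is not encoded in the flags, so the sketch cannot tell those apart.

```python
OPTIONAL = 0x80    # high-order bit of the attribute flags octet
TRANSITIVE = 0x40  # second high-order bit


def attribute_category(flags: int) -> str:
    """Name the attribute category implied by the flags octet."""
    if not flags & OPTIONAL:
        return "well-known"
    if flags & TRANSITIVE:
        return "optional transitive"
    return "optional non-transitive"


def forward_if_unrecognized(flags: int) -> bool:
    """An unrecognized attribute is passed on only if it is optional transitive."""
    return attribute_category(flags) == "optional transitive"


for flags in (0x40, 0xC0, 0x80):
    print(hex(flags), attribute_category(flags), forward_if_unrecognized(flags))
```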


BGP Attribute Overview

Value  Code                  Ref.
1      ORIGIN                RFC 4271
2      AS_PATH               RFC 4271
3      NEXT_HOP              RFC 4271
4      MULTI_EXIT_DISC       RFC 4271
5      LOCAL_PREF            RFC 4271
6      ATOMIC_AGGREGATE      RFC 4271
7      AGGREGATOR            RFC 4271
8      COMMUNITY             RFC 1997
9      ORIGINATOR_ID         RFC 4456
10     CLUSTER_LIST          RFC 4456
11     DPA                   unused
12     ADVERTISER            RFC 1863
13     CLUSTER_ID            RFC 1863
14     MP_REACH_NLRI         RFC 4760
15     MP_UNREACH_NLRI       RFC 4760
16     EXTENDED COMMUNITIES  RFC 4360
17     AS4_PATH              RFC 4893
18     AS4_AGGREGATOR        RFC 4893


BGP Attributes: Origin Code

§ Well-known mandatory attribute
§ Attribute is generated by the BGP speaker that originates the routing information and should not be changed
§ Describes how the prefix was injected into BGP
  § IGP (0): prefix was originated from interior gateway protocol
  § EGP (1): prefix was originated from exterior gateway protocol (RFC 904)
  § INCOMPLETE (2): prefix was originated by unknown source

§ Lower value is preferred
  § IGP is better than EGP
  § EGP is better than INCOMPLETE


BGP Attributes: AS_PATH

§ AS_PATH attribute contains a sequence of autonomous system numbers that represent the path a route has traversed
  § Attribute is modified only when a route is advertised to an eBGP peer. Each AS prepends its own AS number to the path.
  § Routes with a shorter AS_PATH are preferred (each AS counts as one, regardless of its numerical value)
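Prepending and path length can be illustrated with plain lists. This is a minimal sketch with invented function names; it models the AS_PATH as a simple sequence and ignores AS sets and confederation segments.

```python
def advertise_to_ebgp(as_path, local_as):
    """Return the AS_PATH as sent to an eBGP peer: the local AS is prepended."""
    return [local_as] + list(as_path)


def has_loop(as_path, local_as):
    """A received route is rejected when the local AS already appears in the path."""
    return local_as in as_path


path = [200, 400]                       # route originated by AS 400, seen via AS 200
sent = advertise_to_ebgp(path, 100)     # AS 100 advertises it to an external peer
print(sent, "length", len(sent))
print("loop for AS 200:", has_loop(sent, 200))
```

Every AS contributes exactly one to the length, so AS 4200000000 counts the same as AS 5.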


BGP Attributes: NEXT_HOP

§ NEXT_HOP attribute carries the IP address of the next hop router to the route destination
  § By default, NEXT_HOP is only modified across eBGP sessions
  § Local router performs recursive lookup to find route to BGP next hop


Excursus on NEXT_HOP

§ The addresses of all eBGP peers must be reachable by all BGP-speaking routers within an autonomous system, because NEXT_HOP is not changed across iBGP sessions
  1. Redistribute connected interfaces to the outside world into the IGP
  2. Include links to eBGP neighbors in the IGP and make them passive

§ Alternate design: modify NEXT_HOP processing at the network edge
  § Make edge routers announce themselves as next hop


BGP Attributes: Multiple Exit Discriminator

§ Prior to BGP-4, the attribute was called Inter-AS metric
§ MED is designed to be a tiebreaker for routes received from different external peers in the same AS


BGP Attributes: Local Preference

§ Determines the preferred exit out of the autonomous system
§ Local Preference is only used with iBGP
  § Default value is 100 if not specified, higher value is more preferred
  § dropped whenever a route is forwarded with eBGP


BGP Attributes: Communities

§ A community is a group of destinations which share some common properties
  § Before communities, route control was based on IP prefixes and AS_PATH only
  § Communities simply control routing information
  § Each AS can define which communities a prefix belongs to. By default, all prefixes belong to the Internet community.
  § Administrators may use communities to tag, identify, filter or manipulate routes
  § No automatic action based on community values (except for predefined communities)

§ Predefined communities
  § NO_EXPORT (0xFFFFFF01)
    ► instructs receiving BGP router to not advertise this route to external BGP peers
  § NO_ADVERTISE (0xFFFFFF02)
    ► instructs receiving BGP router to not advertise the route to any BGP peer
  § NO_EXPORT_SUBCONFED (0xFFFFFF03)
    ► instructs receiving BGP router to not advertise the route to external BGP peers (AS external and confederation external)


BGP Community Encoding

§ 32-bit integers are not easy to handle
  § More common convention is to split into two 16-bit values
  § First value defines the scope of the community, i.e. it describes for which network the information is provided (and helps prevent conflicts)
  § Second value is an arbitrary tag for the target network

§ Example: 65001:1234
  § Used by AS65001
  § Community value within this scope is “1234“

§ Problematic if you have a 32 bit AS number
  § RFC 8092 defined a new community format with 96 bit
  § first 32 bit used as global administrator (can encode 32 bit AS numbers)
  § remaining 64 bit can be used as community value
  § defined as optional transitive attribute
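The split into two 16-bit values is plain bit arithmetic on the 32-bit community. A small sketch, with helper names chosen only for this example:

```python
def encode_community(asn: int, value: int) -> int:
    """Pack 'asn:value' into the 32-bit community: AS in the upper 16 bits."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both halves must fit into 16 bits")
    return (asn << 16) | value


def decode_community(raw: int) -> str:
    """Render a 32-bit community in the common 'asn:value' notation."""
    return f"{raw >> 16}:{raw & 0xFFFF}"


raw = encode_community(65001, 1234)
print(hex(raw), decode_community(raw))
print(decode_community(0xFFFFFF01))   # NO_EXPORT in the same notation
```

The range check also shows why a 32 bit AS number does not fit into this format, which is the gap the 96 bit format of RFC 8092 closes.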


BGP Route Selection Process

§ A router will use the following list to compare routes for the same prefix and find a best path:

1. Exclude routes with the local AS number in the AS-PATH (loop)
2. Exclude routes with inaccessible Next_Hop attribute value
3. Prefer highest Local Preference
4. Prefer shortest AS Path length
5. Prefer lowest Origin attribute value
6. Prefer lowest Multiple Exit Discriminator (MED) attribute value
7. Prefer external paths (eBGP) over internal paths (iBGP) (aka hot-potato routing)
8. For iBGP paths, prefer path with lowest IGP metric to the advertised BGP Next Hop.
9. Prefer shortest Cluster-List length (if route reflection is used)
10. Prefer route from the peer with the lowest Router ID
11. Prefer route from the peer with the lowest Peer Address.
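The list above maps naturally onto a filter followed by a sort key. The sketch below is a simplification under stated assumptions: the `Route` class and its field names are invented, a missing MED is treated as 0, MED is compared across all routes regardless of neighbor AS, and vendor-specific steps are left out. The two sample routes use the values from Example 1 later in this module.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    as_path: list
    local_pref: int = 100
    origin: int = 0                 # 0 = IGP, 1 = EGP, 2 = INCOMPLETE
    med: int = 0                    # assumption: "not set" is treated as 0
    ebgp: bool = True
    igp_metric: int = 0
    cluster_list_len: int = 0
    router_id: str = "0.0.0.0"
    peer_address: str = "0.0.0.0"
    next_hop_reachable: bool = True


def selection_key(route):
    """Steps 3 to 11 as a tuple; the smallest tuple wins."""
    return (
        -route.local_pref,                            # 3. highest local preference
        len(route.as_path),                           # 4. shortest AS path
        route.origin,                                 # 5. lowest origin value
        route.med,                                    # 6. lowest MED
        0 if route.ebgp else 1,                       # 7. eBGP before iBGP
        0 if route.ebgp else route.igp_metric,        # 8. IGP metric, iBGP only
        route.cluster_list_len,                       # 9. shortest cluster list
        int(ipaddress.ip_address(route.router_id)),   # 10. lowest router ID
        int(ipaddress.ip_address(route.peer_address)),  # 11. lowest peer address
    )


def best_path(routes, local_as):
    """Apply steps 1 and 2 as filters, then pick the minimum of the key."""
    candidates = [
        r for r in routes
        if local_as not in r.as_path and r.next_hop_reachable
    ]
    return min(candidates, key=selection_key) if candidates else None


left = Route([123, 8777, 3250, 203580], router_id="30.30.30.30", peer_address="23.9.9.1")
right = Route([455, 19, 5, 203580], router_id="40.40.40.40", peer_address="25.10.9.3")
best = best_path([left, right], local_as=100)
print("selected router ID:", best.router_id)
```

Using a tuple as the key keeps the order of the steps visible: a later element is only consulted when all earlier ones are equal.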


Scaling iBGP Implementation

§ All BGP speakers within a single AS must be fully meshed
  § Loop prevention only works between autonomous systems (block routes with our own AS in the AS-path list)
  § this results in a simple split horizon rule
    ► Routes received from internal BGP (iBGP) peers may be forwarded to external peers, but must not be forwarded to other internal BGP (iBGP) peers
    ► Routes received from external BGP (eBGP) peers may be forwarded to external and internal peers

§ As a consequence, BGP speakers need to be fully meshed inside an AS
  § For N BGP speakers, a total number of N*(N-1)/2 sessions are required
§ Obviously this does not scale well in large networks, therefore two scaling methods exist:
  § Route Reflectors (commonly used today)
  § Confederations (more exotic)
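A quick calculation shows how fast the full mesh grows. The function name is made up for this example.

```python
def full_mesh_sessions(n: int) -> int:
    """Number of iBGP sessions needed to fully mesh n BGP speakers: N*(N-1)/2."""
    return n * (n - 1) // 2


for speakers in (5, 10, 50, 100):
    print(speakers, "speakers ->", full_mesh_sessions(speakers), "sessions")
```

Doubling the number of speakers roughly quadruples the number of sessions, and every new router requires a configuration change on all existing ones.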


Route Reflectors

§ Remember: iBGP Split Horizon
§ Route reflectors are designed to
  § simplify the network
  § allow easy transition from full-mesh to new topology
  § be backward compatible in case some routers do not understand route reflection

§ Route reflectors are allowed to re-advertise (or reflect) iBGP-learned routes to some other iBGP neighbors


Route Reflector Operation

§ Internal neighbors of a route reflector are either clients or non-clients
  § Routes received from non-clients are reflected to clients only
  § Routes received from clients are reflected to clients and non-clients

§ Route reflector and its clients form a cluster
§ Non-clients are considered to be fully meshed between each other, hence no reflection from non-client to other non-clients
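The reflection rules can be written as a small decision function. This is a sketch covering internal neighbors only; the function name and the router names are placeholders, and the sender is excluded from the result as a simplification.

```python
def reflect_to(sender, clients, non_clients):
    """Return the internal neighbors a route reflector passes a route on to."""
    if sender in clients:
        # From a client: reflect to all other clients and to all non-clients.
        return (set(clients) | set(non_clients)) - {sender}
    if sender in non_clients:
        # From a non-client: reflect to clients only.
        return set(clients)
    raise ValueError(f"{sender} is not an internal neighbor of this reflector")


clients = {"C1", "C2"}
non_clients = {"N1", "N2"}
print(sorted(reflect_to("C1", clients, non_clients)))
print(sorted(reflect_to("N1", clients, non_clients)))
```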


Route Reflector Redundancy

§ Route reflectors could be a single point of failure
  § A BGP speaker can be client to multiple route reflectors
  § Set up at least two route reflectors for redundancy
  § Fully mesh the route reflectors!


Solving Route Reflector Loops

§ Possible routing loops are avoided by using new BGP attributes
  § ORIGINATOR_ID is set to the BGP Identifier of the originator within the local AS
  § CLUSTER_LIST is a sequence of cluster IDs representing the reflection path


Route Reflection Loss of Visibility

§ A route reflector will only reflect best routes according to its local view

§ Consider a route reflector located in Munich that is getting equal BGP routes from peerings in Berlin and Stuttgart
  § According to the IGP metric, Munich can reach
    ► Stuttgart with a metric of 100
    ► Berlin with a metric of 1000
    ► Hamburg with a metric of 1100
  § Munich will follow the BGP bestpath selection process to find the best path
    ► for its own routing decision
    ► for reflection to route reflector clients

[Figure: map of Germany with routers in Hamburg, Berlin, Stuttgart and Munich (route reflector for all other routers); link metrics 100, 1000, 1000 and 100 are shown. Picture by wikipedia: https://commons.wikimedia.org/wiki/File:Karte_Bundesrepublik_Deutschland.svg]


Route Reflection Loss of Visibility

§ Two identical prefixes are received from Berlin and Stuttgart

1. both routes have no AS-loop
2. both routes have valid (known by IGP) next-hops
3. both routes have default local preference (100)
4. both routes have AS-Paths of equal length
5. both routes have IGP as origin code
6. both routes have no Multiple Exit Discriminator (MED)
7. both routes are received from iBGP peers
8. IGP metric to Stuttgart is smaller than to Berlin

§ Munich therefore selects the route via Stuttgart and reflects only this route
§ This decision is not optimal for Hamburg

[Figure: map of Germany with routers in Hamburg, Berlin, Stuttgart and Munich (route reflector for all other routers). Berlin advertises 200.0.0.0/24 with AS-PATH = 100 200 400, Next-Hop = Loopback Berlin. Stuttgart advertises 200.0.0.0/24 with AS-PATH = 600 1000 400, Next-Hop = Loopback Stuttgart; Munich reflects this route. Picture by wikipedia: https://commons.wikimedia.org/wiki/File:Karte_Bundesrepublik_Deutschland.svg]


Using geographically distributed route reflectors

§ One solution is to locate route reflectors in each geographical region

§ In this case Hamburg is made a route reflector as well as Munich

§ As a consequence, Hamburg can see both prefixes and make a local bestpath decision

§ Problems
  § does not scale well in larger networks
  § with an increasing number of route reflectors, the design becomes similar to an iBGP full-mesh with the same challenges

[Figure: map of Germany with routers in Hamburg, Berlin, Stuttgart and Munich (route reflector for all other routers). Both routes for 200.0.0.0/24 are shown twice: AS-PATH = 100 200 400, Next-Hop = Loopback Berlin, and AS-PATH = 600 1000 400, Next-Hop = Loopback Stuttgart. Picture by wikipedia: https://commons.wikimedia.org/wiki/File:Karte_Bundesrepublik_Deutschland.svg]


Using BGP-Add-path

§ RFC 7911 defines the advertisement of multiple paths to a BGP neighbor

§ Munich can send both paths to Hamburg without making a local selection

§ Hamburg can now make a routing decision based on local IGP metric

§ Problems
  § increases the amount of routing information on route reflector clients
  § Needs to be understood by the route reflector client as it uses a different encoding of the NLRI, which carries an additional path identifier (negotiated with capability advertisement)

[Figure: map of Germany with routers in Hamburg, Berlin, Stuttgart and Munich (route reflector for all other routers). Both routes for 200.0.0.0/24 are shown twice: AS-PATH = 100 200 400, Next-Hop = Loopback Berlin, and AS-PATH = 600 1000 400, Next-Hop = Loopback Stuttgart. Picture by wikipedia: https://commons.wikimedia.org/wiki/File:Karte_Bundesrepublik_Deutschland.svg]


Using Optimal Route Reflection

§ defined by draft-ietf-idr-bgp-optimal-route-reflection-13

§ Based on topology information of the IGP
  § Before reflecting a route towards Hamburg, Munich will check which path is better (from an IGP perspective) for Hamburg
  § Only the path from Berlin is sent to Hamburg due to the lower IGP metric from Hamburg to Berlin

§ Does not need special support in route reflector clients, as it is a local decision by the RR

§ Problems
  § increases the processing on the route reflector
  § only works effectively within the same IGP area

[Figure: map of Germany with routers in Hamburg, Berlin, Stuttgart and Munich (route reflector for all other routers). Routes for 200.0.0.0/24: AS-PATH = 100 200 400, Next-Hop = Loopback Berlin (shown twice) and AS-PATH = 600 1000 400, Next-Hop = Loopback Stuttgart. Picture by wikipedia: https://commons.wikimedia.org/wiki/File:Karte_Bundesrepublik_Deutschland.svg]


Hierarchical Route Reflector-Design

§ Route reflectors themselves can be clients to other route reflectors
§ Scalability can be improved by building hierarchical route reflector designs


Confederations

§ Alternative to iBGP Route Reflection
§ Breaks a global autonomous system into multiple pieces (sub-AS)
§ Within each sub-AS
  § Use private AS numbers
  § iBGP full-mesh topology is still required (may use route reflection within sub-AS)

§ Between each sub-AS:
  § eBGP-type configurations (CBGP) are required
  § Most iBGP parameters are not changed. However, AS_PATH attribute is modified to prevent loops

§ Global AS is still viewed externally as a single AS


Confederation Peering


BGP BESTPATH SELECTION EXAMPLES

Examples for BGP bestpath Selection

§ The following slides show examples of how BGP selects its best routes

§ In all examples we assume that
► the next-hop is valid (known by the IGP or reachable via a directly connected interface)
► the local AS of the router which is checking the routes is 100

§ Remember that this process only starts for identical prefixes (IP subnet address and subnet mask are the same)

§ The following slides show examples where two prefixes are compared; the process for comparing more than two routes to the same prefix is analogous
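The comparison used in the examples can be sketched in Python. This is a minimal sketch, not a vendor implementation: the `Route` class, `select_best` and the `missing_med` parameter are hypothetical names, the step order follows the numbering (1 to 11) used on the example slides, and restricting the MED comparison to routes from the same neighbouring AS is taken from RFC 4271 rather than from the slides.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional, Tuple

# Lower rank is preferred: IGP < EGP < INCOMPLETE.
ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}


@dataclass(frozen=True)
class Route:
    """One BGP route with the attributes listed on the example slides."""
    prefix: str
    as_path: Tuple[int, ...]
    peer_router_id: str
    peer_address: str
    local_pref: int = 100
    origin: str = "IGP"
    med: Optional[int] = None            # None means "not set"
    learned_via: str = "eBGP"            # "eBGP" or "iBGP"
    igp_metric: int = 10
    cluster_list: Tuple[str, ...] = ()


def select_best(left: Route, right: Route, local_as: int = 100,
                missing_med: int = 0) -> Tuple[Optional[Route], str]:
    """Return (preferred route, deciding step); the route is None if nothing is selected."""
    # Step 1: drop routes whose AS path already contains the local AS.
    loop_free = [r for r in (left, right) if local_as not in r.as_path]
    if len(loop_free) < 2:
        return (loop_free[0] if loop_free else None), "AS loop"
    # Step 2: only identical prefixes are compared against each other.
    if left.prefix != right.prefix:
        return None, "different prefixes"
    # Steps 3 to 11: the route with the lower key wins at the first step that differs.
    steps = [
        ("local preference", lambda r: -r.local_pref),
        ("AS-path length", lambda r: len(r.as_path)),
        ("origin", lambda r: ORIGIN_RANK[r.origin]),
        ("MED", lambda r: missing_med if r.med is None else r.med),
        ("eBGP over iBGP", lambda r: 0 if r.learned_via == "eBGP" else 1),
        ("IGP metric", lambda r: r.igp_metric),
        ("cluster-list length", lambda r: len(r.cluster_list)),
        ("router ID", lambda r: int(IPv4Address(r.peer_router_id))),
        ("peer address", lambda r: int(IPv4Address(r.peer_address))),
    ]
    for reason, key in steps:
        # MED is only comparable between routes from the same neighbouring AS.
        if reason == "MED" and left.as_path[:1] != right.as_path[:1]:
            continue
        if key(left) != key(right):
            return min((left, right), key=key), reason
    return left, "tie"


# Two routes that differ only in AS path, router ID and peer address.
left = Route("185.16.196.0/24", (123, 8777, 3250, 203580), "30.30.30.30", "23.9.9.1")
right = Route("185.16.196.0/24", (455, 19, 5, 203580), "40.40.40.40", "25.10.9.3")
winner, reason = select_best(left, right)
print(winner.peer_router_id, reason)
```

Expressing steps 3 to 11 as an ordered list of key functions keeps the "first difference decides" logic in one loop, so a platform-specific change such as treating a missing MED as worst only changes one parameter.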

Example 1 – Prefixes received by AS 100

| Attribute | Left prefix | Right prefix |
|---|---|---|
| IP Prefix | 185.16.196.0/24 | 185.16.196.0/24 |
| AS-Path | “123 8777 3250 203580” | “455 19 5 203580” |
| Local Preference | 100 | 100 |
| Origin | IGP | IGP |
| MED | not set | not set |
| Learned via | eBGP | eBGP |
| IGP Metric to next-hop | 10 | 10 |
| Cluster-List | not set | not set |
| Peer Router ID | 30.30.30.30 | 40.40.40.40 |
| Peer Address | 23.9.9.1 | 25.10.9.3 |

Decision steps:

1. no loop
2. same prefix
3. same local pref
4. same length
5. same origin
6. same MED
7. same type (eBGP)
8. not relevant (eBGP)
9. same list
10. smaller ID
11. not checked

Result: Left prefix selected because the router ID of the advertising router is smaller

Example 2 – Prefixes received by AS 100

| Attribute | Left prefix | Right prefix |
|---|---|---|
| IP Prefix | 185.16.196.0/24 | 185.16.196.0/24 |
| AS-Path | “123 100 203580” | “455 19 5 203580” |
| Local Preference | 100 | 100 |
| Origin | IGP | IGP |
| MED | not set | not set |
| Learned via | eBGP | eBGP |
| IGP Metric to next-hop | 10 | 10 |
| Cluster-List | not set | not set |
| Peer Router ID | 30.30.30.30 | 40.40.40.40 |
| Peer Address | 23.9.9.1 | 25.10.9.3 |

Decision steps:

1. AS loop!
2. not checked
3. not checked
4. not checked
5. not checked
6. not checked
7. not checked
8. not checked
9. not checked
10. not checked
11. not checked

Result: Right prefix selected because the AS-Path of the left prefix contains the local AS.

Example 3 – Prefixes received by AS 100

| Attribute | Left prefix | Right prefix |
|---|---|---|
| IP Prefix | 185.16.196.0/24 | 185.16.196.0/24 |
| AS-Path | “123 8777 3250 203580” | “455 19 5 203580” |
| Local Preference | 100 | 100 |
| Origin | IGP | IGP |
| MED | not set | not set |
| Learned via | iBGP | eBGP |
| IGP Metric to next-hop | 10 | 10 |
| Cluster-List | not set | not set |
| Peer Router ID | 30.30.30.30 | 40.40.40.40 |
| Peer Address | 23.9.9.1 | 25.10.9.3 |

Decision steps:

1. no loop
2. same prefix
3. same local pref
4. same length
5. same origin
6. same MED
7. prefer eBGP
8. not checked
9. not checked
10. not checked
11. not checked

Result: Right prefix selected because eBGP is preferred over iBGP (hot potato routing)

Example 4 – Prefixes received by AS 100

| Attribute | Left prefix | Right prefix |
|---|---|---|
| IP Prefix | 185.16.196.0/24 | 185.16.196.0/24 |
| AS-Path | “455 19 5 203580” | “455 19 5 203580” |
| Local Preference | 100 | 100 |
| Origin | IGP | IGP |
| MED | 100 | not set |
| Learned via | iBGP | eBGP |
| IGP Metric to next-hop | 10 | 10 |
| Cluster-List | not set | not set |
| Peer Router ID | 30.30.30.30 | 40.40.40.40 |
| Peer Address | 23.9.9.1 | 25.10.9.3 |

Decision steps:

1. no loop
2. same prefix
3. same local pref
4. same length
5. same origin
6. different MED
7. not checked
8. not checked
9. not checked
10. not checked
11. not checked

Result: Depends on the routing platform and/or configuration. A missing MED can be interpreted as worst (very high) or as 0 (very low). Typically it is considered as 0, in which case the right prefix would be preferred.
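The two interpretations of a missing MED can be sketched as follows. This is an illustrative sketch only: `effective_med`, `prefer_by_med` and the `missing_as_worst` switch are hypothetical names and do not correspond to a specific platform's configuration syntax.

```python
from typing import Optional

MAX_MED = 2**32 - 1  # MED is a 32-bit unsigned value, so this is the worst possible MED


def effective_med(med: Optional[int], missing_as_worst: bool = False) -> int:
    """Return the value used for comparison; None stands for "MED not set"."""
    if med is not None:
        return med
    return MAX_MED if missing_as_worst else 0


def prefer_by_med(left_med: Optional[int], right_med: Optional[int],
                  missing_as_worst: bool = False) -> str:
    """Compare two MED values; the lower effective MED is preferred."""
    left = effective_med(left_med, missing_as_worst)
    right = effective_med(right_med, missing_as_worst)
    if left == right:
        return "tie"
    return "left" if left < right else "right"


# MED 100 on the left, no MED on the right, under both interpretations.
print(prefer_by_med(100, None))
print(prefer_by_med(100, None, missing_as_worst=True))
```

With the default interpretation (missing MED counts as 0) the right prefix wins; with the missing MED counted as worst, the left prefix wins.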

Example 5 – Prefixes received by AS 100

| Attribute | Left prefix | Right prefix |
|---|---|---|
| IP Prefix | 185.16.196.0/24 | 185.16.196.0/24 |
| AS-Path | “49164 21031 203580” | “455 19 5 203580” |
| Local Preference | 100 | 100 |
| Origin | IGP | IGP |
| MED | not set | not set |
| Learned via | eBGP | eBGP |
| IGP Metric to next-hop | 10 | 10 |
| Cluster-List | not set | not set |
| Peer Router ID | 30.30.30.30 | 40.40.40.40 |
| Peer Address | 23.9.9.1 | 25.10.9.3 |

Decision steps:

1. no loop
2. same prefix
3. same local pref
4. shorter AS-Path
5. not checked
6. not checked
7. not checked
8. not checked
9. not checked
10. not checked
11. not checked

Result: Left prefix selected because its AS-Path is shorter (3 AS versus 4 AS)

Example 6 – Prefixes received by AS 100

| Attribute | Left prefix | Right prefix |
|---|---|---|
| IP Prefix | 185.16.196.0/24 | 185.16.196.0/24 |
| AS-Path | “49164 21031 203580” | “455 19 5 203580” |
| Local Preference | 100 | 1000 |
| Origin | IGP | IGP |
| MED | not set | not set |
| Learned via | eBGP | eBGP |
| IGP Metric to next-hop | 10 | 10 |
| Cluster-List | not set | not set |
| Peer Router ID | 30.30.30.30 | 40.40.40.40 |
| Peer Address | 23.9.9.1 | 25.10.9.3 |

Decision steps:

1. no loop
2. same prefix
3. higher pref
4. not checked
5. not checked
6. not checked
7. not checked
8. not checked
9. not checked
10. not checked
11. not checked

Result: Right prefix selected because its Local Preference is higher (even though the AS path of the right prefix is longer)

Example 7 – Prefixes received by AS 100

| Attribute | Left prefix | Right prefix |
|---|---|---|
| IP Prefix | 185.16.196.0/25 | 185.16.196.0/24 |
| AS-Path | “49164 21031 203580” | “455 19 5 203580” |
| Local Preference | 500 | 120 |
| Origin | IGP | IGP |
| MED | not set | not set |
| Learned via | iBGP | eBGP |
| IGP Metric to next-hop | 10 | 10 |
| Cluster-List | not set | not set |
| Peer Router ID | 30.30.30.30 | 40.40.40.40 |
| Peer Address | 23.9.9.1 | 25.10.9.3 |

Decision steps:

1. no loop
2. different prefix
3. not checked
4. not checked
5. not checked
6. not checked
7. not checked
8. not checked
9. not checked
10. not checked
11. not checked

Result: Both prefixes are accepted, as they have different subnet masks. IP traffic to 185.16.196.0-127 will use the left prefix (longest match), traffic to 185.16.196.128-255 the right prefix.
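Which of the two installed prefixes forwards a given packet is decided by longest-prefix match. A /25 covers the lower 128 addresses of the /24 (185.16.196.0 to 185.16.196.127), so only the upper half falls through to the /24. The sketch below uses Python's standard `ipaddress` module; the `routes` table and the `lookup` function are hypothetical names for illustration.

```python
from ipaddress import ip_address, ip_network

# Both prefixes from this example are installed side by side.
routes = {
    ip_network("185.16.196.0/25"): "left prefix",
    ip_network("185.16.196.0/24"): "right prefix",
}


def lookup(destination: str):
    """Return the route with the longest matching prefix, or None if nothing matches."""
    addr = ip_address(destination)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    return routes[max(matches, key=lambda net: net.prefixlen)]


for dest in ("185.16.196.1", "185.16.196.127", "185.16.196.128", "185.16.196.255"):
    print(dest, "->", lookup(dest))
```

The first two destinations match both prefixes and use the more specific /25; the last two match only the /24.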

Outlook

§ BGP has also been extended to carry information for protocols other than IPv4
§ Non-IPv4 protocols are encoded using the Address Family Identifier (AFI)
§ Multiprotocol BGP support is advertised using capability advertisement
§ Other uses include transmitting VPN information across a provider core network
► MPLS Layer 3 VPN labels
► MPLS Layer 2 VPN labels
► MPLS EVPN labels
► MPLS Labeled Unicast
► IPv6 over IPv4 BGP sessions
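As an illustration of the AFI encoding, the uses listed above map to address family pairs such as the following. The numeric values are the AFI and SAFI (Subsequent Address Family Identifier) code points from the IANA registries; the dictionary layout, the `describe` helper and the choice of one representative family per use (for example VPLS for Layer 2 VPN) are assumptions made for this sketch.

```python
# AFI and SAFI code points as assigned by IANA.
AFI = {1: "IPv4", 2: "IPv6", 25: "L2VPN"}
SAFI = {1: "unicast", 4: "labeled unicast", 65: "VPLS", 70: "EVPN", 128: "MPLS-labeled VPN"}

# One representative (AFI, SAFI) pair per use; other combinations exist.
USES = {
    "MPLS Layer 3 VPN (VPNv4)": (1, 128),
    "MPLS Layer 2 VPN (VPLS)": (25, 65),
    "MPLS EVPN": (25, 70),
    "MPLS Labeled Unicast (IPv4)": (1, 4),
    "IPv6 unicast": (2, 1),
}


def describe(use: str) -> str:
    """Render the address family pair negotiated for a given use."""
    afi, safi = USES[use]
    return f"AFI {afi} ({AFI[afi]}) / SAFI {safi} ({SAFI[safi]})"


for use in USES:
    print(f"{use}: {describe(use)}")
```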