Protocol implementation: next-hop resolution, reliability and graceful restart

Protocol implementation

• Next-hop resolution

• Reliability and graceful restart

What is a next hop?

• The destination of the packets I am sending
  – Not the same as the interface
  – An Ethernet interface will have many nodes behind it
  – A directly connected next hop is 1 hop away
• E.g. RSVP sends a PATH message to the next downstream node
  – Next hop may be directly connected (strict ERO)
  – Or not (loose ERO)
• OSPF sends an LS update to the other end of a link or to a neighbor on an Ethernet
  – Always directly connected
• BGP has an iBGP next hop for each of its paths
  – Not directly connected

Next-hop

• If the next hop is not directly connected, the way to reach it depends on the IGP
  – May change when IGP routing changes
  – Will have to use a different interface to reach it
  – Need to keep track of these changes

• Next hop resolution
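As a rough illustration of why a non-directly-connected next hop must be re-resolved when the IGP changes, here is a minimal sketch using longest-prefix match over a toy routing table. The function name `resolve`, the table layout and all addresses are assumptions made for this example, not taken from any real implementation.

```python
import ipaddress


def resolve(next_hop, routes):
    """Return the (interface, gateway) of the longest prefix covering next_hop, or None."""
    addr = ipaddress.ip_address(next_hop)
    best = None
    for prefix, via in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, via)
    return best[1] if best else None


# Hypothetical IGP routes: prefix -> (outgoing interface, directly connected gateway)
routes = {
    "10.0.0.0/8": ("eth0", "192.168.1.1"),
    "10.1.0.0/16": ("eth1", "192.168.2.1"),
}

before = resolve("10.1.2.3", routes)   # resolved through the more specific /16
del routes["10.1.0.0/16"]              # IGP change: the /16 route is withdrawn
after = resolve("10.1.2.3", routes)    # the same next hop now resolves through the /8
print(before, after)
```

The same remote next hop ends up behind a different interface after the IGP change, which is exactly the change a protocol such as RSVP or BGP needs to hear about.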

Next hop resolution

• Periodic resolution
  – May take a bit more time
    • But next hops will not be too many
    • Or will they? Tunnels, VLANs …
  – Quagga uses this approach
    • Through the IPV4_LOOKUP_NEXTHOP command
• Registration/notification
  – RSVP would tell zebra which next hops it is interested in
  – Zebra will notify RSVP when something changes in the IGP path to them
    • Better scaling for RSVP
    • Difficult to ensure good scaling inside zebra
      – Various protocols may register 1000s of next hops
    • More complex code in zebra
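The registration/notification approach can be sketched as a small callback registry. This is only an illustration of the idea: the class `NextHopTracker` and its methods are hypothetical names and do not correspond to zebra's actual interface.

```python
from collections import defaultdict


class NextHopTracker:
    """Toy stand-in for the resolver (the role zebra plays in the slides)."""

    def __init__(self):
        self.resolved = {}                  # next hop -> interface currently used to reach it
        self.watchers = defaultdict(list)   # next hop -> callbacks of interested protocols

    def register(self, next_hop, callback):
        """A protocol declares interest; returns the current resolution, if any."""
        self.watchers[next_hop].append(callback)
        return self.resolved.get(next_hop)

    def igp_update(self, next_hop, interface):
        """Called when the IGP path changes; notify only on a real change."""
        if self.resolved.get(next_hop) == interface:
            return
        self.resolved[next_hop] = interface
        for callback in self.watchers.get(next_hop, []):
            callback(next_hop, interface)


events = []
tracker = NextHopTracker()
tracker.register("10.1.2.3", lambda nh, ifc: events.append((nh, ifc)))
tracker.igp_update("10.1.2.3", "eth1")   # first resolution: notified
tracker.igp_update("10.1.2.3", "eth1")   # no change: no notification
tracker.igp_update("10.1.2.3", "eth0")   # path moved: notified
tracker.igp_update("10.9.9.9", "eth2")   # nobody registered: no notification
print(events)
```

The protocol does no polling at all, but the tracker now carries one list of watchers per registered next hop, which is where the scaling and complexity concerns inside zebra come from.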

Network Reliability

• Availability: how many nines?
  – 99.999% is 5.26 min of downtime/year
  – 99.9999% is 31.5 sec of downtime/year
• Telephone networks are between 5 and 6 nines
  – Internet will have to get there
  – Currently at 4 nines? (vendors claim 5)
  – Very important with the new types of traffic
    • VoIP, IPTV
• What can go wrong (% of failures for US telephone network ca. 1992):
  – Hardware failures (19%)
  – Software failures (14%)
  – Human errors (49%)
  – Vandalism/Terrorism
  – Acts of nature (11%)
  – Overload (6%, but had the largest impact on customers)
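The "nines" figures on this slide follow from simple arithmetic. The sketch below assumes a 365-day year; the function name is made up for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: a 365-day year


def downtime_seconds_per_year(nines):
    """Allowed downtime per year for an availability of `nines` nines (5 -> 99.999%)."""
    return SECONDS_PER_YEAR * 10 ** (-nines)


for n in (4, 5, 6):
    s = downtime_seconds_per_year(n)
    print(f"{n} nines: {s / 60:.2f} min/year ({s:.1f} s/year)")
```

Each extra nine divides the allowed downtime by ten, so going from 4 to 5 nines means going from a little under an hour to a little over five minutes per year.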

Hardware failures

• Link failures
  – Protocols can cope with that
    • Re-route, may be slow
    • More aggressive repair methods
      – We will see them later
• Router failures
  – Cannot do much, just add redundancy
    • Power supplies, fans, disks, etc.
  – Line-card failure is similar to a link failure
  – Control processor failure is more serious
    • Always have two of them
    • Primary and backup

Modern Router architectures

• Dual controllers
  – For running the control plane
• Multiple line cards
  – Can operate without the controllers
  – Router can forward traffic even when the control plane crashes
  – Called non-stop forwarding or head-less operation

Software failures

• When the primary fails, start using the backup
  – Switchover
• Must be as fast as possible
  – Things in the network change in the meantime
  – Need to minimize this window
• What happens with the control software?
  – Need to keep the primary and backup instances in sync
  – How tight is this synchronization?

Tight synchronization

• Both primary and backup are active; keep them in sync by:
• Sending them both the same input (i.e. duplicating control packets)
  – Fastest possible switchover
  – Expensive, may need to duplicate packets
  – Does not work for TCP-based protocols
• Having the primary keep sending state updates to the backup
  – May need to send too many messages
• Being totally in sync is not easy
  – Needs transactional communication
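One way to picture the state-update variant and its need for transactional communication is sequence-numbered updates with cumulative acknowledgements. This is a sketch under assumptions: the `Primary` and `Backup` classes and the `deliver` flag (used here to simulate a lost message) are invented for illustration, and a real router would send updates over an internal channel rather than by direct method call.

```python
class Backup:
    def __init__(self):
        self.state = {}
        self.last_seq = 0

    def apply(self, seq, key, value):
        """Apply an update only if it is the next in sequence; return a cumulative ack."""
        if seq == self.last_seq + 1:
            self.state[key] = value
            self.last_seq = seq
        return self.last_seq


class Primary:
    def __init__(self, backup):
        self.backup = backup
        self.state = {}
        self.seq = 0
        self.unacked = []   # (seq, key, value) updates not yet confirmed by the backup

    def update(self, key, value, deliver=True):
        self.state[key] = value
        self.seq += 1
        self.unacked.append((self.seq, key, value))
        if deliver:          # deliver=False simulates a lost message
            self.flush()

    def flush(self):
        """(Re)send every unacknowledged update in order and drop what gets acked."""
        for seq, key, value in list(self.unacked):
            acked = self.backup.apply(seq, key, value)
            self.unacked = [u for u in self.unacked if u[0] > acked]


backup = Backup()
primary = Primary(backup)
primary.update("a", 1)
primary.update("b", 2, deliver=False)   # this update is lost on the way
primary.update("c", 3)                  # the flush resends "b" before sending "c"
print(backup.state == primary.state)
```

Every state change costs at least one message and one acknowledgement, which is the "too many messages" concern from the slide.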

Loose synchronization

• Backup is idle
  – But we keep its configuration up to date
  – Each configuration change on the primary is mirrored on the backup
• Backup instance is started when the primary fails
  – Switchover will take longer
• Much, much simpler
  – Configuration changes are much less frequent
• Variation:
  – Keep only the RIB process in sync on both primary and backup

Non-stop forwarding

• Key concept
  – Forwarding happens in the line cards
  – Even if the control processor fails, forwarding can continue
  – Non-stop forwarding, head-less operation
• Old common sense: when a router's s/w crashes, do not use the router
  – But with head-less operation it is OK to continue using routers whose s/w has crashed
  – Assuming their s/w will be operational again soon

Special Case

• Planned restart
  – For s/w upgrade
    • These are a significant percentage of downtime
  – For refresh
    • Memory is leaking but s/w is still operational
    • Restart to get a clean start
• I can use graceful restart

Graceful Restart

• Other routers in the network will keep using a neighbor router
  – Even if it looks like its control plane has failed
  – Assuming it will come back soon
• Needs coordination
  – The failed router needs to do some special processing when it comes back
  – It has to tell its neighbors first that it supports graceful restart
• Zero impact on the network
  – The failed router will have the chance to restart its s/w and come back
  – Nobody in the rest of the network will know that something happened

How does it work?

• Used for all protocols by now
  – OSPF, BGP, RSVP-TE…
• The neighbor will discover that the router is dead or has restarted
  – HELLO timeout, different information in the HELLOs, etc.
  – But will ignore it for a certain time period
• If the failed router comes back within this period
  – It will re-sync its state (database exchange for OSPF, resend all the LSPs for RSVP, …)
  – And all is back to normal
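The neighbor's side of this procedure can be sketched as a small state machine with a grace timer. The state names, the `grace_period` parameter and the class `HelperNeighbor` are assumptions for illustration; each real protocol defines its own timers and messages.

```python
class HelperNeighbor:
    """Neighbor-side view of a router that may be restarting (illustrative only)."""

    def __init__(self, grace_period):
        self.grace_period = grace_period
        self.state = "UP"
        self.restart_detected_at = None

    def on_hello_timeout(self, now, peer_supports_gr):
        if self.state != "UP":
            return
        if peer_supports_gr:
            # Ignore the failure for now and keep forwarding through the peer
            self.state = "RESTARTING"
            self.restart_detected_at = now
        else:
            self.state = "DOWN"

    def tick(self, now):
        """Give up once the grace period has expired without the peer returning."""
        if self.state == "RESTARTING" and now - self.restart_detected_at >= self.grace_period:
            self.state = "DOWN"

    def on_hello_received(self):
        if self.state == "RESTARTING":
            self.state = "RESYNC"   # e.g. database exchange for OSPF

    def on_resync_done(self):
        if self.state == "RESYNC":
            self.state = "UP"


n = HelperNeighbor(grace_period=120)
n.on_hello_timeout(now=0, peer_supports_gr=True)
n.tick(now=60)            # still inside the grace period
n.on_hello_received()     # the router is back
n.on_resync_done()
print(n.state)

late = HelperNeighbor(grace_period=120)
late.on_hello_timeout(now=0, peer_supports_gr=True)
late.tick(now=120)        # grace period expired, the router never came back
print(late.state)
```

Time is passed in explicitly as `now` so the behaviour is easy to follow; a real implementation would use timers.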

Example RSVP

• Use HELLOs
• Special recovery label messages
• Restarting router needs to remember the labels it allocated before the crash
  – Where?
    • Shared memory
    • Recover them from the forwarding plane
  – Why?
    • Must use the same labels again
    • Must make sure it does not use an allocated label for some other LSP
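The two "why" points can be illustrated with a toy label allocator that reloads surviving bindings before handing out any new labels. The class `LabelAllocator`, the label range and the LSP names are hypothetical; the only external fact relied on is that MPLS label values 0 to 15 are reserved.

```python
class LabelAllocator:
    def __init__(self, first=16, last=1023):
        # MPLS labels 0-15 are reserved; the upper bound here is arbitrary
        self.first, self.last = first, last
        self.in_use = {}   # label -> LSP identifier

    def recover(self, forwarding_entries):
        """Reload label -> LSP bindings that survived in the forwarding plane."""
        self.in_use.update(forwarding_entries)

    def allocate(self, lsp):
        # An LSP that already had a label before the crash gets the same one back
        for label, owner in self.in_use.items():
            if owner == lsp:
                return label
        # A new LSP gets the lowest label not held by any other LSP
        for label in range(self.first, self.last + 1):
            if label not in self.in_use:
                self.in_use[label] = lsp
                return label
        raise RuntimeError("label space exhausted")


# Bindings still programmed in the line cards after the control plane restarted
surviving = {16: "lsp-A", 18: "lsp-B"}

alloc = LabelAllocator()
alloc.recover(surviving)
print(alloc.allocate("lsp-A"))   # same label as before the crash
print(alloc.allocate("lsp-C"))   # new LSP, skips labels still in use
print(alloc.allocate("lsp-D"))
```

Without the `recover` step the allocator would start from an empty table and could hand label 16 to a different LSP while the line cards are still forwarding lsp-A traffic on it.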

Example OSPF

• Trick is to re-establish the adjacencies after a failure

• Remember the set of neighbors
  – Shared memory or in the backup controller

• After restart do not originate any LSAs

• Just re-establish adjacencies and re-sync database

Graceful restart catches

• All routers in the network should implement this for it to work
• Mostly for planned restarts:
  – S/w upgrades
  – Refreshes (if a router runs low on memory)
  – But it is possible to use it for crashes too!
• It cannot work if something changes in the network while the restart is going on
  – There may be routing loops

Router self-monitoring

• Automatically restart failed or stuck processes
• A separate monitor process
  – Keeps an eye on other processes
  – If there is a failure, the failed process is restarted
    • Of course it may fail again
  – Heartbeats to determine liveness
  – Failure may not necessarily be a crash
    • Could be a software bug that causes an infinite loop or very, very slow processing
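A heartbeat-based monitor can be sketched as follows. The class `Monitor`, the timeout value and the `restart` callback are assumptions for this example; the process names are only labels. Because liveness is judged by heartbeats rather than by process exit, a process stuck in an infinite loop is caught as well.

```python
class Monitor:
    def __init__(self, timeout, restart):
        self.timeout = timeout    # longest allowed silence between heartbeats
        self.restart = restart    # callback that restarts a process by name
        self.last_beat = {}       # process name -> time of last heartbeat

    def heartbeat(self, name, now):
        self.last_beat[name] = now

    def check(self, now):
        """Restart every process that has been silent for longer than the timeout."""
        restarted = []
        for name, seen in self.last_beat.items():
            if now - seen > self.timeout:
                self.restart(name)
                self.last_beat[name] = now   # give the new instance a fresh window
                restarted.append(name)
        return restarted


restarts = []
mon = Monitor(timeout=3, restart=restarts.append)
mon.heartbeat("ospfd", now=0)
mon.heartbeat("bgpd", now=0)
mon.heartbeat("bgpd", now=4)    # bgpd is alive; ospfd has gone silent (crashed or stuck)
print(mon.check(now=5))         # only the silent process is restarted
print(mon.check(now=6))         # nothing to do on the next check
```

If the restarted process fails again, it simply misses its next heartbeat window and is restarted once more; a real monitor would likely cap the number of retries.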

Why is it important

• Remember the PoP structure
  – Need dual routers for reliability
  – If I had a single router that was extra reliable, I could save a lot of money

Issues

• Strict isolation
  – VMs
  – Other methods
• Global resource coordination
  – For example, memory
