fib 2.0: hierarchical, protocol independent. · figure 3: route object diagram fib sources there...

20
FIB 2.0: Hierarchical, Protocol Independent.

Upload: others

Post on 22-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

FIB2.0:Hierarchical,ProtocolIndependent.

Page 2: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

TableofContentsPrerequisites..........................................................................................................................................3

Graphs................................................................................................................................................3

Prefixes...............................................................................................................................................3

TheDataModel......................................................................................................................................4

ControlPlane......................................................................................................................................4

ARPEntries.....................................................................................................................................4

Routes............................................................................................................................................5

AttachedExport............................................................................................................................11

GraphWalks.................................................................................................................................13

Data-Plane........................................................................................................................................14

Tunnels.................................................................................................................................................17

MPLSFIB...............................................................................................................................................18

Implementation................................................................................................................................19

Tunnels.............................................................................................................................................19

FastConvergence.................................................................................................................................20

Figure1:ARPdatamodel.......................................................................................................................4Figure2:Routedatamodel–classdiagram..........................................................................................5Figure3:Routeobjectdiagram..............................................................................................................7Figure4:Recursiverouteclassdiagram.................................................................................................9Figure5:RecursiveRoutesobjectdiagram..........................................................................................10Figure6:AttachedExportClassdiagram..............................................................................................12Figure7:AttachedExportobjectdiagram...........................................................................................12Figure8:DPOcontributionsforanon-recursiveroute........................................................................15Figure9:DPOcontributionforarecursiveroute.................................................................................16Figure10:DPOContributionsfromlabelledrecursiveroutes.............................................................16Figure11:Tunnelcontrolplaneobjectdiagram..................................................................................18

Page 3: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

PrerequisitesThis section describes some prerequisite topics and nomenclature that are foundational tounderstandingtheFIBarchitecture.

GraphsTheFIBisessentiallyacollectionofrelatedgraphs.Terminologyfromgraphtheoryisoftenusedinthesectionsthatfollow.FromWikipedia:

“...agraphisarepresentationofasetofobjectswheresomepairsofobjectsareconnectedbylinks.Theinterconnectedobjectsarerepresentedbymathematicalabstractionscalledvertices(alsocallednodesorpoints),andthelinksthatconnectsomepairsofverticesarecallededges(alsocalledarcsorlines)...edgesmaybedirectedorundirected.”

In a directed graph the edges can only be traversed in one direction – from child to parent. Thenamesarechosentorepresentthemanytoonerelationship.Achildhasoneparent,butaparentmany children. In undirected graphs the edge traversal can be in either direction, but in FIB theparentchildnomenclatureremainstorepresentthemanytoonerelationship.Childrenofthesameparent are termed siblings. When the traversal is from child to parent it is considered to be aforward traversal, orwalk, and fromparent to themany childrenabackwalk. Forwardwalks arecheapsince theystart fromthemanyandmove toward the few.Backwalksareexpensiveas thestartfromthefewandvisitthemany.

Themanytoonerelationshipbetweenchildandparentmeansthatthelifetimeofaparentobjectmustextendto the lifetimeof itschildren. If thecontrolplaneremovesaparentobjectbefore itschildren, then theparentmust remain, in an ‘incomplete’ state,until the childrenare themselvesremoved.Likewise ifachild iscreatedbefore itsparent, theparent iscompleted inan incompletestate. These incomplete objects are needed to maintain the graph dependencies. Without themwhentheparentisaddedfindingtheaffectedchildrenwouldbesearchthroughmanydatabasesforthosechildren.Toextendthelifetimeofparentsallchildrenthereofholda‘lock’ontheparent.Thisis a simple reference count. Children then follow the add-or-lock/unlock semantics for finding aparent,asopposedtoamalloc/free.

PrefixesSomenomenclatureusedtodescribeprefixes;

• 1.1.1.1–thisisanaddresssinceithasnoassociatedmask• 1.1.1.0/24–thisisaprefix.• 1.1.1.1/32–thisisahostprefix(themasklengthisthesizeoftheaddress).

PrefixAismorespecificthanBifitsmasklengthislonger,andlessspecificifthemaskisshorter.Forexample,1.1.1.0/28ismorespecificthan1.1.1.0/24.Alessspecificprefixthatoverlapswithamorespecific is the ‘covering’ prefix. For example, 1.1.1.0/24 is the covering prefix for 1.1.1.0/28 and1.1.1.0/28istermedthe‘covered’prefix.Acoveringprefixisthereforealwayslessspecificthanitscoveredprefixes.

Page 4: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

TheDataModelTheFIBdatamodelcomprisestwoparts;thecontrol-plane(CP)andthedata-plane(DP).TheCPdatamodel represents the data that is programmed into VPP by the upper layers. The DP modelrepresentshowVPPderivesactionstobeperformedonpacketsaretheyareswitched.

ControlPlaneThecontrolplanefollowsalayereddatarepresentation.Thisdocumentdescribesthemodelstartingfrom the lowest layer. The description uses IPv4 addresses and protocols, but all concepts applyequallytotheIPv6equivalents.ThediagramsallportraytheCLIcommandtoinstalltheinformationin VPP and an [approximation of] a UML diagram1 of the data structures used to represent thatinformation.

ARPEntries

Figure1:ARPdatamodel

Figure 1 shows the datamodel for an ARP entry. An ARP entry contains themapping between apeer,identifiedbyanIPv4address,anditsMACaddressonagiveninterface.TheVRFtheinterfaceisboundto,isnotpartofthedata.VRFsareaningressfunctionnotegress.TheARPentrydescribeshowtosendtraffictoapeer,whichisanegressfunction.

Thearp_entry_trepresentsthecontrol-planeadditionoftheARPentry.Theip_adjacency_tcontainsthedataderived fromthearp_entry_t that isneedto forwardpackets to thepeer.Theadditionaldataintheadjacencyaretherewriteandthelink_type.Thelink_typeisadescriptionoftheprotocolof thepackets thatwillbe forwardedwith thisadjacency; thiscanbe IPv4orMPLS.The link_typemapsdirectly to theether-type in anEthernetheader, or theprotocol filed in aGREheader. Therewriteisabytestringrepresentationoftheheaderthatwillbeprependedtothepacketwhenitis

1 The arrow indicates a ‘has-a’ relationship. The object attached to the arrow head ‘has-a’ instance of theother.Thenumbersnexttothearrowsindicatethemultiplicity,i.e.objectAhasntominstancesofobjectB.The difference between a UML association and aggregation is not conveyed in any diagrams. To UMLaficionados,Iapologize.Wordisnotthebestdrawingtool.

Page 5: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

senttothatpeer.ForEthernetinterfacesthiswouldbethesrc,dstMACandtheether-type.ForLISPtunnels,theIPsrc,dstpairandtheLISPheader.

Thearp_entry_twill installa link_type=IPv4whentheentry iscreatedanda link_type=MPLSwhentheinterfaceisMPLSenabled.InterfacesmustbeexplicitlyMPLSenabledforsecurityreasons.

Sothatadjacenciescanbesharedbetweenroute,adjacenciesarestoredinasingledata-base,thekeyforwhichis{interface,next-hop,link-type}.

Routes

Figure2:Routedatamodel–classdiagram

Thecontrolplanewill installarouteinatableforaprefixviaalistofpaths.Theprimefunctionofthe FIB is to ‘resolve’ that route. To resolve a route is to construct an object graph that fullydescribesallelementsoftheroute.InFigure3therouteisresolvedasthegraphiscompletefromfib_entry_ttoip_adjacency_t.

In some routing models a VRF will consist of a set of tables for IPv4 and IPv6, and unicast andmulticast. In VPP there is no such grouping. Each table is distinct from each other. A table isindentifiedbyitsnumericalID.TheIDrangeisseparateforeachaddressfamily.

Atableiscomprisedoftworoutedata-bases;forwardingandnon-forwarding.Theforwardingdata-basecontainsroutesagainstwhichapacketwillperformalongestprefixmatch(LPM)inthedata-plane.Thenon-forwardingDBcontainsalltherouteswithwhichVPPhasbeenprogrammed–some

Page 6: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

oftheseroutesmaybeunresolvedforreasonsthatpreventtheir insertion intotheforwardingDB(seesection:AdjacencysourceFIBentries).

Theroutedataisdecomposedintothreeparts;entry,path-listandpaths;

• Thefib_entry_t,whichcontainstheroute’sprefix,isrepresentationofthatprefix’sentryintheFIBtable.

• The fib_path_t is a description of where to send the packets destined to the route’s prefix.Thereareseveraltypesofpath.

o Attachednext-hop:thepathisdescribedwithaninterfaceandanext-hop.Thenext-hopisinthesamesub-netastherouter’sownaddressonthatinterface,hencethepeerisconsideredtobe’attached’.

o Attached: thepath isdescribedonlybyan interface.AlladdresscoveredbytheprefixareonthesameL2segmenttowhichthatrouter’sinterfaceisattached.Thismeansitispossible toARP for any address coveredby theprefix –which is usually not the case(hencetheproxyARPdebacleinIOS).Anattachedpathisonlyappropriateforapoint-to-point(P2P)interfacewhereARPisnotrequired,i.e.aGREtunnel.

o Recursive:Thepathisdescribedonlyviathenext-hopandtable-id.o De-aggregate:Thepathisdescribedonlyviathespecialallzerosaddressandatable-id.

Thisimpliesasubsequentlookupinthetableshouldbeperformed.• Thefib_path_list_trepresentsthelistofpathsfromwhichtochooseonewhenforwarding.The

path-listisasharedobject,i.e.itistheparenttomultiplefib_entry_tchildren.Inordertoshareany object type it is necessary for a child to search for an existing object matching itsrequirements. For this there must be a data-base. The key to the path-list data-base is acombined description of all of the paths it contains2. Searching the path-list database isrequiredwitheachrouteaddition,so it ispopulatedonlywithpath-listsforwhichsharingwillbringconvergencebenefits(seeSection:FastConvergence).

Figure2 showsanexampleofa routewith twoattached-next-hoppaths.Eachof thesepathswill‘resolve’byfindingtheadjacencythatmatchesthepath’sattributes,whicharethesameasthekeyfor the adjacency data-base3. The ‘forwarding information’ (FI) is the set of adjacencies that areavailable for load-balancing the traffic in thedata-plane.A path ‘contributes’ an adjacency to theroute’s forwarding information, the path-list contributes the full forwarding information for IPpackets.

Error! Reference source not found. shows the object instances and their relationships created inordertoresolvetheroutesalsoshown.Thegraphnatureoftheserelationshipsisevident;childrenaredisplayedatthetopofthediagram,theirparentsbelowthem.Forwardwalksarethusfromtoptobottom,backwalksbottomtotop.Thediagramshowstheobjectsthatareshared,thepath-listandadjacency.Sharingobjectsiscriticaltofastconvergence(seesectionFastConvergence).

2Optimisations3Noteitisvalidforeitherinterfacetobeboundtoadifferenttablethantable1.

Page 7: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

Figure3:Routeobjectdiagram

FIBsourcesTherearevariousentitiesinthesystemthatcanaddroutestotheFIBtables.Eachoftheseentitiesis termed a ‘source’.When the same prefix is added by different sources the FIBmust arbitratebetween them to determinewhich sourcewill contribute the forwarding information. Since eachsource determines the forwarding information using different best path and loop preventionalgorithms, it is not correct for the forwarding information of multiple sources to be combined.InsteadtheFIBmustchoosetousetheforwardinginformationfromonlyonesource.Thischoiceisbasedonastaticpriorityassignment4.TheFIBmustmaintaintheinformationeachsourcehasaddedso it can be restored should that source become the best source. VPP has two ‘control-plane’sources;theAPIandtheCLI–theAPIhasthehigherpriority.Eachsource’sdataisrepresentedbyafib_entry_src_tobject–ofwhichafib_entry_tmaintainsasortedvector.

Aprefixis‘connected’whenitisappliedtoarouter’sinterface.Thefollowingconfiguration:

set interface address 192.168.1.1/24 GigE0

results in the addition of two FIB entries; 192.168.1.0/24 which is connected and attached, and192.168.1.1/32which is connectedand local (a.k.a receiveor for-us).Bothprefixesare ‘interface’sourced.Theinterfacesourcehasahighpriority,sotheaccidentalornefariousadditionofidenticalprefixesdoesnotpreventtherouterfromcorrectlyforwarding.PacketsmatchingaconnectedprefixwillgenerateanARPrequestforthepacket’sdestinationaddress,thisprocessisknownasa‘glean’.

4Theengagedreadercanseethefullprioritylistinvnet/vnet/fib/fib_entry.h.

Page 8: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

An‘attached’prefixalsoresultsinaglean,buttherouterdoesnothaveitsownaddressinthatsub-net. The following configuration will result in an attached route, which resolves via an attachedpath;

ip route add table X 10.10.10.0/24 via gre0

asmentionedbefore,theseareonlyappropriateforpoint-to-pointlinks.Anattached-hostprefixiscoveredbyeitheranattachedprefix (notethatconnectedprefixesarealsoattached). If tableX isnot the table to which gre0 is bound, then this is the case of an attached export (see sectionAttachedExport)

AdjacencysourceFIBentriesWheneveranARPentryiscreateditwillsourceafib_entry_t.Inthiscasetherouteisoftheform:

ip route add table X 10.0.0.1/32 via 10.0.0.1 GigEth0/0/0

It is a host prefix with a path whose next-hop address is the same. This route highlights thedistinction between the route’s prefix – a description of the traffic tomatch - and the path – adescriptionofwheretosendthematchedtraffic.TableXisthesametabletowhichtheinterfaceisbound.FIBentriesthataresourcedbyadjacenciesaretermedadj-fibs.ThepriorityoftheadjacencysourceislowerthantheAPIsource,sothefollowingconfiguration:

set interface address 192.168.1.1/24 GigE0

ip arp 192.168.1.2 GigE0 dead.dead.dead

ip route add 192.168.1.2 via 10.10.10.10 GigE1

will forward traffic for 192.168.1.2 via GigE1. That is the route added by the control plane isfavoured over the adjacency discovered by ARP. The control plane, with its associatedauthentication,isconsideredtheauthoritativesource.Tocounterthenefariousadditionofadj-fibs,throughthenefarious injectionofadjacencies, theFIB isalso required toensure thatonlyadj-fibswhose less specific coveringprefix isattachedare installed in forwarding.This requires theuseof'cover tracking',wherea routemaintainsadependency relationshipwith the route that is its lessspecific cover. When this cover changes (i.e. there is a new covering route) or the forwardinginformationofthecover isupdated,thenthecoveredroute isnotified.Adj-fibsthatfail thiscovercheck are not installed in the fib_table_t’s forwarding table, there are only present in the non-forwardingtable.

Overlapping sub-nets are not supported, so no adj-fib has multiple paths. The control plane isexpectedtoremoveaprefixconfiguredforaninterfacebeforetheinterfacechangesVRF.Sowhilethefollowingconfigurationisaccepted:

set interface address 192.168.1.1/32 GigE0

ip arp 192.168.1.2 GigE0 dead.dead.dead

set interface ip table GigE0 2

itdoesnotresultinthedesiredbehaviour,wheretheadj-fibandconnectedsaremovedtotable2.

Page 9: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

RecursiveRoutesFigure4showsthedatastructuresusedtodescribearecursiveroute.Therepresentationisalmostidentical toattachednext-hoppaths.Thedifferencebeing that the fib_path_thasaparent that isanotherfib_entry_t,termedthe‘via-entry’.

Figure4:Recursiverouteclassdiagram.

Inordertoforwardtrafficto64.10.128.0/20theFIBmustfirstdeterminehowtoforwardtrafficto1.1.1.1/32.Thisisrecursiveresolution.Recursiveresolution,whichisessentiallyacacheofthedata-planeresult,emulatesalongestprefixmatchforthe'via-address'1.1.1.1inthe‘via-table’table05.

Recursiveresolution(RR)willsourceahost-prefixentryinthevia-tableforthevia-address.TheRRsource isa lowprioritysource. Intheunlikely6eventthattheRRsource isthebestsource,then itmustderiveforwardinginformationfromitscoveringprefix.Therearetwocasestoconsider:

- The cover is connected7. The via-address is then an attached host and the RR source canresolvedirectlyviatheadjacencywiththekey{via-address,interface-of-connected-cover}

- Thecover isnotconnected8.TheRRsourcecandirectly inherittheforwarding informationfromitscover.

5Noteitisonlypossibletoaddroutesviaanaddress(i.e.a/32or/128)notviaashortermaskprefix.Thereisnousecaseforthelatter6ForiBGPthevia-addressistheloopbackaddressodthepeerPE,foreBGPitistheadj-fibfortheCE.7AsisthecaseforeBGP8AsisthecaseforiBGP

Page 10: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

ThisdependencyonthecoveringprefixmeanstheRRsourcewilltrackitscover.Thecoveringprefixwill‘change’when;

- Amorespecificprefix is inserted.For this reasonwheneveranentry is inserted intoaFIBtableitscovermustbefoundsothatitscovereddependentscanbeinformed.

- Theexistingcoverisremoved.Thecoveredprefixesmustformanewrelationshipwiththenextlessspecific.

Thecoverwillbe ‘updated’whentheroute for thecoveringprefix ismodified.Thecover trackingmechanismwillprovidetheRRsourcedentrywithanotificationintheeventofachangeorupdateofthecover,andthesourcecantakethenecessaryaction.

TheRRsourcedFIBentrybecomes theparentof the fib_path_t andwill contribute its forwardinginformationtothatpath,sothatthechild’sFIBentrycanconstructitsownforwardinginformation.

Figure5:RecursiveRoutesobjectdiagram

Figure5showstheobjectinstancescreatedtorepresenttherecursiverouteanditsresolvingroutealsoshown.

Page 11: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

Ifthesourceaddingrecursiveroutesdoesnotitselfperformrecursiveresolution9thenitispossiblethat thesourcemay inadvertentlyprogrammea recursion loop.Anexampleofa recursion loop isthefollowingconfiguration:

ip route add 5.5.5.5/32 via 6.6.6.6

ip route add 6.6.6.6/32 via 7.7.7.7

ip route add 7.7.7.7/32 via 5.5.5.5

Thisshowsaloopoverthreelevels,butanynumberispossible.FIBwilldetectrecursionloopsbyforward walking the graph when a fib_entry_t forms a child-parent relationship with afib_path_list_t. Thewalk checks to see if the same object instances are encountered.When arecursionloopisformedthecontrolplane10graphbecomescyclic,thusallowingthechild-parentdependenciestoform.Thisisnecessarysothatwhentheloopbreaks,theaffectedchildrenandbeupdated.

OutputlabelsAroutemayhaveassociatedoutMPLSlabels11.Thesearelabelsthatareexpectedtobeimposedona packet as it is forwarded. It is important to note that anMPLS label is per-route and per-path,therefore, even though routes share paths the do not necessarily have the same label for thatpath12. A label is therefore uniquely associated to a fib_entry_t and associated with one of thefib_path_ttowhichitforwards.

MPLSlabelsaremodelledviathegenericconceptofa‘path-extension’.Afib_entry_tthereforehasavectorofzerotomanyfib_path_ext_tobjectstorepresentthelabelswithwhichitisconfigured.

AttachedExportExtranetsmakeprefixesinVRFAalsoreachablefromVRFB.VRFAistheexportVRF,Btheimport.ConsiderthisrouteintheexportVRF;

ip route add table 2 1.1.1.0/24 via 10.10.10.0 Gige0/0/0

therearetwowaysonemightconsiderrepresentingthisrouteintheimportVRF:

1) ip route add table 3 1.1.1.0/24 via 10.10.10.0 Gige0/0/0 2) ip route add table 3 1.1.1.0/24 via lookup-in-table 2

whereoption2)isanexampleofade-aggregateroutewhereasecondlookupisperformedintable2, the export VRF. Option 2) is clearly less efficient, since the cost of the second lookup is high.Option1)isthereforepreferred.However,connectedandattachedprefixes,andspecificallytheadj-fibs that they cover, require special attention. The control plane is aware of the connected andattachedprefixesthatarerequiredtobeexported,but it isunawareoftheadj-fibs. It isthereforetheresponsibilityofFIBtoensurethatwheneveranattachedprefixisexported,soaretheadj-fibsand local prefixes that it covers, and only the adj-fibs and locals, not any coveredmore specific(sourcede.g.byAPI).TheimportedFIBentriesaresourcedas‘attached-export’,thisisalowpriority

9IfthatsourceisrelyingonFIBtoperformrecursiveresolution,thenthereisnoreasonitshoulddosoitself.10Thederiveddata-planegraphMUSTneverbecyclic.11Advertised,e.g.byLDP,SRorBGP.12TheonlycasewherethelabelswillbethesameisBGPVPNv4labelallocationper-VRF.

Page 12: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

source, so if those prefixes already exist in the import VRF, sourced by the API, then they willcontinuetoforwardwiththatinformation.

Figure6showsthedatastructuresusedtoperformattachedexport.

- fib_import_t.Arepresentationoftheneedtoimportcoveredprefixes.AninstanceisassociatedwiththeFIBentryintheimportVRF.Theneedtoimportprefixesisrecognisedwhenanattachedrouteisaddedtoatablethatisdifferenttothetableoftheinterfacetowhichittisattached.Thecreationofafib_import_twilltriggerthecreationofafib_export_t.

- fib_export_t.Arepresentationoftheneedtoexportprefixes.AninstanceisassociatedwiththeattachedentryintheexportVRF.Afib_export_tcanhavemanyassociatedfib_import_tobjectsrepresentingmultipleVRFsintowhichtheprefixisexported.

Figure6:AttachedExportClassdiagram.

Figure7:AttachedExportobjectdiagram

Figure7showsanobjectinstancediagramfortheexportofaconnectedfromtable1totwoothertables.The/32adj-fibandlocalprefix intheexportVRFareexportedintotheimportVRFs,wherethey are sourced as ‘Attached-export’ and inherit the forwarding information from the exported

Page 13: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

entry.TheattachedprefixintheimportVRFalsoperformscovertrackingwiththeconnectedprefixin the export VRF so that it can react to updates to that prefix thatwill require the removal theimportedcoveredprefixes.

GraphWalksAll FIB object types are allocated from a VPPmemory pool13. The objects are thus susceptible tomemory re-allocation, therefore the use of a bare ‘C’ pointer to refer to a child or parent is notpossible. Instead there is the conceptof a fib_node_ptr_twhich is a tupleof type,index. The typeindicateswhattypeofobject it is (andhencewhichpooltouse)andtheindex isthe index inthatpool.Thisallowsforthesaferetrievalofanyobjecttype.

Whenachildresolvesviaaparent itdoessoknowingthetypeofthatparent.Thechildtoparentrelationship is thus fullyknowntothechild,andhencea forwardwalkof thegraph(fromchildtoparent)istrivial.However,aparentdoesnotchooseitschildren,itdoesnotevenchoosethetype.Allobject types that formpartof theFIBcontrolplanegraphall inherit fromasinglebaseclass14;fib_node_t. A fib_node_t indentifies the object’s index and its associated virtual function tableprovidestheparentamechanismto‘visit’thatobjectduringthewalk.Thereasonforaback-walkistoinformallchildrenthatthestateoftheparenthaschangedinsomeway,andthatthechildmayitselfneedtoupdate.

Tosupportthemanytoone,childtoparent,relationshipaparentmustmaintainalistofitschildren.Therequirementsofthislistare;

- O(1) insertion anddelete time. Several child-parent relationships aremade/brokenduringrouteaddition/deletion.

- Ordering.Highpriority childrenare at the front, lowpriority at theback (see section FastConvergence)

- Insertionatarbitrarylocations.

To realise these requirements the child-list is a doubly linked-list,where each element contains afib_node_ptr_t.TheVPPpoolmemorymodelappliestothelistelements,sotheyarealsoidentifiedbyanindex.Whenachildisaddedtoalistitisreturnedtheindexoftheelement.Usingthisindexthe element can be removed in constant time. The list supports ‘push-front’ and ‘push-back’semanticsforordering.Towalkthechildrenofaparentisthentoiterateofthislist.

Aback-walkofthegraph isadepthfirstsearchwhereallchildren inall levelsofthehierarchyarevisited. Such walks can therefore encounter all object instances in the FIB control plane graph,numberinginthemillions.AFIBcontrol-planegraphiscyclicinthepresenceofarecursionloop,sothewalkimplementationhasmechanismstodetectthisandexitearly.

A back-walk can be either synchronous or asynchronous. A synchronouswalkwill visit the entiresectionof thegraphbeforecontrol is returnedto thecaller,anasynchronouswalkwillqueuethewalk to a background process, to run at a later time, and immediately return to the caller. Toimplementasynchronouswalksafib_walk_tobjectitaddedtothefrontoftheparent’schildlist.Aschildrenare visited the fib_walk_t object advances through the list. Since it is inserted in the list,13Fastmemoryallocationiscrucialtofastrouteupdatetimes.14VPPmaybewritteninCandnotC++butinheritanceisstillpossible.

Page 14: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

when thewalk suspends and resumes, it can continue at the correct location. It is also safewithrespecttothedeletionofchildrenfromthelist.Newchildrenareaddedtotheheadofthelist,andsowillnotencounterthewalk,butsincetheyarenew,theyalreadyhavetheuptodatestateoftheparent.

AVLIBprocess ‘fib-walk’ runs toperform theasynchronouswalks.VLIBhasnopriority schedulingbetweenrespectiveprocesses,sothefib-walkprocessdoesworkinsmallincrementssoitdoesnotblock themain routedownloadprocess. Since themaindownloadprocess effectively has prioritynumerousasynchronousback-walkscanbestartedonthesameparentinstancebeforethefib-walkprocesscanrun.FIBisa‘finalstate’application.Ifaparentchangesntimes,itisnotnecessaryforthechildrentoalsoupdatentimes,insteaditisonlynecessarythatthischildupdatestothelatest,or final, state. Consequentlywhenmultiplewalks on a parent (and hence potential updates to achild)arequeued,thesewalkscanbemergedintoasinglewalk.

Choosingbetweenasynchronousandanasynchronouswalkisthereforeatrade-offbetweentimeittakestopropagateachangeintheparenttoallof itschildren,versusthetimeittakestoactonasinglerouteupdate.Forexample,ifarouteupdatewheretoaffectmillionsofchildrecursiveroutes,thentherateatwhichsuchupdatescouldbeprocessedwouldbedependentonthenumberofchildrecursiveroute–whichwouldnotbegood.AtthetimeofwritingFIB2.0usessynchronouswalkinalllocationsexceptwhenwalkingthechildrenofapath-list,andithasmorethan3215children.Thisavoidsthecasementionedabove.

Data-PlaneThe data-plane data model is a directed, acyclic16 graph of heterogeneous objects. A packet willforwardwalkthegraphasitisswitched.Eachobjectdescribestheactionstoperformonthepacket.Each object type has an associated VLIB graph node. For a packet to forward walk the graph isthereforetomovefromoneVLIBnodetothenext,witheachperformingtherequiredactions.ThisistheheartoftheVPPmodel.

Thedata-planegraphiscomposedofgenericdata-pathobjects(DPOs).AparentDPOisidentifiedbythetuple:{type,index,next_node}.Thenext_nodeparameter is the indexoftheVLIBnodetowhichthepacketsshouldbesentnext,thisispresenttomaximiseperformance-itisimportanttoensurethattheparentdoesnotneedtoberead17whilstprocessingthechild.Specialisations18oftheDPOperformdistinctactions.ThemostcommonDPOsandbrieflywhattheyrepresentare:

- Load-balance:achoiceinanECMPset.- Adjacency:applyarewriteandforwardthroughaninterface- MPLS-label:imposeanMPLSlabel.- Lookup:performanotherlookupinadifferenttable.

Thedata-planegraphisderivedfromthecontrol-planegraphbytheobjectstherein‘contributing’aDPO to the data-plane graph. Objects in the data-plane contain only the information needed toswitchapacket, theyarethereforesimpler,and inmemorytermssmaller,withtheaimto fitone

15Thevalueisarbitraryandyettobetuned.16Driectedimpliesitcannotbeback-walked.Itisacycliceveninthepresenceofarecursionloop.17Loadedintocache,andhencepotentiallyincurringad-cachemiss.18Theengagedreaderisdirectedtovnet/vnet/dpo/*

Page 15: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

DPOonasinglecache-line.Thederivationfromthecontrolplanemeansthatthedata-planegraphcontainsonlyobjectwhosecurrentstatecanforwardpackets.Forexample,thedifferencebetweenafib_path_list_tandaload_balance_tisthattheformerexpressesthecontrol-plane’sdesiredstate,thelatterthedata-planeavailablestate.Ifsomepathsinthepath-listareunresolvedordown,thentheload-balancewillnotincludethemintheforwardingchoice.

Figure8showsasimplifiedviewofthecontrol-planegraphindicatingthoseobjectsthatcontributeDPOs.AlsoshownaretheVLIBnodegraphsatwhichtheDPOisused.

Figure8:DPOcontributionsforanon-recursiveroute

Eachfib_entry_tcontributesitownload_balance_t,forthreereasons;

- Theresultofa lookupina IPv[46]table isasingle32bitunsignedinteger.This isan indexintoamemorypool.Consequentlytheobjecttypemustbethesameforeachresult.Somerouteswillneedaload-balanceandsomewillnot,buttoinsertanotherobjectinthegraphtorepresentthischoiceisawasteofcycles,sotheload-balanceobjectisalwaystheresult.IftheroutedoesnothaveECMP,thentheload-balancehasonlyonechoice.

- Inordertocollectper-routecounters,thelookupresultmustinsomewayuniquelyidentifythefib_entry_t.Asharedload-balance(contributedbythepath-list)wouldnotallowthis.

- Inthecasethefib_entry_thasMPLSoutlabels,andhenceafib_path_ext_t,thentheload-balancemustbeper-prefix, since theMPLS labels thatare itsparentsare themselvesper-fib_entry_t.

Figure9showstheload-balanceobjectscontributedforarecursiveroute.

Page 16: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

Figure9:DPOcontributionforarecursiveroute.

Figure10:DPOContributionsfromlabelledrecursiveroutes.

Page 17: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

Figure10showsthederiveddata-planegraphforalabelledrecursiveroute.TherecanbeasmanyMPLS-labelDPOinstancesasthereareroutesmultipliedbythenumberofpathsper-route.Forthisreasonthempls-labelDPOshouldbeassmallaspossible19.

The data-plane graph is constructed by ‘stacking’ one instance of a DPO on another to form thechild-parentrelationship.Whenthisstackingoccurs,thenecessaryVLIBgrapharcsareautomaticallyconstructedfromtherespectedDPOtype’sregisteredgraphnodes.

Thediagramsabove show that for any given route the full data-plane graph is knownbefore anypacketarrives. If that graph is composedofnobjects, then thepacketwill visitnnodesand thusincuraforwardingcostofapproximatelyntimesthegraphnodecost.Thiscouldbereducedifthegraphwere‘collapsed’intoasingleDPOandassociatednode.However,collapsingagraphremovestheindirectionobjectsthatprovidefastconvergence(seesectionFastConvergence).Tocollapseisthenatrade-offbetweenfasterforwardingandfastconvergence;VPPfavoursthelatter.

ThisDPOmodeleffectivelyexiststodaybutisinformallydefined.Presentlytheonlyobjectthatisinthedata-plane is the ip_adjacency_t,however, features (like ILA,OAMhop-by-hop, SR,MAP,etc)sub-type theadjacency. Themember lookup_next_index is equivalent todefininganew sub-type.Addingtotheexistingunion,orcastingsub-typespecificdataintotheopaquemember,orevenoverthe rewrite string (e.g. the new port range checker), is equivalent defining a new C-struct type.Fortunately, at this time, all these sub-types are smaller inmemory than the ip_adjacency_t. It isnow possible to dynamically register new adjacency sub-types with ip_register_adjacency() andprovideacustomformatfunction.

Inmy opinion a strongly defined objectmodelwill be easier for contributors to understand, andmorerobusttoimplement.

TunnelsTunnelsshareasimilarpropertytorecursiveroutesinthatafterapplyingthetunnelencapsulation,anewpacketmustbeforwarded, i.e. forwarding is recursive.However,aswithrecursiveroutesthetunnel’sdestinationisknownbeforehand,sotherecursiveswitchcanbeavoidedifthepacketcanfollowthealreadyconstructeddata-planegraphforthetunnel’sdestination.ThisprocessofjoiningtoDPgraphstogetheristermed‘stacking’.

19i.e.weshouldnotre-usetheadjacencystructure.

Page 18: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

Figure11:Tunnelcontrolplaneobjectdiagram

Figure11showsthecontrolplaneobjectgraphforarouteviaatunnel.Thetwosub-graphsfortheroute via the tunnel and the route for the tunnel’s destination are shown to the right and leftrespectively.Theredlineshowstherelationshipformbystackingthetwosub-graphs.Theadjacencyonthetunnelinterfaceistermeda‘mid-chain’thisitisnowpresentinthemiddleofthegraph/chainratherthanitsusualterminallocation.

Themid-chain adjacency is contributed by thegre_tunnel_t ,which also becomes part of the FIBcontrol-planegraph.Consequentlyitwillbevisitedbyaback-walkwhentheforwardinginformationfor the tunnel’sdestination changes. Thiswill trigger it to restack themid-chainadjacencyon thenewload_balance_tcontributedbytheparentfib_entry_t.

If the back-walk indicates that there is no route to the tunnel, or that the route does notmeetresolution constraints, then the tunnel can be marked as down, and fast convergence can betriggeredinthesamewayasforphysicalinterfaces(seesection...).

MPLSFIBThere is a tight coupling between IP andMPLS forwarding.MPLS forwarding equivalence classes(FECs)areoftenan IPprefix– that is tosay that trafficmatchingagiven IPprefix is routed intoa

Page 19: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

MPLSlabelswitchpath(LSP).Itisthusnecessarytobeabletoassociatedagivenprefix/routewithan[out-going]MPLSlabelthatwillbeimposedwhenthepacketisforwarded.Thisisconfiguredas:

ip route add 1.1.1.1/32 via 10.10.10.10 GigE0/0/0 out-label 33

packets matching 1.1.1.1/32 will be forwarded out GigE0/0/0 and have MPLS label 33 imposed.Morethanoneout-goinglabelcanbespecified.Out-goingMPLSlabelscanbeappliedtorecursiveandnon-recursiveroutes,e.g;

ip route add 2.2.2.0/24 via 1.1.1.1 out-label 34

packets matching 2.2.2.0/24 will thus have two MPLS labels imposed; 34 and 33. This is therealisationof,e,g,anMPLSBGPVPNv4.

The router receiving the MPLS encapsulated packets needs to be programmed with actionsassociatedwhicheachlabelvalue–thisistheroleoftheMPLSFIB.TheMPLSFIBIsatable,whosekeyistheMPLSlabelvalueandend-of-stack(EOS)bit,whichstorestheactiontoperformonpacketswithmatchingencapsulation.Currentlysupportedactionsare:

1) PopthelabelandperformanIPv[46]lookupinaspecifiedtable2) Popthelabelandforwardviaaspecifiednext-hop(thisispenultimate-hop-pop,PHP)3) Swapthelabelandforwardviaaspecifiednext-hop.

Thesecanbeprogrammedrespectivelyby:

1) mpls local-label 33 ip4-lookup-in-table X2) mpls local-label 33 via 10.10.10.10 GigE0/0/03) mpls local-label 33 via 10.10.10.10 GigE0/0/0 out-label 66

the latter is anexampleof anMPLS cross connect.Anydescriptionof anext-hop, recursive, non-recursive,labelled,non-labelled,etc,thatisvalidforanIPprefix,isalsovalidforanMPLSlocal-label.

ImplementationTheMPLSFIBisimplementedusingexactlythesamedatastructuresastheIPFIB.

Theonlydifferenceistheimplementationofthetable.WhereasforIPv4thisisanmtrieandforIPv6ahashtable,forMPLSitisaflatarrayindexedbya21bitkey(label&EOSbit).Thisimplementationischosentofavourpacketforwardingspeed.

TunnelsVPPno longersupportsMPLStunnels thatarecoupledtoaparticular transport, i.e.MPLSoGREorMPLSoEth.Suchtightcouplingisnotbeneficial.InsteadVPPsupports;

1) MPLSLSPsassociatedwithIPprefixesandMPLSlocal-labels(asdescribedabove)whicharetransportindependent(i.e.theIProutecouldbereachableoveraGREtunnel,oranyotherinterfacetype).

2) Agenericuni-directionalMPLStunnelinterfacethatistransportindependent.

AnMPLStunneliseffectivelyanLSPwithanassociatedinterface.TheLSPcanbedescribedbyanynext-hoptype(recursive,non-recursiveetc),e.g.:

Page 20: FIB 2.0: Hierarchical, Protocol Independent. · Figure 3: Route object diagram FIB sources There are various entities in the system that can add routes to the FIB tables. Each of

mpls tunnel add via 10.10.10.10 GigE0/0/0 out-label 66

IProutesand/orMPLSx-connectscanberoutedviatheinterface,e.g.

ip route add 2.2.2.0/24 via mpls-tunnel0

packetsmatchingtheroutefor2.2.2.0/24wouldthushavelabel66imposedsinceitistransmittedviathetunnel.

TheseMPLStunnelscanbeusedtorealiseMPLSRSVP-TEtunnels.

FastConvergence