Jeronimo Bezerra, Florida International University
Internet2 Technology Exchange, Miami, Sep 26th 2016
Troubleshooting AmLight: Handling Network Events in a Production SDN Environment
Marcos Schwarz, Rede Nacional de Ensino e Pesquisa
Outline
• Introduction to AmLight
• RFC 7426: SDN Terminology
• Tests pre-production
• SDN Topologies
• What should be monitored?
  – Control Plane Monitoring
  – Data Plane Monitoring
• Future
AmLight is a Distributed Academic Exchange Point
• Production SDN infrastructure since Aug 2014
• Partnership involving FIU, NSF, ANSP, RNP, RedClara and AURA
• Connects two academic exchange points: AMPATH/Miami and SouthernLight/Brazil
• Carries academic and non-academic/commercial traffic
  – L2VPN, IPv4, IPv6, multicast
• Supports network programmability/slicing
  – OpenFlow 1.0
  – FlowSpace Firewall for network programmability/slicing
  – OESS for L2VPNs
  – OGF Network Service Interface (NSI) enabled
  – ONOS/SDN-IP for academic IPv4
  – Currently 5 slices for experimentation (including Global ONOS SDN-IP)
• Currently operating with more than 800 flows (production and experimentation)
• Website: www.sdn.amlight.net
AmLight SDN Stack
[Figure: AmLight SDN stack. Labels shown: Physical Layer (Ampath1, Ampath2, Andes1, Andes2, SouthernLight); Southbound API: OpenFlow 1.0; Virtualization/Slices (FlowSpace Firewall); Northbound: Users' APIs (NSI, IDCP); controllers and applications (OESS, OpenNSA, OSCARS, NOX, ONOS, SDN-IP); users (AmLight's NRENs, Other NRENs, FIBRE, Univ. Twente, Internet2, Other Testbeds).]
SDN: Layers and Architecture Terminology
• This presentation uses the SDN terminology standardized through IETF RFC 7426:
  – Four planes:
    • Application, Control, Forwarding and Management Planes
  – Interfaces:
    • Service, Control Plane Southbound and Management Plane Southbound interfaces
  – Services and Applications
[Figure: RFC 7426 layer diagram. Labels shown: Application Plane (Applications, Services); Network Services Abstraction Layer (NSAL); Service Interface; Control Plane with Control Abstraction Layer (CAL); Management Plane with Management Abstraction Layer (MAL); CP Southbound Interface; MP Southbound Interface; Device and Resource Abstraction Layer (DAL); Forwarding Device with Forwarding Plane and Operational Plane.]
Tests pre-production
• Before applying any change to the SDN environment, all planes, apps and services need to be validated in a controlled environment
  – The same software and devices used in production need to be available for tests
• Many tools and approaches are available, for example OFTest, Ryu Switch Test, Cbench and some commercial options
  – Some tests might destabilize the SDN stack (don't try these tests in production)
• Special attention is required for the Control and Data planes
  – Many publications exist, with different methodologies and tests
[Figure: the RFC 7426 layer diagram annotated with test tools: OFTest, Ryu Switch Test, Cbench and unit tests.]
Troubleshooting a production SDN network
• Troubleshooting a production environment has different requirements
  – It needs to be agile and as non-disruptive as possible
  – It might need historical information and an understanding of the traffic going through the network
  – Tools have to be handy
• Legacy troubleshooting tools are partially useful or completely useless
  – OAM (Operation, Administration and Maintenance) is not supported by OpenFlow (yet)
  – Ping, traceroute, SNMP and wireshark/tcpdump are all compromised to some degree
• Deep knowledge of the hardware and software platform is required:
  – Using the "hidden" commands becomes part of your routine
• Suggestion: get a "premium" support contract
  – Going through the level 2 TAC team will increase your stress and the network recovery time
SDN Topologies: Starting Simple
• Usually, with just one SDN App, troubleshooting is less complex
  – One SDN App is connected through an out-of-band network to multiple OF switches
  – Usually, the SDN App has full control of ports and VLANs
• A good network sniffer and a Syslog server are the keys to success here
  – They help validate the OpenFlow messages sent and received
  – They ease access to error messages
[Figure: a single SDN App in the application layer controlling four forwarding devices over OpenFlow 1.x, with users A and B attached.]
SDN Topologies: Adding Complexity
• Different control planes in parallel tend to be a consequence of slicing
  – More applications to understand and track
  – Different levels of software stability and debugging
  – Higher chances of network outages
• Slicing/partitioning adds complexity:
  – OpenFlow communication between the OpenFlow switch and the SDN App is not end-to-end:
    • OF Switch -> Slicer or Slicer -> OF App
  – It is hard to track which switch is talking to which SDN App and vice versa
    • OF doesn't carry the DPID in each OF message
• "Traditional" sniffers are not enough to track indirect OpenFlow messages
[Figure: OESS, ONOS/SDN-IP and a testbed in the application layer speak OpenFlow 1.0 to FlowSpace Firewall, which speaks OpenFlow 1.0 to four forwarding devices serving users A and B.]
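Since the DPID is not carried in each OpenFlow message, one way to label traffic is to learn the DPID once per TCP connection from the FEATURES_REPLY and reuse it for every later message on that connection. The sketch below assumes OpenFlow 1.0 framing (8-byte header followed by a 64-bit datapath_id in the FEATURES_REPLY) and assumes the capture sees that message on each connection. The `connection` tuple and the `dpid_for` helper are hypothetical names for illustration, not part of any AmLight tool.

```python
import struct

OFP10_VERSION = 0x01
OFPT_FEATURES_REPLY = 6                 # OpenFlow 1.0 message type
OFP_HEADER = struct.Struct("!BBHI")     # version, type, length, xid


def dpid_for(table: dict, connection: tuple, message: bytes):
    """Return the DPID known for `connection`, learning it when possible.

    `connection` is any hashable identifier of the TCP session, for
    example (src_ip, src_port, dst_ip, dst_port); this shape is an
    assumption made for the sketch.
    """
    # A FEATURES_REPLY needs at least the header (8) plus datapath_id (8).
    if len(message) >= OFP_HEADER.size + 8:
        version, msg_type, _length, _xid = OFP_HEADER.unpack_from(message)
        if version == OFP10_VERSION and msg_type == OFPT_FEATURES_REPLY:
            (dpid,) = struct.unpack_from("!Q", message, OFP_HEADER.size)
            table[connection] = dpid
    # Any other message is attributed to the DPID learned earlier, if any.
    return table.get(connection)
```

With a slicer in the path, the same lookup would be kept for both legs (switch to slicer, and slicer to SDN App), so each captured message can be attributed to a switch.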
Control Plane: what should be monitored?
• Everything related to the OpenFlow communication:
  – # of flows installed
    • Avoid getting close to the documented limits (weird stuff might happen)
  – Rate of FlowMods, PacketOut/PacketIn & Stats requests per second:
    • The switch's CPU is directly affected by these rates
  – # of OFP_FLOW_ERROR messages:
    • Some messages might indicate that a crash is about to happen (FULL_TABLE)
  – Flow duration:
    • Helps to understand traffic disruption due to flows being reinstalled
  – Flow and port counters (bps and pps)
    • If slicing is in use, collect counters per slice
• Most SDN apps don't provide such data; some provide it through REST interfaces
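Two of the metrics above reduce to simple arithmetic on polled values: bps/pps come from the difference between two counter samples, and the flow-count check is a comparison against the documented table limit. A minimal sketch follows. The `CounterSample` structure, the function names and the 80% threshold are assumptions for illustration, and counter wrap-around is not handled.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One poll of a flow or port counter (hypothetical structure)."""
    timestamp: float    # seconds
    byte_count: int
    packet_count: int


def rates(prev: CounterSample, curr: CounterSample) -> tuple:
    """Return (bits per second, packets per second) between two polls."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    bps = (curr.byte_count - prev.byte_count) * 8 / elapsed
    pps = (curr.packet_count - prev.packet_count) / elapsed
    return bps, pps


def table_usage_alert(installed: int, documented_limit: int,
                      threshold: float = 0.8) -> bool:
    """True when installed flows reach `threshold` of the documented limit.

    The 0.8 default is an arbitrary example value, not a recommendation
    from the talk.
    """
    return installed >= threshold * documented_limit
```

When slicing is in use, the same calculation would be run on counters grouped per slice.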
Data Plane: what should be monitored?
• In some cases, everything looks OK, but traffic is not flowing
• Some possible data plane black holes:
  – A specific line card or interface discarding all traffic
    • Due to an interface memory issue, flows are installed but traffic is discarded
  – Interface down on one side but up on the remote side, and the SDN App doesn't detect it
    • For instance: 10G LAN-PHY, Ethernet circuits and 100G long-haul circuits
    • In this case, depending on the side, the SDN App installs circuits pointing to the affected link, discarding all traffic
  – A specific installed flow entry has failed
    • Due to an interface memory issue, one specific flow is compromised and its traffic is discarded
    • Depending on the number of OpenFlow switches and flow entries, finding the problem might be extremely time-consuming
• In these cases, in-band tests are required:
  – Only a very few SDN Apps test in-band per link
  – No SDN Apps test in-band per flow
Disclaimer:
What you are about to see and hear is AmLight's experience. We are not saying these are the best or recommended methods; they probably are not. Don't try them on your network!
Control Plane Monitoring
• Monitoring the OpenFlow messages with passive packet capture:
  – Non-intrusive
  – Almost risk-free
• Few tools available:
  – Wireshark/tshark/tcpdump
  – OpenFlow Flight Recorder
  – AmLight OpenFlow Sniffer
• AmLight OpenFlow Sniffer was created to be CLI-based, with support for environments with slicers:
  – Dissects 100% of OpenFlow 1.0
  – Doesn't require a GUI or X Window
  – End-to-end communication visualization
  – Colors to highlight important fields
  – Many filters available to optimize troubleshooting!
  – Source: github.com/jab1982/ofp_sniffer
[Figure: the sliced topology (OESS, ONOS/SDN-IP and a testbed over FlowSpace Firewall, OpenFlow 1.0 on both sides) with libpcap-based message monitoring: OpenFlow Sniffer, OFFR.]
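To illustrate what passive dissection involves at its simplest, the sketch below decodes the fixed 8-byte OpenFlow 1.0 header (version, type, length, xid) from a captured TCP payload and walks over back-to-back messages. It is not the AmLight sniffer's code; `parse_header` and `split_messages` are hypothetical helpers, and TCP reassembly across packets is assumed to have been done already.

```python
import struct

OFP_HEADER = struct.Struct("!BBHI")  # version, type, length, xid

# Message type numbers defined by the OpenFlow 1.0 specification.
OFP10_TYPES = {
    0: "HELLO", 1: "ERROR", 2: "ECHO_REQUEST", 3: "ECHO_REPLY",
    4: "VENDOR", 5: "FEATURES_REQUEST", 6: "FEATURES_REPLY",
    7: "GET_CONFIG_REQUEST", 8: "GET_CONFIG_REPLY", 9: "SET_CONFIG",
    10: "PACKET_IN", 11: "FLOW_REMOVED", 12: "PORT_STATUS",
    13: "PACKET_OUT", 14: "FLOW_MOD", 15: "PORT_MOD",
    16: "STATS_REQUEST", 17: "STATS_REPLY",
    18: "BARRIER_REQUEST", 19: "BARRIER_REPLY",
    20: "QUEUE_GET_CONFIG_REQUEST", 21: "QUEUE_GET_CONFIG_REPLY",
}


def parse_header(data: bytes) -> dict:
    """Decode one OpenFlow header from the start of `data`."""
    if len(data) < OFP_HEADER.size:
        raise ValueError("need at least 8 bytes for an OpenFlow header")
    version, msg_type, length, xid = OFP_HEADER.unpack_from(data)
    return {
        "version": version,
        "type": msg_type,
        "type_name": OFP10_TYPES.get(msg_type, "UNKNOWN"),
        "length": length,
        "xid": xid,
    }


def split_messages(payload: bytes) -> list:
    """Return the headers of all complete-looking messages in a payload."""
    headers = []
    offset = 0
    while offset + OFP_HEADER.size <= len(payload):
        header = parse_header(payload[offset:])
        if header["length"] < OFP_HEADER.size:
            break  # malformed length field; stop instead of looping forever
        headers.append(header)
        offset += header["length"]
    return headers
```

Counting the decoded `type_name` values per second gives the FlowMod, PacketIn/PacketOut and Stats request rates without touching the switches.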
Control Plane Monitoring [2]
• Monitoring all applications and counters in a centralized NMS:
  – Scripts collect information from the SDN Apps' REST interfaces and export it as JSON
  – Zabbix imports the JSON data and saves it into a MySQL database
  – Currently collecting data from OESS, ONOS, FSFW and switches
[Figure: the sliced topology with centralized monitoring (Zabbix plus customized scripts) collecting data via SNMP, REST, Java API, etc.]
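The collection scripts themselves are not shown in the talk. As a rough sketch of the idea, the function below takes a list of flows as an SDN App's REST interface might return them and turns it into a small JSON document that a monitoring system could ingest. The input shape (a list of dicts with a `dpid` key), the output keys and the function names are all assumptions; this is not the format Zabbix or any of the apps actually uses.

```python
import json
import time


def summarize_flows(app_name: str, flows: list, now=None) -> dict:
    """Count flows in total and per switch for one SDN App.

    `flows` is assumed to be a list of dicts carrying at least a "dpid"
    key; real REST responses will differ per application.
    """
    per_switch = {}
    for flow in flows:
        dpid = flow["dpid"]
        per_switch[dpid] = per_switch.get(dpid, 0) + 1
    return {
        "app": app_name,
        "collected_at": int(time.time() if now is None else now),
        "total_flows": len(flows),
        "flows_per_switch": per_switch,
    }


def to_json(summary: dict) -> str:
    """Serialize with sorted keys so successive exports diff cleanly."""
    return json.dumps(summary, sort_keys=True)
```

Fetching the flow list over HTTP is left out on purpose so the sketch stays independent of any particular application's endpoints.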
Data Plane Monitoring
• Most SDN Apps use LLDP or BDDP for topology discovery
  – Once the topology is discovered, these protocols are not used to monitor it
  – Also, the interval between LLDP/BDDP packets is not appropriate for link monitoring
• An in-band testing approach is needed to validate the Data Plane
  – OESS does this through its Forwarding Verification module
  – Most other SDN Apps don't have anything equivalent
• Even though OESS/FVD validates the data path, it doesn't validate users' flows
  – A full port issue is detected, but a single flow issue is not
[Figure: the sliced topology with data plane monitoring on trunk ports: OESS FWD.]
Data Plane Monitoring [2]
• Monitoring individual flows is important but extremely complex
  – Being proactive with all flows is desirable, but the interval between tests and the number of flows need to be taken into consideration
  – Using a reactive approach is the best suggestion
    • Users won't be happy, but your switches won't crash
• Approaches to test users' flows are still considered experimental
  – An SDNTrace protocol was proposed:
  – http://sdntrace-protocol.readthedocs.io/en/latest/
[Figure: the sliced topology with user flow monitoring: SDNTrace.]
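The trade-off between test interval and number of flows can be put in numbers: testing every flow once per interval generates flows/interval probes per second, all of which the control plane has to handle. The sketch below is only this arithmetic; the function names, the 10-second interval and the 20 probes/s budget are hypothetical, while the 800 flows figure comes from the AmLight numbers earlier in the talk.

```python
def probe_rate(num_flows: int, interval_s: float) -> float:
    """Probes per second needed to test every flow once per interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return num_flows / interval_s


def min_interval(num_flows: int, max_probes_per_s: float) -> float:
    """Shortest test interval that stays within a probe budget."""
    if max_probes_per_s <= 0:
        raise ValueError("probe budget must be positive")
    return num_flows / max_probes_per_s


# Example: 800 flows tested every 10 seconds (interval is an assumption).
print(probe_rate(800, 10))
# Example: 800 flows with an assumed budget of 20 probes per second.
print(min_interval(800, 20))
```

If the resulting rate is more than the switches can safely absorb, the interval has to grow or testing has to become reactive, which is the suggestion made above.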
Data Plane Monitoring [3]
• AmLight developed its own SDNTrace to test users' flows without changing them
  – Works through a GUI or REST
  – Very lightweight
  – Very "cheap": only two to four flow entries needed
  – Traces L2 and L3 flows
  – Still under evaluation at AmLight
  – Developed in collaboration with the Academic Network of Sao Paulo/Brazil
• Tracing a circuit is done in seconds instead of many minutes, and can work with both Zabbix and Nagios
Github: github.com/amlight/SDNTrace
Future
• New tools/scripts/protocols are still needed
  – Still a long and painful journey ahead
  – OpenFlow-OAM?
• Improvements to OpenFlow agents are being constantly released
  – But new bugs are coming with them :(
• Some SDN monitoring-only applications are being proposed and developed
  – AmLight is developing its own SDN Looking Glass to consolidate all passive and active monitoring activities associated with the SDN environment (to be released by January)
  – But side applications are not ideal: it is important that all SDN Applications incorporate troubleshooting capabilities in their core!
Off-topic: Suggestions to Network Engineers
• What is/will be our position description?
  – Network Engineers? SDN Engineers? Research Network Engineers?
  – Maybe Network Engineers 2.0?
  – The description doesn't matter; what matters is that we have to evolve!
• With SDN, troubleshooting is very different: instead of using CLI and sniffers, we need to read code and application logs
• Most of us hate software development, but it is time to change our mentality
  – At AmLight, I don't remember the last time I created a VLAN using a CLI
• If SDN becomes the next de facto standard, it will happen in a few years
  – We all still have time to learn and get prepared for this new reality
• Recommendations:
  – Learn Python or Java (JavaScript is a plus)
    • Ryu is a very interesting OpenFlow controller to start with
  – Join the Ryu or ONOS mailing lists
  – Mininet is your friend!
Questions?