aci troubleshooting · •introduction • understanding faults and health status • tools •...


Post on 06-Aug-2018


Page 1: ACI Troubleshooting · •Introduction • Understanding Faults and Health status • Tools • Troubleshooting scenarios • Conclusion / Q&A Agenda
Page 2

ACI Troubleshooting

Mioljub Jovanovic, Technical Leader

BRKACI-2102

Page 3

• Introduction

• Understanding Faults and Health status

• Tools

• Troubleshooting scenarios

• Conclusion / Q&A

Agenda

Page 4


Page 5

The way we’re used to doing it

Good old CLI!!!

Example: Checking input rate on a specific interface

# show int eth 1/1 | grep input
30 seconds input rate 97064 bits/sec, 66 packets/sec
input rate 97064 bps, 66 pps; output rate 95008 bps, 57 pps
20297397 input packets 6494649266 bytes
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 input with dribble 72 input discard

Page 6

John Chambers@CiscoLive #clus, San Diego 2015

Page 7

The way we do it in APIC

Visualize interface input/output

Page 8

> moquery -c eqptIngrPkts5min -f 'eqpt.IngrPkts5min.unicastRate>"1000"' -o xml
…
<eqptIngrPkts5min childAction="" cnt="18" dn="topology/pod-1/node-101/sys/phys-[eth1/34]/CDeqptIngrPkts5min" … status="" unicastAvg="10833" unicastBase="0" unicastCum="2390904" unicastLast="18809" unicastMax="31630" unicastMin="2075" unicastPer="194995" unicastRate="1089.254093" unicastSpct="0" unicastThr="" unicastTr="0" unicastTrBase="503518"/>
</imdata>

> moquery -c eqptIngrPkts5min -f 'eqpt.IngrPkts5min.unicastRate>"1000"' | egrep -e "^dn|^unicastRate"

dn : topology/pod-1/node-101/sys/phys-[eth1/34]/CDeqptIngrPkts5min

unicastRate : 1742.12

The way we can do it with ACI

Query any managed object (MO) for the data we need!

Example: finding interfaces with unicast rate > 1000

• Q: That’s cool, but how do I know which object/class to query?

• Q: It looks cryptic to me. How do I find the meaning of each field? Check the next slide for the answer.
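The text form of moquery output shown on this slide ("attribute : value" lines under a "# class.Name" header) is easy to post-process outside the APIC. The following is a minimal sketch, not an official tool: the function names are made up for illustration, and the second record in the sample is hypothetical data added only to show the filter at work.

```python
def parse_moquery(text):
    """Split moquery text output into a list of {attribute: value} dicts."""
    records, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            # A "# class.Name" header starts a new object.
            current = {}
            records.append(current)
        elif current is not None and " : " in line:
            key, value = line.split(" : ", 1)
            current[key.strip()] = value.strip()
        elif current is not None and line.endswith(" :"):
            # Attribute present but empty, e.g. "status :".
            current[line[:-2].strip()] = ""
    return records


def busy_interfaces(records, threshold=1000.0):
    """Return the dn of every record whose unicastRate exceeds threshold."""
    return [r["dn"] for r in records
            if float(r.get("unicastRate", 0)) > threshold]


# First record is taken from the slide; the second one is hypothetical.
SAMPLE = """\
# eqpt.IngrPkts5min
dn           : topology/pod-1/node-101/sys/phys-[eth1/34]/CDeqptIngrPkts5min
unicastRate  : 1742.12
status       :
# eqpt.IngrPkts5min
dn           : topology/pod-1/node-102/sys/phys-[eth1/1]/CDeqptIngrPkts5min
unicastRate  : 12.5
"""

records = parse_moquery(SAMPLE)
busy = busy_interfaces(records)
print(busy)
```

Filtering on the APIC with -f, as the slide does, is usually preferable because less data is returned; parsing locally is mainly useful when the output has already been saved to a file.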

Page 9

APIC Management Information Model Reference

direct URL

https://apic/doc/html/

From the WebUI

Page 10

Connect to APIC

[Diagram: APIC Cluster (apic 1, apic 2, apic 3), reached via CLI (ssh), or from a Web Browser via Visore and the APIC UI]

Page 11

Connect to switch

[Diagram: ACI Fabric with spine 1, spine 2 and leaf 1 to leaf 5]

We could connect directly to switches as well:

- ssh or console

- visore

- REST

Page 12

CLI Available at the Switch

AAA via TACACS+, RADIUS and LDAP is supported when logging into the switch CLI console.

Configuration mode is not supported at the switch console.

There are two scenarios where administrators would log into the switch console:

• From the APIC UI, an admin can remotely log in to the switch console

• Log in directly via the serial console port on the switch front panel, or SSH to the management IP via out-of-band or in-band

For the majority of use cases, admins should use APIC.

Using username "admin".
Application Policy Infrastructure Controller
admin@apic1:~> acidiag fnvread

 ID   Name    Serial Number  IP Address      Role   State     LastUpdMsgId
--------------------------------------------------------------------------
 101  leaf1   SAL18CLUX85    10.0.40.66/32   leaf   active    0
 102  leaf2   SAL18CBRU00    10.0.64.69/32   leaf   active    0
 103  leaf3   SAL18CLHR05    10.0.40.95/32   leaf   active    0
 104  leaf4   SAL18CAMS14    10.0.40.65/32   leaf   active    0
 105  leaf5   SAL18CCHD53    10.0.112.69/32  leaf   active    0
 201  spine1  SAL18CMUC75    10.0.64.65/32   spine  active    0
 202  spine2  SAL18CFRA11    10.0.64.64/32   spine  active    0
 203  spine3  SAL18CSAN15    10.0.40.69/32   spine  inactive  0x4000000ef664f
 204  spine4  SAL18CSFO14    10.0.112.67/32  spine  inactive  0x4000000ef6650

Total 9 nodes

admin@apic1> ssh leaf1
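When the fabric is larger than the nine nodes above, it can help to pick the inactive nodes out of the fnvread table programmatically. This is a small sketch that assumes the seven-column layout shown on this slide; the function name and the dictionary keys are illustrative choices, not part of acidiag.

```python
def parse_fnvread(text):
    """Parse 'acidiag fnvread' rows into dicts, skipping header and totals."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric node ID and have exactly 7 columns.
        if len(fields) == 7 and fields[0].isdigit():
            node_id, name, serial, ip, role, state, last_msg = fields
            nodes.append({
                "id": int(node_id), "name": name, "serial": serial,
                "ip": ip, "role": role, "state": state,
                "last_upd_msg_id": last_msg,
            })
    return nodes


# Three rows copied from the slide output.
SAMPLE = """\
ID   Name    Serial Number  IP Address      Role   State     LastUpdMsgId
--------------------------------------------------------------------------
101  leaf1   SAL18CLUX85    10.0.40.66/32   leaf   active    0
201  spine1  SAL18CMUC75    10.0.64.65/32   spine  active    0
203  spine3  SAL18CSAN15    10.0.40.69/32   spine  inactive  0x4000000ef664f

Total 9 nodes
"""

nodes = parse_fnvread(SAMPLE)
inactive = [n["name"] for n in nodes if n["state"] != "active"]
print(inactive)
```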

Page 13

Fabric Health Overview

Page 14

Troubleshooting: Where do we start?

[Diagram with three columns: Statistics, Faults, Diagnostics. Labels shown: Faults, Health Scores; ELAM, SPAN, Atomic Counters; On-Demand Diagnostics; Drill-Downs; Thresholds; Stats; Switch NX-OS CLI. Captions: "Fabric-wide monitoring" and "Troubleshooting, Drill Downs".]

Page 15

After logging in to the APIC, you’ll see the initial ‘Dashboard’ screen.

Page 16

The APIC dashboard provides you with an ‘at-a-glance’ view of the system health and fault counts.

Page 17

‘System Health’ shows you a view of the overall health of the ACI system (all nodes, tenants, etc).

Graph is plotted as per fabricOverallHealthHist5min

fabricHealthTotal

Page 18

API Inspector enables us to see REST API calls (GET, DELETE, POST) from WebUI to APIC

admin@apic1> moquery -d "/topology/HDfabricOverallHealth5min-0"
Total Objects shown: 1

# fabric.OverallHealthHist5min
index : 0
childAction :
cnt : 31
dn : /topology/HDfabricOverallHealth5min-0
healthAvg : 82
healthMax : 82
healthMin : 82
healthSpct : 0
healthThr :
healthTr : 0
lastCollOffset : 310
modTs : never
repIntvEnd : 2015-04-10T19:24:03.530+01:00
repIntvStart : 2015-04-10T19:18:53.442+01:00
rn : HDfabricOverallHealth5min-0
status :

Prefer JSON or XML instead of text in moquery?

-> No problem, just specify "-o json" or "-o xml" with moquery

82

Page 19

How is topology built?

admin@apic1:~> moquery -c fabricLink
…
# fabric.Link
n1 : 203
s1 : 1
p1 : 1
n2 : 101
s2 : 1
p2 : 51
dn : topology/pod-1/lnkcnt-101/lnk-203-1-1-to-101-1-51
lcOwn : local
linkState : ok
modTs : 2015-03-13T14:26:39.526+01:00
monPolDn : uni/fabric/monfab-default
rn : lnk-203-1-1-to-101-1-51
status :
wiringIssues :

admin@bdsol-aci2-apic1:~> moquery -c fabricLink | egrep -e ^dn | head -5
dn : topology/pod-1/lnkcnt-1/lnk-102-1-2-to-1-2-2
dn : topology/pod-1/lnkcnt-2/lnk-102-1-4-to-2-2-2
dn : topology/pod-1/lnkcnt-3/lnk-102-1-6-to-3-2-2
dn : topology/pod-1/lnkcnt-201/lnk-102-1-49-to-201-1-34
dn : topology/pod-1/lnkcnt-202/lnk-102-1-50-to-202-1-34

• APIC WebUI and API inspector

• Identify which objects are used to plot topology

• Re-using fabricLink objects to identify the links

• We could create our own tool for topology, monitoring or troubleshooting
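As a starting point for such a home-grown topology tool, the link endpoints can be recovered from the fabricLink dn alone. The sketch below assumes the dn always ends in lnk-n1-s1-p1-to-n2-s2-p2, which matches the n1/s1/p1/n2/s2/p2 attributes in the output above; rendering slot and port as "ethS/P" is an assumption made for readability.

```python
import re
from collections import defaultdict

# lnk-<node1>-<slot1>-<port1>-to-<node2>-<slot2>-<port2>
LINK_RE = re.compile(r"lnk-(\d+)-(\d+)-(\d+)-to-(\d+)-(\d+)-(\d+)$")


def parse_link(dn):
    """Return ((node1, port1), (node2, port2)) for a fabricLink dn."""
    match = LINK_RE.search(dn)
    if match is None:
        raise ValueError(f"not a fabricLink dn: {dn!r}")
    n1, s1, p1, n2, s2, p2 = map(int, match.groups())
    return (n1, f"eth{s1}/{p1}"), (n2, f"eth{s2}/{p2}")


def build_adjacency(dns):
    """Map every node ID to the set of node IDs it is linked to."""
    neighbors = defaultdict(set)
    for dn in dns:
        (n1, _), (n2, _) = parse_link(dn)
        neighbors[n1].add(n2)
        neighbors[n2].add(n1)
    return neighbors


# The five dn values printed on the slide.
DNS = [
    "topology/pod-1/lnkcnt-1/lnk-102-1-2-to-1-2-2",
    "topology/pod-1/lnkcnt-2/lnk-102-1-4-to-2-2-2",
    "topology/pod-1/lnkcnt-3/lnk-102-1-6-to-3-2-2",
    "topology/pod-1/lnkcnt-201/lnk-102-1-49-to-201-1-34",
    "topology/pod-1/lnkcnt-202/lnk-102-1-50-to-202-1-34",
]

neighbors = build_adjacency(DNS)
print(sorted(neighbors[102]))
```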

Page 20

Visore – Web based MO query and browser tool

https://<IP>/visore.html

<?xml version="1.0" encoding="UTF-8"?>
<imdata totalCount="1">
<fabricNode adSt="on" childAction="" delayedHeartbeat="no" dn="topology/pod-1/node-101" fabricSt="active" id="101" lcOwn="local" modTs="2015-04-08T14:38:44.546+02:00" model="N9K-C9396PX" monPolDn="uni/fabric/monfab-default" name="bdsol-9396px-02" role="leaf" serial="SAL18CLUS15" status="" uid="0" vendor="Cisco Systems, Inc" version=""/>
</imdata>

fabricNode

adSt on

childAction

delayedHeartbeat no

dn topology/pod-1/node-101

fabricSt active

id 101

lcOwn local

modTs 2015-04-08T14:38:44.546+02:00

model N9K-C9396PX

monPolDn uni/fabric/monfab-default

name bdsol-9396px-02

role leaf

serial SAL18CLUS15

status

uid 0

vendor Cisco Systems, Inc

version

icurl -k 'https://apic/api/node/class/fabricNode.xml?query-target-filter=and(eq(fabricNode.id,"101"))'
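The XML returned by such a class query wraps each managed object in an imdata element, with one child element per object and all properties as XML attributes. A minimal sketch of reading it with the Python standard library follows; the sample is the fabricNode response from this slide with a few attributes omitted, and the helper name is illustrative.

```python
import xml.etree.ElementTree as ET

# Response from the slide, shortened to a handful of attributes.
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<imdata totalCount="1">'
    b'<fabricNode adSt="on" dn="topology/pod-1/node-101" fabricSt="active" '
    b'id="101" model="N9K-C9396PX" name="bdsol-9396px-02" role="leaf" '
    b'serial="SAL18CLUS15"/>'
    b'</imdata>'
)


def objects_from_imdata(xml_bytes):
    """Return (total_count, [(class_name, attributes), ...]) from imdata XML."""
    root = ET.fromstring(xml_bytes)
    total = int(root.get("totalCount", "0"))
    objects = [(child.tag, dict(child.attrib)) for child in root]
    return total, objects


total, objects = objects_from_imdata(SAMPLE)
for class_name, attrs in objects:
    print(class_name, attrs["dn"], attrs["role"], attrs["model"])
```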

Page 21

The lower half of the screen shows node and tenant health.

Page 22

The lower half of the screen shows node and tenant health.

Move these sliders down to show only nodes / tenants with lower health.

Page 23

On the right, you’ll see the fault counts by domain (e.g. access, tenant, security)…

…type (config, environmental, etc)…

…and APIC cluster health.

Page 24

How to get object DN from GUI

[Screenshot with numbered callouts 1, 2, 3]

Page 25

Health Score

Number between 0 and 100

Perfect Health Score = 100

Page 26

Almost perfect score

Let me think … weighted score

I need 1 more to become perfect …

Page 27

Tools and utilities

Page 28

Physical Network

• ping

• traceroute

• show (interface / table / etc)

• syslog

• SPAN

Abstracted Network

• properties (EP / TEP / contract)

• health scores / faults / events / audit

• itraceroute

• atomic counters

• statistics

• diagnostics (on-demand)

• SPAN

• ELAM

Network Monitoring and Troubleshooting Tools

Page 29

Faults Health Audits Events

Statistics Call-home Syslogs SNMP

UI Tools

Page 30

MIT access from ishell

admin@apic1:mit> cd /mit
admin@apic1:mit> ls -1l
total 3
drw-rw---- 1 admin admin 512 Apr 24 22:48 comp
drw-rw---- 1 admin admin 512 Apr 24 22:48 dbgs
drw-rw---- 1 admin admin 512 Apr 24 22:48 expcont
drw-rw---- 1 admin admin 512 Apr 24 22:48 fwrepo
drw-rw---- 1 admin admin 512 Apr 24 22:48 topology
drw-rw---- 1 admin admin 512 Apr 24 22:48 uni

Page 31

moquery – CLI based MO query tool

admin@apic1:~> moquery -c fabricNode -f 'fabric.Node.id=="1"'
Total Objects shown: 1

# fabric.Node
id : 1
adSt : on
delayedHeartbeat : no
dn : topology/pod-1/node-1
fabricSt : unknown
lcOwn : local
modTs : 2015-04-08T14:27:16.290+02:00
model : APIC
monPolDn : uni/fabric/monfab-default
name : apic1
rn : node-1
role : controller
serial : SAL18CLUS15
status :
uid : 0
vendor : Cisco Systems, Inc
version :

Page 32

moquery – some examples

• Find all EPGs with access encapsulation VLAN 3399

moquery -c fvRsPathAtt -o json -f 'fv.RsPathAtt.encap=="vlan-3399"'

• Obtain AAEP based on interface policy group (this prints one moquery command per policy group)

moquery -c "infraAccPortGrp" | egrep "^dn" | awk '{ print "moquery -d "$3" -x query-target=children | egrep tDn" }'

• Query the actual policy group

moquery -d "uni/infra/funcprof/accportgrp-N3k_PG_ddastoli" -x query-target=children

Page 33

mobrowser – CLI based MO browser tool

Page 34

APIC Logs

• /var/log/dme/log

• /var/log/dme/oldlog

Switch Logs

• /var/log/dme/log

• /var/log/dme/oldlog

• /var/sysmgr/tmp_logs/

Page 35

DME running on switch

[Diagram: Switch with three NXOS Process boxes and Objectstore (Shared memory)]

• Get logical MO from PM and push concrete MO to configure switch

• Delegate local faults, events, records, health score

• Atomic counters, core handling

• Collect stats from NXOS and push to APIC

• Opflex server for external opflex elem

Page 36

acidiag – your friend at tough times

admin@apic1:~> acidiag --help
...
avread               read appliance vector
fnvread              read fabric node vector
fnvreadex            read fabric node vector (extended mode)
rvread               read replica vector
rvreadle             read replica leader summary
crashsuspecttracker  read crash suspect tracker state
validateimage        validate image
version              show ISO version
preservelogs         stash away logs in preparation for hard reboot
platform             show platform
verifyapic           run apic installation verify command
bond0test            run bond0 test
touch                touch special files
run                  run specific commands and capture output
installer            installer
start                start a service
stop                 stop a service
restart              restart a service
reboot               reboot

Page 37

icurl – CLI utility for data transfer

mkdir /tmp/tac-655555555

cd /tmp/tac-655555555

icurl -k 'https://localhost/api/class/faultInfo.xml' > faultInfo.xml

icurl -k 'https://localhost/api/class/faultRecord.xml' > faultRecord.xml

icurl -k 'https://localhost/api/class/eventRecord.xml' > eventRecord.xml

icurl -k 'https://localhost/api/class/aaaModLR.xml' > aaaModLR.xml

icurl -k 'https://localhost/api/class/aaaSessionLR.xml' > aaaSessionLR.xml

cd /tmp

tar zcvf tac-655555555.tgz tac-655555555

cp tac-655555555.tgz /data/techsupport

Now you can download the file from the following URL: https://apic/files/1/techsupport/tac-655555555.tgz

We can import and analyze active faults, fault history, event history, the accounting log and login history.

Page 38

iShell filesystem - scriptcontainer

Linux:

/ - APIC root filesystem
/var/run/bashroot
…bashroot/var/log/dme/log
…/mgmt/log/scriptcontainer.log

admin shell:

/ - ishell root folder
/var/log/dme/log
/debug
/aci
/mit

Page 39

Troubleshooting scenarios

Page 40

Topology

2 x spine

2 x leaf N9K-9396px (48 x 1/10G SFP+)

2 x leaf N9K-93128tx (96 x 1/10G Base-T)

1 x leaf N9K-C9372px (48 x 1/10G SFP+)

3 x APIC

[Diagram: ACI Fabric with spine 1, spine 2, leaf 1 to leaf 5 and apic 1, apic 2, apic 3; links labeled 10Gbps]

Page 41

That’s all nice, but what if I can’t connect to the WebUI?

Page 42

Troubleshooting Web UI performance

Open the Web Browser’s Developer Tools, Network tab: Ctrl + Shift + I, or F12, or Cmd + Opt + I

The Network tab shows the latency of each HTTP request to the APIC server.

Page 43

REST API call without webtoken

Verify that APIC is able to process a REST API call without login / APIC-cookie:

http://apic/api/aaaListDomains.xml

Double-click on the specific request to check timing details.

10ms looks good

Page 44

zegrep -A5 "aaaListDomains.xml" /var/log/dme/log/nginx.bin.log.*

nginx.bin.log.14.gz:

29701||15-05-10 23:11:05.701+02:00||nginx||DBG4||||Request received /api/aaaListDomains.xml||../common/src/rest/./Rest.cc||62 bico 56.827

29701||15-05-10 23:11:05.701+02:00||nginx||DBG4||||httpmethod=1; from 10.48.16.90; url=/api/aaaListDomains.xml; urloptions=||../common/src/rest/./Request.cc||103

29720||15-05-10 23:11:05.705+02:00||nginx||DBG4||co=doer:255:127:0xff00000003249f06:1||outCode: 200||../common/src/rest/./Worker.cc||357

29720||15-05-10 23:11:05.705+02:00||nginx||DBG4||co=doer:255:127:0xff00000003249f06:1||notifyEvent data ready 0x0||../common/src/rest/./Worker.cc||370

29701||15-05-10 23:11:05.706+02:00||nginx||DBG4||||Reply data (request 831 size 211) <?xml version="1.0" encoding="UTF-8"?><imdata totalCount="4"><aaaLoginDomain name="LOCAL"/><aaaLoginDomain name="RADIUS"/><aaaLoginDomain name="TACACS"/><aaaLoginDomain name="DefaultAuth" guiBanner=""/></imdata> Cookie: NONE||../common/src/rest/./Rest.cc||120

How does it look from APIC’s side?

We could use any other criteria for grep: IP, time stamp, etc.

zegrep -A5 "aaaListDomains.json" /var/log/dme/log/nginx*

Note: JSON is used by the APIC WebUI, while we used XML.
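The log lines above are "||"-delimited, which makes them simple to split once they have been extracted with zegrep. The sketch below is based only on the lines shown on this slide: the field names (ident, timestamp, process, level, context, message, source file, source line) are guesses at the meaning of each position, not documented names.

```python
import re


def parse_dme_line(line):
    """Split one '||'-delimited DME log line into named fields, or None."""
    head = line.rstrip("\n").split("||", 5)
    if len(head) != 6:
        return None
    ident, timestamp, process, level, context, rest = head
    # The message itself may contain '||', so peel the last two
    # fields (source file and line number) off from the right.
    tail = rest.rsplit("||", 2)
    if len(tail) != 3:
        return None
    message, source_file, source_line = tail
    return {
        "ident": ident, "timestamp": timestamp, "process": process,
        "level": level, "context": context, "message": message,
        "source_file": source_file, "source_line": source_line,
    }


def client_ip(record):
    """Extract the 'from <ip>;' client address from a request line, if any."""
    match = re.search(r"from (\S+);", record["message"])
    return match.group(1) if match else None


# Two lines copied from the slide.
REQUEST = ("29701||15-05-10 23:11:05.701+02:00||nginx||DBG4||||httpmethod=1; "
           "from 10.48.16.90; url=/api/aaaListDomains.xml; urloptions="
           "||../common/src/rest/./Request.cc||103")
RESULT = ("29720||15-05-10 23:11:05.705+02:00||nginx||DBG4||"
          "co=doer:255:127:0xff00000003249f06:1||outCode: 200"
          "||../common/src/rest/./Worker.cc||357")

request = parse_dme_line(REQUEST)
result = parse_dme_line(RESULT)
print(client_ip(request), result["message"])
```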

Page 45

APIC DME Debug URL

http://apic1/api/nginx/debug/tacacs.xml

Debug data of DMEs is also exposed via REST

Page 46

The same debug data is also accessible from ishell

admin@apic1:~> cat /debug/bdsol-aci3-apic1/nginx/tacacs/mo
RequestsDispatched : 1511
ResponsesReceived : 1498

Check all other nifty stats by executing "find /debug/* …"

Example:

admin@apic1:~> find /debug/* -print -type f -exec cat {} \;

You can also check logs matching certain criteria.

The examples below look for tacacs logs or a specific time.

zegrep TAC_ /var/log/dme/log/nginx*
zegrep TAC_ /var/syslog/tmp_logs/nginx*
zegrep "15-05-09 03:48" /var/log/dme/log/*

Page 47

Finding changes, faults during certain timeframe

Page 48

System health change

We noticed a slight decrease in System health

Is the cause known?
Do we need to perform Root Cause Analysis?
Were there any known changes, maintenance etc?

… we’re not sure … should we call SWAT?

Page 49

Déjà vu?

We’ve suddenly experienced connectivity loss … nothing has been changed …

Let’s think for a second:

What is the most common cause of all network incidents?

Change!

Page 50

aaaModLR

aaaModLR - AAA audit log record, which is automatically generated whenever a user modifies an object.

We noticed a slight decrease in System health, so we want to check whether there were any config changes.

Match audit records (aaaModLR) between 2015-05-07 AND 2015-05-10:

moquery -c aaaModLR -f 'aaa.ModLR.created>"2015-05-07" and aaa.ModLR.created<"2015-05-10"'

Match only on May 10th 2015:

moquery -c aaaModLR -f 'aaa.ModLR.created=="2015-05-10"'

Page 51

Example looking for audit records by date / time

We don’t do changes on non-business days and the day before, so let’s see who has performed any config between Thursday evening and Monday morning

admin@bdsol-aci2-apic1:~> moquery -c aaaModLR -f 'aaa.ModLR.created>"2015-05-07T17:00" and aaa.ModLR.created<"2015-05-11"'
# aaa.ModLR
id : 8589938110
affected : uni/fabric/outofsvc/rsoosPath-[topology/pod-1/paths-101/pathep-[eth1/12]]
cause : transition
changeSet :
childAction :
code : E4208269
created : 2015-05-08T15:22:04.317+01:00
descr : Interface topology/pod-1/paths-101/pathep-[eth1/12] enabled
dn : subj-[uni/fabric/outofsvc/rsoosPath-[topology/pod-1/paths-101/pathep-[eth1/12]]]/mod-8589938110
ind : deletion
modTs : never
rn : mod-8589938110
severity : info
status :
trig : config
txId : 10720396
user : admin

admin configured interface eth1/12 on node 101
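If the audit records have already been exported (for example as parsed "attribute : value" dictionaries), the same time-window check can be repeated offline. This is a minimal sketch using Python's datetime; the first record reuses values from the output above, the second record is hypothetical, and the +01:00 offset for the window boundaries is an assumption taken from the timestamps shown.

```python
from datetime import datetime, timedelta, timezone


def in_window(records, start, end):
    """Return records whose 'created' timestamp satisfies start <= t < end."""
    hits = []
    for record in records:
        created = datetime.fromisoformat(record["created"])
        if start <= created < end:
            hits.append(record)
    return hits


# Assumed local offset, matching the +01:00 seen in the moquery output.
TZ = timezone(timedelta(hours=1))
THURSDAY_EVENING = datetime(2015, 5, 7, 17, 0, tzinfo=TZ)
MONDAY_MORNING = datetime(2015, 5, 11, 0, 0, tzinfo=TZ)

RECORDS = [
    # Values from the slide.
    {"created": "2015-05-08T15:22:04.317+01:00", "user": "admin",
     "descr": "Interface topology/pod-1/paths-101/pathep-[eth1/12] enabled"},
    # Hypothetical record outside the window.
    {"created": "2015-05-06T09:00:00.000+01:00", "user": "operator",
     "descr": "example change before the freeze"},
]

hits = in_window(RECORDS, THURSDAY_EVENING, MONDAY_MORNING)
for record in hits:
    print(record["created"], record["user"], record["descr"])
```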

Page 52

OK, so we found there were some admin changes on eth1/12

faultRecord in GUI

We could also check:

eventRecord

healthRecord

double click

Page 53

Using moquery to dump/sort active faults (faultInst)

admin@apic1:~> moquery -c faultInst | egrep -e "^descr" | sort | uniq -c
2 descr : Configuration failed for EPG default due to Not Associated With Management Zone
3 descr : Datetime Policy Configuration for F5clock failed due to : access-epg-not-specified
1 descr : Failed to form relation to MO AbsGraph-VEStandAloneFuncProfile of class vnsAbsGraph
1 descr : Failed to form relation to MO fwP-default of class nwsFwPol in context uni/infra
1 descr : Ntp configuration on leaf leaf1 is Not Synchronized
1 descr : Ntp configuration on leaf leaf2 is Not Synchronized
1 descr : Ntp configuration on spine spine1 is Not Synchronized
1 descr : Power supply shutdown. (serial number DCB18CLUS15)

This quickly sorts and counts all active faults.

Now we could query all faults by criteria, such as description (fault.Inst.descr):

moquery -c faultInst -f 'fault.Inst.descr==": Failed to form relation to MO AbsGraph-VEStandAloneFuncProfile …"'
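The same tally can be produced offline from saved moquery text output. The sketch below is an assumed equivalent of the egrep/sort/uniq pipeline, written with collections.Counter; the sample reuses two of the fault descriptions from the slide, and the function name is illustrative.

```python
from collections import Counter


def count_descriptions(moquery_text):
    """Count identical 'descr' values in moquery text output."""
    counts = Counter()
    for line in moquery_text.splitlines():
        # Split at the first colon only: descriptions may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "descr":
            counts[value.strip()] += 1
    return counts


DATETIME_FAULT = ("Datetime Policy Configuration for F5clock failed "
                  "due to : access-epg-not-specified")
NTP_FAULT = "Ntp configuration on leaf leaf1 is Not Synchronized"

SAMPLE = "\n".join([
    "# fault.Inst",
    f"descr        : {DATETIME_FAULT}",
    "severity     : minor",
    "# fault.Inst",
    f"descr        : {NTP_FAULT}",
    "# fault.Inst",
    f"descr        : {DATETIME_FAULT}",
    "# fault.Inst",
    f"descr        : {DATETIME_FAULT}",
])

counts = count_descriptions(SAMPLE)
for descr, count in counts.most_common():
    print(count, descr)
```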

Page 54

L4–L7 Integration debugging

Page 55

Troubleshooting: APIC Faults / Visore / debug.log / LTM log

APIC Faults

https://<APIC>/visore.html

/data/devicescript/F5.BIGIP.1.1.0/logs/debug.log

/var/log/*

Page 56

APIC Faults

Double-click on a fault. If you need more details, copy the affected object.

Page 57

Example L4-L7 fault details using the Visore tool

https://apic/visore.html

Paste the affected object in the “Class or DN” field

Visore provides full details of the issue

Page 58

APIC debug.log

Locate the APIC that contains the shard configuring the BIG-IP, then go to the following location:

admin@apic1:~> cd /data/devicescript/F5.BIGIP.1.0.0/logs

admin@apic1:logs> ls -all
-rw-r--r-- 2 nobody nobody 52688 Sep 30 11:31 debug.log
-rw-r--r-- 2 nobody nobody 35492 Sep 30 11:30 periodic.log

You will see debug.log and periodic.log

You can “tail -f debug.log” to monitor the process

Page 59

APIC debug.log (faults)

2014-07-25 18:04:00,675 DEBUG 139789634365184 [172.23.76.198, 8534]: Faults: []

2014-07-25 18:05:47,466 DEBUG 139789634365184 [172.23.76.198, 8543]: result: serviceAudit {'stats': {'max': 20.035178899765015, 'num': 2, 'last': 20.035178899765015, 'avg': 16.63836646080017, 'min': 13.241554021835327}, 'result': {'faults': [([], 82, "Line 100 apic/service.py::modify: Could not configure service state: Server raised fault: 'Exception caught in Networking::urn:iControl:Networking/RouteDomainV2::get_identifier()\nException: Common::OperationFailed\n\tprimary_error_code : 17237812 (0x01070734)\n\tsecondary_error_code : 0\n\terror_string : 01070734:3: Configuration error: Invalid mcpd context, folder not found (/apic_5794)'")], 'state': 3, 'health': [([], 0)]}}

2014-07-25 18:05:47,467 DEBUG 139789634365184 [172.23.76.198, 8543]: Faults: [([], 82, "Line 100 apic/service.py::modify: Could not configure service state: Server raised fault: 'Exception caught in Networking::urn:iControl:Networking/RouteDomainV2::get_identifier()\nException: Common::OperationFailed\n\tprimary_error_code : 17237812 (0x01070734)\n\tsecondary_error_code : 0\n\terror_string : 01070734:3: Configuration error: Invalid mcpd context, folder not found (/apic_5794)'")]

Example: mcpd

Page 60

APIC debug.log (faults)

2014-10-07 13:09:51,166 DEBUG 140447157077760 [198.18.128.130, 76]: Faults: []

2014-10-07 13:09:51,187 DEBUG 140447157077760 [None, None]: Waiting for task

2014-10-07 13:09:53,847 DEBUG 140447148685056 [198.18.128.130, 76]: route_domain: Allocated route domain 907

2014-10-07 13:09:53,957 DEBUG 140447148685056 [198.18.128.130, 76]: route_domain: Setting route domain 907 on device BIGIP1

2014-10-07 13:09:54,140 INFO 140447148685056 [198.18.128.130, 76]: Line 664 apic/service.py::_modify_vlan: Target: : Creating VLAN '4663_16387' ID 202

2014-10-07 13:09:56,532 INFO 140447148685056 [198.18.128.130, 76]: Line 679 apic/service.py::_modify_vlan: Target: : Modifying VLAN '4663_16387' interface '1.1'

2014-10-07 13:09:57,304 DEBUG 140447148685056 [198.18.128.130, 76]: result: serviceModify {'stats': {'max': 39.48741388320923, 'num': 4, 'last': 6.139014005661011, 'avg': 21.184859931468964, 'min': 6.139014005661011}, 'result': {'faults': [([(0, '', 4663), (7, '', '2752512_16387')], 81, "Line 383 apic/handlers.py::set_interface: device: : VLAN ifc update fail: Server raised fault: 'Exception caught in Networking::urn:iControl:Networking/VLAN::add_member()\nException: Common::OperationFailed\n\tprimary_error_code : 17236569 (0x01070259)\n\tsecondary_error_code : 0\n\terror_string : 01070259:3: Requested member (1.1) is untagged on another VLAN'")], 'state': 2, 'health': []}}

Example: Tagging mismatch

Page 61

BIG-IP LTM log

Jul 19 11:57:53 apic-bigip2 notice mcpd[7439]: 01070638:5: Pool /apic_5668/apic_5668_webPool member /apic_5668/192.168.10.101%1295:80 monitor status down. [ /apic_5668/apic_5668_webMonitor: down ] [ was up for 20hrs:55mins:46sec ]

Jul 19 11:57:54 apic-bigip2 notice mcpd[7439]: 01070638:5: Pool /apic_5668/apic_5668_webPool member /apic_5668/192.168.10.102%1295:80 monitor status down. [ /apic_5668/apic_5668_webMonitor: down ] [ was up for 20hrs:55mins:47sec ]

Jul 19 11:57:54 apic-bigip2 notice mcpd[7439]: 01071682:5: SNMP_TRAP: Virtual /apic_5668/apic_5668_4096_Virtual-Server has become unavailable

Jul 19 11:57:54 apic-bigip2 err tmm[9357]: 01010028:3: No members available for pool /apic_5668/apic_5668_webPool

Jul 19 11:57:54 apic-bigip2 err tmm1[9357]: 01010028:3: No members available for pool /apic_5668/apic_5668_webPool

Jul 19 11:57:54 apic-bigip2 err tmm2[9357]: 01010028:3: No members available for pool /apic_5668/apic_5668_webPool

Jul 19 11:57:54 apic-bigip2 err tmm3[9357]: 01010028:3: No members available for pool /apic_5668/apic_5668_webPool

Jul 19 12:03:02 apic-bigip2 err iprepd[6725]: 015c0004:3: failed connect to 208.87.136.155 on 443

Jul 19 12:03:03 apic-bigip2 err iprepd[6725]: 015c0004:3: Certificate verification error: 18

Jul 19 12:03:03 apic-bigip2 err iprepd[6725]: 015c0004:3: nSendReceiveSsl failed SSL handshake

Jul 19 12:04:11 apic-bigip2 info pfmand[6925]: 01660009:6: Link: 2.1 is DOWN

Jul 19 12:04:11 apic-bigip2 info pfmand[6925]: 01660009:6: Link: 2.2 is DOWN

SSH as root into BIG-IP and go to:

[root@bigip:Active:In Sync] log # cd /var/log
[root@bigip:Active:In Sync] log # ls ltm*
ltm        ltm.11.gz  ltm.2.gz  ltm.4.gz  ltm.6.gz  ltm.8.gz
ltm.10.gz  ltm.1.gz   ltm.3.gz  ltm.5.gz  ltm.7.gz  ltm.9.gz

The log lines above are example output.

Page 62

Access Encap to Fabric Encap

Page 63

EP A to EP B - simplified

[Diagram: spine 1, spine 2; leaf 1 to leaf 5; EP A and EP B attached to leaves. Numbered callouts 1 to 3 along the path are labeled in order: Regular L2 packet, iVXLAN packet, Regular L2 packet.]

Page 64

How to identify VLAN mapping

[Diagram: spine 1, spine 2; leaf 1 to leaf 5; VM A, MAC: 00:00:33:33:33:33]

Linux VM A: connected to ACI fabric, VLAN 3399

Scenario:

VM A is unable to reach other endpoints connected to the Fabric

- ping doesn’t work

- ARP doesn’t work

Page 65

What happens when a packet from EP A reaches the leaf

[Diagram: EP A (MAC: 00:00:33:33:33:33) on leaf 1, eth 1/34. Inside leaf 1: Merchant ASIC with 48/96 x 10G ports to servers/blade, switches; Cisco ASIC with 8/12 x 40G ports to Spines; 8/12 x 40G between the two ASICs.]

1 packet first comes to Merchant ASIC (BCM)

2 forwarded to destination if it’s known on BCM

3 if destination not learned in BCM forwarding table, then send to Cisco ASIC


Linux view

VM thinks its interface is in VLAN 3399

VM MAC: 00:00:33:33:33:33


switch# bcm-shell-hw "l2 show"

mac=52:54:00:b0:c4:81 vlan=57 GPORT=0x22 modid=0 port=34/xe33 Hit

mac=58:f3:9c:24:2e:87 vlan=15 GPORT=0x2 modid=0 port=2/xe1 Hit

mac=00:00:33:33:33:33 vlan=57 GPORT=0x22 modid=0 port=34/xe33 Hit

mac=52:54:00:c3:b8:2c vlan=58 GPORT=0x22 modid=0 port=34/xe33 Hit

mac=00:22:bd:e2:e2:e2 vlan=49 GPORT=0x7f modid=2 port=127 Static

bcm-shell-hw: checking the L2 forwarding table on Broadcom

Broadcom says it’s VLAN 57


switch# show mac address-table interface ethernet 1/34

Legend:

* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC

VLAN MAC Address Type age Secure NTFY Ports/SWID.SSID.LID

---------+-----------------+--------+---------+------+----+------------------

* 53 0000.3333.3333 dynamic - F F eth1/34

* 53 5254.00b0.c481 dynamic - F F eth1/34

* 54 5254.00c3.b82c dynamic - F F eth1/34

MAC learning from ACI switch

iShell CLI says it’s VLAN 53

from ishell command interface
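The Broadcom table and the switch CLI write the same MAC in different notations (colon-separated versus dotted), which makes the two outputs awkward to compare by eye. A minimal sketch, assuming the (MAC, VLAN) pairs have been copied from the two outputs by hand; `normalize_mac` is a hypothetical helper, not a switch command.

```python
def normalize_mac(mac):
    """Reduce a MAC in colon, dash, or dotted notation to 12 lowercase hex digits."""
    digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits

# (MAC, VLAN) pairs for port eth1/34 (xe33), copied from the two outputs.
bcm_entries = [("52:54:00:b0:c4:81", 57), ("00:00:33:33:33:33", 57), ("52:54:00:c3:b8:2c", 58)]
cli_entries = [("0000.3333.3333", 53), ("5254.00b0.c481", 53), ("5254.00c3.b82c", 54)]

bcm_vlan = {normalize_mac(mac): vlan for mac, vlan in bcm_entries}
cli_vlan = {normalize_mac(mac): vlan for mac, vlan in cli_entries}

# Print both VLAN numbers side by side for every MAC seen in the Broadcom table.
for mac in sorted(bcm_vlan):
    print(mac, "BCM vlan", bcm_vlan[mac], "switch CLI vlan", cli_vlan.get(mac))
```

For the VM in this scenario the two views disagree (57 versus 53), which is exactly the mapping question the slides raise.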


module-1# show system internal eltmc info vlan access_encap_vlan 3399

vlan_id: 53 ::: hw_vlan_id: 57

vlan_type: FD_VLAN ::: bd_vlan: 52

access_encap_type: 802.1q ::: access_encap: 3399

fabric_encap_type: VXLAN ::: fabric_encap: 9891

sclass: 16387 ::: scope: 8

bd_vnid: 9891 ::: untagged: 0

acess_encap_hex: 0xd47 ::: fabric_enc_hex: 0x26a3

so which VLAN is it?

it’s iVXLAN 9891 ??

note: we’re in vsh_lc CLI
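The eltmc output ties the three VLAN numbers together. As a sketch, the snippet below parses the `key: value ::: key: value` layout and prints each view next to the place it was seen earlier. The parser `parse_eltmc` is hypothetical and assumes the layout shown above.

```python
def parse_eltmc(text):
    """Parse 'key: value ::: key: value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        for pair in line.split(":::"):
            key, sep, value = pair.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
    return fields

# First four lines of the output shown above.
output = """\
vlan_id: 53 ::: hw_vlan_id: 57
vlan_type: FD_VLAN ::: bd_vlan: 52
access_encap_type: 802.1q ::: access_encap: 3399
fabric_encap_type: VXLAN ::: fabric_encap: 9891
"""

info = parse_eltmc(output)
print("VLAN on the wire (what the VM sees):", info["access_encap"], hex(int(info["access_encap"])))
print("VLAN in the switch CLI:", info["vlan_id"])
print("VLAN in the Broadcom table:", info["hw_vlan_id"])
print("Fabric encap:", info["fabric_encap"], hex(int(info["fabric_encap"])))
```

The hex values printed for the access and fabric encap agree with the `0xd47` and `0x26a3` fields in the last line of the command output.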


Is this actually possible with ACI?


ELAM


What is ELAM?

ELAM stands for Embedded Logic Analyzer Module

It is logic present in the ASICs that can capture and display one or more packets matching user-specified criteria from the stream of packets processed by the ASIC


ELAM Support in Cisco ASIC

Diagram: two pipelines, each with a parser block, a lookup block, two ELAMs, Packet RW and Sideband outputs, input select lines and output select lines.

Ingress Pipeline (Front Panel to Fabric): from BCM, to fabric.

Egress Pipeline (Fabric to Front Panel): from fabric, to BCM.


ELAM Support in North Star

• North Star data path is divided into ingress and egress pipelines

• 2 ELAMs are present in each pipeline (Input ELAM and Output ELAM)

• These ELAMs are present at the beginning and end of the lookup block

• ELAMs can be configured using the available select lines

• Packets can be captured on the input ELAM based on an output condition by configuring ELAM in “reverse” mode

Limitations

• Packets can be captured based on either input select lines or output select lines, but not both

• ELAM configuration should happen in single-user mode


ELAM Support

The same points and limitations listed for North Star apply to the Cisco ASIC data path.


ELAM Support

Input Select Lines Supported
3 Outerl2-outerl3-outerl4
4 Innerl2-innerl3-innerl4
5 Outerl2-innerl2
6 Outerl3-innerl3
7 Outerl4-innerl4

Output Select Lines Supported
0 Pktrw
5 Sideband

Note: Only output select lines 0 and 5 are supported for capturing packets based on output at both output and input


ELAM Configuration

The flow during ELAM configuration:

1. Init: initialize the ELAM by selecting the ASIC instance, pipeline, and select lines

2. Config: configure the trigger based on different fields in the packet

3. Arm: arm the trigger by setting the fields to match in hardware

4. Read: once the trigger has fired, read the report

5. Reset: once the process is complete, reset the trigger to restart the process
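The step order can be pictured as a small state machine. This is a toy model for illustration only: the class `ElamWorkflow` is hypothetical and does not talk to any hardware. It also assumes that a reset is allowed at any point, since the later examples start with a trigger reset.

```python
class ElamWorkflow:
    """Toy model of the ELAM steps: init, config, arm, read, then reset."""

    ORDER = ["init", "config", "arm", "read"]

    def __init__(self):
        self.position = 0   # index of the next expected step in ORDER
        self.history = []   # steps accepted so far

    def step(self, name):
        if name == "reset":
            # Assumption: reset is valid from any state and restarts the cycle.
            self.position = 0
        elif self.position < len(self.ORDER) and name == self.ORDER[self.position]:
            self.position += 1
        else:
            raise RuntimeError(f"step {name!r} is out of order")
        self.history.append(name)


workflow = ElamWorkflow()
for name in ["reset", "init", "config", "arm", "read", "reset"]:
    workflow.step(name)
print(workflow.history)
```

Calling `workflow.step("arm")` right after a reset raises `RuntimeError`, mirroring the point that a trigger has to be initialized and configured before it can be armed.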


Show the trigger

The configured trigger can be verified using the show command

root@module-1(NS-elam-insel3)# show

ELAM configuration


ELAM Report Analysis

The ELAM report is very detailed and dumps many fields.

In Pktrw the important fields are:

• adj_index

• ol_encap_idx

• sclass

• src_tep_idx

• sup_redirect

In Sideband the important fields are:

• l2flood

• fwddrop

• bnce


ELAM Example


Diagram: spine 1, spine 2; leaf 1 to leaf 5; EP A on the ingress side and EP B on the egress side. Three capture points are marked along the path:

1. leaf1: input, ingress, outer header

2. spine: input, ingress, inner header

3. leaf4: input, egress, inner header


ELAM Example

1. leaf1: input, ingress, outer header

EP A MAC: 00:25:b5:aa:00:0a
EP B MAC: 00:25:b5:bb:00:0b

vsh_lc
debug platform internal ns elam asic 0
trigger reset
trigger init ingress in-select 3 out-select 0
set outer l2 src_mac 00:25:b5:aa:00:0a
set outer l2 dst_mac ff:ff:ff:ff:ff:ff
start
status
report

Note: outer header. The packet is not yet encapsulated in iVXLAN; the outer header is still the original frame from the EP.


ELAM configuration

We’re looking to confirm whether a broadcast packet sourced from MAC 00:25:b5:aa:00:0a is reaching the Cisco ASIC

leaf1# vsh_lc
module-1# debug platform internal ns elam asic 0
module-1(NS-elam)# trigger reset
module-1(NS-elam)# trigger init ingress in-select 3 out-select 0
module-1(NS-elam-insel3)# set outer l2 src_mac 00:25:b5:aa:00:0a
module-1(NS-elam-insel3)# set outer l2 dst_mac ff:ff:ff:ff:ff:ff
module-1(NS-elam-insel3)# start
module-1(NS-elam-insel3)# status
Status: Armed
module-1(NS-elam-insel3)# ?
report Show trigger report
…
module-1(NS-elam-insel3)# report
ELAM not triggered. No report available

NOTE:
1) Without the "reset" command, trigger buffers are never reset except by a reboot.
2) Users can move in and out of the ELAM mode, and there will be no impact on the configured triggers.


ELAM Report Analysis (trigger went off)

module-1(NS-elam-insel3)# report | egrep ce_|ar_|drop|hg2_src
GBL_C++: [INFO] hg2_srcpid: 0A
GBL_C++: [INFO] ce_da: FFFFFFFFFFFF
GBL_C++: [INFO] ce_sa: 0025B5AA000A
GBL_C++: [INFO] ce_etype: 0806
GBL_C++: [INFO] ar_sha: 0025B5AA000A
GBL_C++: [INFO] ar_spa: 0A108030
GBL_C++: [INFO] ar_tha: 000000000000
GBL_C++: [INFO] ar_tpa: 0A108001
GBL_C++: [INFO] ar_spare: 0000000000000000000000000000
GBL_C++: [MSG] - pktrw is complete
GBL_C++: [INFO] drop: 0
GBL_C++: [INFO] hg2_srcpid: 0A
GBL_C++: [INFO] hg2_vid_lo: 63
GBL_C++: [INFO] vlan0: 063
GBL_C++: [INFO] adj_index: 000C
GBL_C++: [INFO] ol_encap_idx: 2FF6
GBL_C++: [INFO] ol_ttl: 08
GBL_C++: [INFO] ol_segid: 2A8001
GBL_C++: [INFO] sclass: C005
GBL_C++: [INFO] sup_redirect: 0
GBL_C++: [INFO] mcast: 0

module-1(NS-elam-insel3)# show platform internal ns forwarding encap 0x2FF6
TABLE INSTANCE : 0
Legend
MD: Mode (LUX & RWX) LB: Loopback
LE: Loopback ECMP LB-PT: Loopback Port
ML: MET Last TD: TTL Dec Disable
DV: Dst Valid DT-PT: Dest Port
DT-NP: Dest Port Not-PC ET: Encap Type
OP: Override PIF Pinning HR: Higig DstMod RW
HG-MD: Higig DstMode KV: Keep VNTAG
------------------------------------------------------------
M PORT L L LB MET M T D DT DT E TST O H HG K M E
POS D FTAG B E PT PTR L D V PT NP T IDX P R MD V D T Dst MAC DIP
------------------------------------------------------------
12278 0 c00 0 1 0 0 0 0 0 0 0 3 4 0 0 0 0 0 3 00:00:00:00:00:00 10.0.200.127

hg2_srcpid: source port on front panel
ce_sa: Source MAC address
ce_etype: Ethertype 0x806 = ARP (Address Resolution)
ar_spa: Source IP address = 10.16.128.48
ar_tpa: Destination IP address = 10.16.128.1

People that read hex on the fly appreciate this output!

VXLAN Destination TEP address derived from encap: 10.0.200.127
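For those who do not read hex on the fly, the report fields can be decoded with a few lines of Python. This is a sketch using values copied from the report above; the helper names `hex_to_ipv4` and `hex_to_mac` are hypothetical.

```python
import ipaddress

def hex_to_ipv4(value):
    """Convert an 8-digit hex string such as '0A108030' to dotted decimal."""
    return str(ipaddress.IPv4Address(int(value, 16)))

def hex_to_mac(value):
    """Convert a 12-digit hex string to colon-separated lowercase notation."""
    value = value.lower()
    return ":".join(value[i:i + 2] for i in range(0, 12, 2))

# Field values copied from the ELAM report.
report = {
    "ce_sa": "0025B5AA000A",
    "ar_spa": "0A108030",
    "ar_tpa": "0A108001",
    "ol_encap_idx": "2FF6",
}

print("source MAC:", hex_to_mac(report["ce_sa"]))
print("ARP source IP:", hex_to_ipv4(report["ar_spa"]))
print("ARP target IP:", hex_to_ipv4(report["ar_tpa"]))
print("encap index (decimal):", int(report["ol_encap_idx"], 16))
```

The decimal encap index, 12278, is the POS value of the row returned by the encap lookup, which is where the destination TEP address 10.0.200.127 comes from.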


We have destination TEP address, what next?

Find which switch has the specific TEP (on APIC or switch):

acidiag fnvread | egrep 10.0.200.127
moquery -c tunnelIf -f 'tunnel.If.dest=="10.0.200.127"'
show isis dtep vrf overlay-1

Switch output (the APIC is not running the IS-IS protocol):

# show isis dtep vrf overlay-1
IS-IS Dynamic Tunnel End Point (DTEP) database:
DTEP-Address Role Encapsulation Type
10.0.120.95 SPINE N/A PHYSICAL
10.0.200.64 SPINE N/A PHYSICAL,PROXY-ACAST-MAC
10.0.200.65 SPINE N/A PHYSICAL,PROXY-ACAST-V4
10.0.8.65 SPINE N/A PHYSICAL,PROXY-ACAST-V6
10.0.8.64 LEAF N/A PHYSICAL
10.0.200.127 LEAF N/A PHYSICAL
10.0.200.126 SPINE N/A PHYSICAL
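When the DTEP database is long, the lookup can be scripted. A minimal sketch, assuming the four-column layout shown in the switch output; `find_dtep` is a hypothetical helper that returns only the role and type of a TEP, not the node name.

```python
def find_dtep(output, address):
    """Return (role, type) for a DTEP address, or None if it is not listed."""
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == address:
            return parts[1], parts[-1]
    return None

# Rows copied from the 'show isis dtep vrf overlay-1' output.
dtep_output = """\
DTEP-Address Role Encapsulation Type
10.0.120.95 SPINE N/A PHYSICAL
10.0.200.64 SPINE N/A PHYSICAL,PROXY-ACAST-MAC
10.0.8.64 LEAF N/A PHYSICAL
10.0.200.127 LEAF N/A PHYSICAL
10.0.200.126 SPINE N/A PHYSICAL
"""

print(find_dtep(dtep_output, "10.0.200.127"))
```

Here the destination TEP belongs to a leaf, so the next capture point after the spine is the egress leaf that owns that address.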


ELAM Example

2. spine: input, ingress, inner header (Cisco ASIC in spine)

EP A MAC: 00:25:b5:aa:00:0a
EP B MAC: 00:25:b5:bb:00:0b

vsh_lc
debug platform internal alp elam asic 0 | 1
trigger init ingress in-select 3 out-select 0
set inner l2 src_mac 00:25:b5:aa:00:0a
set inner l2 dst_mac 00:25:b5:bb:00:0b
start
status
report

The packet is now encapsulated in iVXLAN, so we’re looking for the inner header

Hint: don’t forget trigger reset


ELAM Example

3. leaf4: input, egress, inner header (Cisco ASIC in leaf)

host A MAC: 00:25:b5:aa:00:0a
host B MAC: 00:25:b5:bb:00:0b

vsh_lc
debug platform internal ns elam asic 0
trigger init egress in-select 3 out-select 0
set inner l2 src_mac 00:25:b5:aa:00:0a
set inner l2 dst_mac 00:25:b5:bb:00:0b
start
status
report

*** The report will be available once the trigger has gone off

Egress because we’re egressing the fabric


References


APIC resources

• Quick Start / Videos

• APIC Help pages

• API Documentation

• Python SDK


Online resources

• ACI Documentation - cisco.com/go/aci

• Cisco.com - APIC Troubleshooting

• Cisco Support Forums

• Cisco DevNet

• GitHub/datacenter


GitHub – a resource for ACI scripts and tools

• ACI Toolkit: http://datacenter.github.io/acitoolkit/ and https://github.com/datacenter/acitoolkit

• ACI Diagram: https://github.com/cgascoig/aci-diagram

• ACI Endpoint Tracker: http://datacenter.github.io/acitoolkit/docsbuild/html/endpointtracker.html


Troubleshooting Cisco ACI

Available at GitHub


The Policy Driven Data Center with ACI: Architecture, Concepts, and Methodology

ISBN: 9781587144905


Designing Data Centers with Cisco's ACI LiveLessons--Networking Talks

ISBN: 978-1-58714-436-3


Participate in the “My Favorite Speaker” Contest

• Promote your favorite speaker through Twitter and you could win $200 of Cisco Press products (@CiscoPress)

• Send a tweet and include

• Your favorite speaker’s Twitter handle <@miojovanovic>

• Two hashtags: #CLUS #MyFavoriteSpeaker

• You can submit an entry for more than one of your “favorite” speakers

• Don’t forget to follow @CiscoLive and @CiscoPress

• View the official rules at http://bit.ly/CLUSwin

Promote Your Favorite Speaker and You Could Be a Winner


Thank you
