Solving congestion problems in Storage Area Networks (SAN)
Paresh Gupta, Technical Marketing Engineer
Ed Mazurek, Technical Leader, Services
Feb, 2015
The world that we live in
• 2x number of apps in 2014 over last year
• 10x data growth by 2020
• 1.5x IT professionals
Highlight: HW Enhanced Slow Drain
• Detection: 1 ms
• Troubleshooting: 1 ms
• Automatic Recovery: Immediate
Understanding Slow Drain
• B2B credits are not negotiated – just agreed to
• Each side informs the other side of the number of buffer credits it has
Fibre Channel Flow Control: B2B Credits
[Figure: the N-port of a storage disk tells the F-port of a Fibre Channel switch "I have 1 RX B2B credit"; the switch answers "OK. I have 3 B2B credits". The F-port has three credits, the N-port has one.]
• MDS Rx buffer queue is decremented by 1 B2B credit for each received frame
• R_RDY is sent to sender when buffer occupying frame is handled
• For each frame sent, R_RDY (B2B Credit) should be returned
• R_RDYs are not sent reliably – they can be corrupted/lost
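Fibre Channel Flow Control: Traffic Flow
[Figure: a storage disk N-port sends Frame1, Frame2 and Frame3 to the F-port of a Fibre Channel switch, and the switch returns an R_RDY (one B2B credit) for each frame it has handled.]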
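The credit accounting described in these bullets can be sketched in a few lines of Python. This is an illustrative model only: the class name `B2BPort` and its methods are hypothetical, and real ports do this in hardware.

```python
class B2BPort:
    """Transmit side of a link, tracking credits granted by the peer."""

    def __init__(self, peer_rx_credits: int) -> None:
        self.agreed = peer_rx_credits      # value the peer advertised at login
        self.remaining = peer_rx_credits   # frames that may still be sent

    def send_frame(self) -> bool:
        """Send one frame if a credit is available; return False if blocked."""
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return True

    def receive_r_rdy(self) -> None:
        """The peer freed a buffer and returned one credit."""
        if self.remaining < self.agreed:
            self.remaining += 1


disk = B2BPort(peer_rx_credits=3)             # the switch F-port advertised 3 credits
sent = [disk.send_frame() for _ in range(4)]  # the fourth frame has to wait
disk.receive_r_rdy()                          # one buffer freed on the switch
resumed = disk.send_frame()                   # transmission can continue
```

With three credits the fourth send is refused until an R_RDY arrives, which is the behaviour every later slide builds on.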
• Disk 1 sends frame to Server 1
• Switch 1 sends R_RDY after it transmits the frame to switch 2
• Switch 2 sends R_RDY after it transmits the frame to Server 1
• Server 1 sends R_RDY after frame is consumed by HBA
Lossless Fibre Channel fabric
[Figure: Disk 1 sends frames through Switch 1 and Switch 2 to Server 1. Each hop returns an R_RDY to the previous hop once the frame has moved on.]
• Server 1 cannot process frames and does not return R_RDY
• No available B2B credits on the ports connected to Server 1 and Disk 1
• No available B2B credits on ISL ports
• Disk 1 stops transmitting instead of having frames dropped, so the fabric stays lossless
Lossless Fibre Channel fabric
[Figure: same topology with Server 1 not returning R_RDY. Frames fill the buffers of Switch 2, the ISL and Switch 1, and back pressure propagates hop by hop to Disk 1.]
• B2B credits exhausted on ISL
• No R_RDY sent to Disk 1 as well as Disk 2
• Effect of ‘slow server 1’ on Flow Disk2-Server2
Slow Drain situation
[Figure: Disk 1 and Disk 2 send through Switch 1 and Switch 2 to Server 1 and Server 2. Frames for Server 1 fill the buffers on both switches and the ISL, and back pressure withholds R_RDY from both disks.]
• One slow device impacts all other devices sharing same switches and ISL
Slow Drain situation
[Figure: same topology, with Server 1 marked as the slow node and the other devices sharing the switches and ISL marked as impacted nodes.]
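A toy simulation can make the effect on the unrelated Disk2-Server2 flow concrete. The model below is a deliberate simplification and not how a switch is implemented: it assumes one shared pool of ISL buffers on Switch 2, one frame per flow per tick, and no timeouts. The function name `simulate` is hypothetical.

```python
def simulate(isl_credits: int, ticks: int, server1_drains: bool) -> dict:
    """Count frames delivered per server over a shared ISL.

    Switch 2 holds at most `isl_credits` frames received from the ISL.
    Each tick, Switch 1 forwards one frame for Server 1 and one for Server 2
    if an ISL credit is free, then Switch 2 delivers what the servers accept.
    """
    held_for_server1 = 0            # frames parked in Switch 2 ISL buffers
    delivered = {"server1": 0, "server2": 0}
    for _ in range(ticks):
        for dest in ("server1", "server2"):
            if held_for_server1 >= isl_credits:
                break               # no ISL credit left: both flows stall
            if dest == "server2":
                delivered["server2"] += 1   # healthy server: buffer freed at once
            elif server1_drains:
                delivered["server1"] += 1
            else:
                held_for_server1 += 1       # buffer stays occupied, no R_RDY
    return delivered


healthy = simulate(isl_credits=4, ticks=10, server1_drains=True)
slow_drain = simulate(isl_credits=4, ticks=10, server1_drains=False)
```

In the slow-drain run Server 2 stops receiving frames as soon as frames for Server 1 occupy every ISL buffer, even though Server 2 itself is healthy.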
• Edge devices
  • Server performance problems: application or OS
  • Host bus adapter (HBA) problems: driver or physical failure
  • Speed mismatches: one fast device and one slow device
  • Non-graceful virtual machine exit on a virtualized server, resulting in packets held in HBA buffers
  • Storage subsystem performance problems, including overload
• Inter Switch Links (ISL)
  • Lack of B2B credits for the distance the ISL is traversing
    • Ex: 4 credits per km @ 8 Gbps
  • The existence of slow drain edge devices
  • Edge devices with faster speeds than ISLs even when port-channeled
Reasons for Slow Drain
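As a worked example of the distance rule of thumb, the sketch below estimates the credits an ISL needs. Only the figure of 4 credits per km at 8 Gbps comes from the slide; scaling it linearly with link speed is an assumption made here for illustration, and the function name is hypothetical.

```python
import math

CREDITS_PER_KM_AT_8G = 4   # rule of thumb quoted on the slide


def isl_credits_needed(distance_km: float, speed_gbps: float = 8.0) -> int:
    """Estimate B2B credits for an ISL of the given length."""
    # Assumption: credits per km scale linearly with link speed.
    per_km = CREDITS_PER_KM_AT_8G * speed_gbps / 8.0
    return math.ceil(distance_km * per_km)


credits_10km_8g = isl_credits_needed(10)          # 10 km at 8 Gbps
credits_25km_16g = isl_credits_needed(25, 16.0)   # 25 km at 16 Gbps
```

A 10 km ISL at 8 Gbps therefore needs about 40 credits; with fewer, the link runs out of credits while frames are still in flight and behaves like a slow drain device.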
Cisco MDS Architecture
[Figure: Cisco MDS block diagram. Ports on Line Card 1 and Line Card 2 connect through a VOQ and an XBAR interface to the Fabric Modules (XBAR); the Arbiter runs on the Active Supervisor.]
Frame & credit processing in MDS switch
1. Initiator sends FC frame
2. MDS receives the frame in its entirety and stores it
3. Frame is transmitted to the VOQ
4. XBAR interface requests a grant from the Arbiter to transmit the frame to the egress port via the XBAR
5. Arbiter grants the request to the XBAR interface to forward the frame; the grant is only sent when the egress port has buffer space available
6. FC frame is forwarded to the XBAR, then R_RDY is sent back since the buffer is now free
7. FC frame is forwarded to the egress line card
8. ASIC forwards the frame to the target
9. Credit is returned to the Arbiter
[Figure: Cisco MDS frame path showing the Req/Grant exchange with the Arbiter, the frame crossing the XBAR, the R_RDY returned to the initiator, and the credit returned to the Arbiter.]
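Steps 4 to 9 can be sketched as a small model in which the arbiter hands out one grant per free egress buffer. This is a simplified illustration under that assumption, not the MDS implementation; the `Arbiter` class and `forward` function are hypothetical names.

```python
from collections import deque


class Arbiter:
    """Grants crossbar access only while the egress port has free buffers."""

    def __init__(self, egress_buffers: int) -> None:
        self.free = egress_buffers

    def request(self) -> bool:
        if self.free == 0:
            return False          # step 5: no grant without egress buffer space
        self.free -= 1
        return True

    def credit_returned(self) -> None:
        self.free += 1            # step 9: egress port has transmitted a frame


def forward(voq: deque, arbiter: Arbiter) -> list:
    """Move frames from the VOQ across the crossbar while grants are given."""
    crossed = []
    while voq and arbiter.request():
        crossed.append(voq.popleft())   # step 6: ingress buffer is free again
    return crossed


voq = deque(["f1", "f2", "f3"])
arbiter = Arbiter(egress_buffers=2)
first = forward(voq, arbiter)     # only two grants are available
arbiter.credit_returned()
second = forward(voq, arbiter)    # the third frame follows once a credit returns
```

Frames that cannot get a grant wait in the VOQ at ingress, which is why a congested egress port translates into withheld R_RDYs on the ingress side.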
Cisco MDS architecture advantage
• Consistent, predictable throughput and latency at different traffic loads and types
• Drops corrupt frames through CRC checking at all stages
• Never drops good frames under congestion
• Non-blocking arbitrated crossbar architecture
Slow Drain Detection
• Stuck port
  • Credits unavailable on port for extended duration
  • Traffic does not flow at all
  • Separate counter is maintained for stuck ports
• Slow port
  • Credits returned slowly
  • Traffic does not flow at line rate
  • Counter is maintained if credits unavailable for 100 ms
Slow & Stuck Port
[Figure: frames queued in the MDS behind an egress port. A slow port returns R_RDY late; a stuck port returns none.]
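The distinction between the two conditions can be written as a simple classifier. The 100 ms figure comes from this slide; the 1 second (F port) and 1.5 second (E port) stuck thresholds are taken from the stuck port recovery slide later in the deck. The function name and return labels are hypothetical.

```python
def classify_port(zero_credit_ms: float, port_type: str = "F") -> str:
    """Classify a port by how long it has been continuously at zero Tx credits."""
    stuck_after_ms = {"F": 1000, "E": 1500}[port_type]
    if zero_credit_ms >= stuck_after_ms:
        return "stuck"
    if zero_credit_ms >= 100:
        return "slow"
    return "ok"


states = [
    classify_port(50),          # brief credit starvation
    classify_port(250),         # credits return, but slowly
    classify_port(1200, "F"),   # F port beyond 1 second
    classify_port(1200, "E"),   # E port not yet at 1.5 seconds
]
```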
Slow Drain Troubleshooting
• Credit and remaining credit
  • Credits: number agreed initially
  • Remaining credit: dynamic counter; this many frames can still be sent
• Credit transition to zero
  • Counter increments whenever the port hits zero credits
  • Maintained as a hardware statistic
Credit availability
[Figure: frames queued in the MDS behind an egress port while R_RDYs return from the attached device.]
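A minimal sketch of what a "transition to zero" counter measures, assuming a list of remaining-credit samples as input. The switch maintains this counter in hardware on every event; sampling as done here could miss short excursions, so the function is illustrative only and its name is hypothetical.

```python
def count_transitions_to_zero(remaining_samples):
    """Count how often remaining credits go from non-zero to zero."""
    transitions = 0
    previous = None
    for remaining in remaining_samples:
        if remaining == 0 and previous not in (None, 0):
            transitions += 1      # a new zero-credit episode starts here
        previous = remaining
    return transitions


episodes = count_transitions_to_zero([3, 2, 0, 0, 1, 0, 2])
```

Staying at zero for several samples counts once, so the counter reflects how often the port ran dry rather than for how long.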
• Display frames in ingress queue
  • Real time display of frames in ingress queues
• Display dropped frame info
  • Display key info of the frame dropped due to timeout
  • Like Source FCID (SID), Destination FCID (DID), etc.
Frame information
[Figure: frames queued in the MDS behind an egress port while R_RDYs return from the attached device.]
• Each line card logs events to an NVRAM buffer
• Events are timestamped
On Board Failure Logging (OBFL)
[Figure: Line Card 1 and Line Card 2 in the MDS each keep their own OBFL log in NVRAM: Error-stats, Flow-control, Timeouts, Request-timeouts.]
DCNM Enhancements
DCNM Slow Drain Enhancements
• Automates troubleshooting
  • Collects the whole fabric at once
• Automates collection
  • From hours of collection to minutes
• Reduces false positives
  • Prioritizes ports with the highest-severity counters
• Shows fluctuations in counters
  • Graphs counters
  • Enables user to zero in on specific counters
Slow Drain Automatic Recovery
• Transmitting port
  • Credits unavailable for
    • F port: 1 second
    • E port: 1.5 seconds
  • Transmits Link Reset (LR)
  • If Link Reset Response (LRR) is received, replenishes credits
  • If not received, port failure
  • Increments counter
• Receiving port
  • On receiving LR, checks if input buffers are empty
  • If input buffers are not empty within 90 ms, the “LR Rcvd B2B” condition occurs and the link fails with reason “Link failure Link Reset failed nonempty Recv queue”
  • Indication of upstream congestion
Stuck Port Recovery
[Figure: MDS1 (transmitting port) and MDS2 (receiving port) with frames queued on both sides of the link.]
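The two sides of stuck port recovery can be sketched as two small decision functions. The timers (1 second, 1.5 seconds, 90 ms) come from the slide; the function names, arguments and state labels are hypothetical and the model ignores everything else a real port does.

```python
def recover_stuck_port(port_type, zero_credit_s, lrr_received, agreed_credits):
    """Transmit side: return (state, remaining_credits) after the check."""
    threshold_s = {"F": 1.0, "E": 1.5}[port_type]
    if zero_credit_s < threshold_s:
        return ("waiting", 0)                  # not yet considered stuck
    # Threshold reached: the port transmits a Link Reset (LR).
    if lrr_received:
        return ("recovered", agreed_credits)   # LRR replenishes the credits
    return ("port_failure", 0)


def handle_lr(input_buffers_empty_within_90ms):
    """Receive side: decide how to answer an incoming LR."""
    if input_buffers_empty_within_90ms:
        return "send_lrr"
    return "link_failure: LR Rcvd B2B"         # points to upstream congestion


f_port = recover_stuck_port("F", 1.0, lrr_received=True, agreed_credits=16)
e_port = recover_stuck_port("E", 1.0, lrr_received=True, agreed_credits=16)
```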
Congestion Drop
• MDS timestamps each received frame
• Frame is dropped if it cannot be delivered to the egress port within the timeout
• The drop is logged
• Timeout can be configured from 100 ms to 500 ms (default 500 ms)
• Lowering the timeout drops stalled frames sooner and reduces the effects of slow drain devices
[Figure: frames queued in the MDS behind an egress port that is short of B2B credits.]
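The timestamp check behind congestion drop can be sketched as follows. The 500 ms default and the 100 ms lower bound come from the slide; the queue representation and the function name `age_out` are assumptions made for illustration.

```python
CONGESTION_DROP_MS = 500   # default from the slide; configurable 100-500 ms


def age_out(queue, now_ms, timeout_ms=CONGESTION_DROP_MS):
    """Split queued (frame_id, arrival_ms) pairs into kept and dropped lists."""
    kept, dropped = [], []
    for frame_id, arrival_ms in queue:
        if now_ms - arrival_ms >= timeout_ms:
            dropped.append(frame_id)          # held longer than the timeout
        else:
            kept.append((frame_id, arrival_ms))
    return kept, dropped


queue = [("a", 0), ("b", 300), ("c", 550)]
kept_default, dropped_default = age_out(queue, now_ms=600)
kept_short, dropped_short = age_out(queue, now_ms=600, timeout_ms=200)
```

With the default timeout only the oldest frame is discarded at 600 ms; lowering the timeout to 200 ms also discards the second frame, freeing buffers sooner.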
no-credit-drop
[Figure: Disk 1 and Disk 2 send through MDS 1 and MDS 2 to Server 1 and Server 2. Frames are dropped from the egress queue of the slow port, back pressure is released hop by hop, and R_RDYs flow again.]
• Frames dropped in egress queue if credits unavailable for no-credit-drop timeout
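A sketch of the no-credit-drop decision for a single egress queue, assuming the port tracks how long it has been at zero credits. The function name, arguments and return shape are hypothetical.

```python
def egress_tick(queue, remaining_credits, zero_credit_ms, no_credit_drop_ms):
    """Process one egress queue; return (sent, dropped, still_queued)."""
    if remaining_credits == 0 and zero_credit_ms >= no_credit_drop_ms:
        return [], list(queue), []            # drop all frames for the slow port
    sent = list(queue[:remaining_credits])    # one frame per available credit
    return sent, [], list(queue[remaining_credits:])


frames = ["f1", "f2", "f3"]
waiting = egress_tick(frames, 0, zero_credit_ms=50, no_credit_drop_ms=100)
dropping = egress_tick(frames, 0, zero_credit_ms=100, no_credit_drop_ms=100)
flowing = egress_tick(frames, 2, zero_credit_ms=0, no_credit_drop_ms=100)
```

Unlike congestion drop, the decision depends on the state of the egress port rather than on the age of each frame, so the whole queue for the slow port is emptied at once.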
HW Enhancements
Supported Software and Hardware
• HW + SW Slow Drain Support (6.2(9) onwards)
  • MDS 9148S
  • MDS 9250i
  • MDS 9710 and MDS 9706: 48 Port 16G FC Line Card (DS-X9448-768K9)
• SW (Only) Slow Drain Support
  • MDS 9222i, MDS 9148
  • MDS 9513, MDS 9509, MDS 9506: 32/48 Port 8G FC Line Card
HW Assistance Explained
[Figure: software-based detection and action run in the control plane with 100 ms polling; with HW assistance, detection and action run in the data plane.]
no-credit-drop: HW Assistance
• Detection range: 1-500 ms instead of 100-500 ms
  • Devices slower than 100 ms handled
  • Reduced traffic drop at high speeds
• Granularity: reduced from 100 ms to 1 ms
  • Enhanced precision
  • Any value from 1 to 500 ms (earlier: 100, 200, etc.; now: 101, 102, etc.)
  • No missed transient conditions!
• no-credit-drop action: immediate (ns)
  • Up to 99 ms of early action!
• Recovery from no-credit-drop condition: immediate (ns)
  • Up to 99 ms of early recovery!
• At least 60% incremental performance
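The "up to 99 ms" figure follows from polling arithmetic, sketched below. The model assumes a poller that can only act on its next polling boundary; the function name is hypothetical.

```python
def action_time_ms(condition_ms: int, poll_ms: int) -> int:
    """First polling boundary at or after the moment the condition is met."""
    return -(-condition_ms // poll_ms) * poll_ms   # integer ceiling division


software = action_time_ms(101, poll_ms=100)   # 100 ms software polling
hardware = action_time_ms(101, poll_ms=1)     # 1 ms hardware granularity
saved = software - hardware
```

A port that has been out of credits for 101 ms is acted on at 200 ms with 100 ms polling, but at 101 ms with 1 ms granularity, a difference of 99 ms.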
Slow Port Monitoring
Shows real time delay of R_RDY
Monitoring done at 1ms
Mds9706# show process creditmon slowport-monitor-events
Module: 01 Slowport Detected: YES
=====================================================================
Interface = fc1/18
------------------------------------------------------------
| admin | slowport | oper | Timestamp
| delay | detection | delay |
| (ms) | count | (ms) |
------------------------------------------------------------
| 1 | 0 | 9 | Wed Jul 2 19:47:35.038 2014
| 1 | 128 | 9 | Wed Jul 2 19:47:19.922 2014
| 1 | 127 | 4 | Wed Jul 2 19:47:19.618 2014
| 1 | 119 | 10 | Wed Jul 2 19:47:19.518 2014
| 1 | 109 | 10 | Wed Jul 2 19:47:19.418 2014
| 1 | 101 | 10 | Wed Jul 2 19:47:19.318 2014
| 1 | 100 | 4 | Wed Jul 2 19:47:19.118 2014
| 1 | 93 | 10 | Wed Jul 2 19:47:19.017 2014
| 1 | 83 | 10 | Wed Jul 2 19:47:18.917 2014
| 1 | 74 | 12 | Wed Jul 2 19:47:18.818 2014
• admin delay: configured delay via slow-port-monitor
• slowport detection count: number of times the delay was detected
• oper delay: actual delay seen by the port
• Timestamp: timestamps of the last 10 times the delay was observed
Done in Hardware. No overhead on CPU
Recommendation: Always Turn it on!
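Output in this shape is easy to post-process. The sketch below parses table rows laid out like the sample above; it assumes the pipe-separated layout shown on the slide, which may differ between releases, and the function name is hypothetical.

```python
from datetime import datetime


def parse_slowport_rows(text):
    """Extract (admin_delay_ms, detection_count, oper_delay_ms, timestamp) rows."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 4 or not cells[0].isdigit():
            continue                      # skip headers, rulers and other text
        stamp = datetime.strptime(" ".join(cells[3].split()),
                                  "%a %b %d %H:%M:%S.%f %Y")
        rows.append((int(cells[0]), int(cells[1]), int(cells[2]), stamp))
    return rows


sample = """| admin | slowport | oper | Timestamp
| 1 | 0 | 9 | Wed Jul 2 19:47:35.038 2014
| 1 | 128 | 9 | Wed Jul 2 19:47:19.922 2014
| 1 | 127 | 4 | Wed Jul 2 19:47:19.618 2014"""
rows = parse_slowport_rows(sample)
worst_delay_ms = max(row[2] for row in rows)   # largest oper delay seen
```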
Cisco recommends troubleshooting slow drain in the following order
Methodology
Level 3: Extreme Delay
Level 2: Retransmission
Level 1: Latency
Troubleshooting Slow Drain
• If Rx congestion then find ports communicating with this port that have Tx congestion
• Zoning defines which devices communicate with this port
• Understand topology
• If port communicating with port showing Rx congestion is FCIP
• Check for TCP retransmits
• Check for overutilization of FCIP
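[Figure: a switch with an F port and an E port; one side shows Rx credits at 0 remaining, the other Tx credits at 0 remaining, marking the direction of congestion.]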
Methodology – Follow Congestion to Source
• If Tx congestion found
• If it is an F port, the attached device is the slow drain device
• If it is an E port, go to the adjacent switch and continue troubleshooting
• Continue to track through the fabric until destination F-port is discovered
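[Figure: two switches joined by E ports, each with an F port at the edge. Rx credits at 0 remaining on one side and Tx credits at 0 remaining on the other mark the path of congestion.]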
Methodology – Follow Congestion to Source
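The "follow congestion to source" procedure is a walk across the fabric, sketched below. The data structures are hypothetical stand-ins for what an operator reads from each switch, the model assumes one Tx-congested port per switch, and the switch, port and host names are made up for illustration.

```python
def find_slow_device(start_switch, tx_congested, e_port_neighbor, f_port_device):
    """Walk from a switch toward the slow device, following Tx congestion.

    tx_congested:     {switch: port} port showing Tx credit starvation
    e_port_neighbor:  {(switch, port): adjacent_switch} for E ports
    f_port_device:    {(switch, port): device} for F ports
    """
    switch, visited = start_switch, set()
    while switch not in visited:
        visited.add(switch)
        port = tx_congested.get(switch)
        if port is None:
            return None                           # no Tx congestion here
        if (switch, port) in f_port_device:
            return f_port_device[(switch, port)]  # F port: attached device is slow
        switch = e_port_neighbor.get((switch, port))  # E port: next switch
    return None


tx = {"storage-edge": "fc1/1", "core1": "fc2/1", "host-edge": "fc1/13"}
isls = {("storage-edge", "fc1/1"): "core1", ("core1", "fc2/1"): "host-edge"}
edges = {("host-edge", "fc1/13"): "host-A"}
culprit = find_slow_device("storage-edge", tx, isls, edges)
```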
Port monitoring
[Figure: an event on the MDS can generate alarms, flap the port, or error-disable the port.]
• Port-monitor sends SNMP alerts and can also take portguard action
• Adding portguard to errdisable or flap a port can help the switch recover from problems automatically
• Should be done on access(F) ports only
• Use separate access(F) and trunk(E) policies
• Warning: Currently access (F) ports include F port-channels and trunks. Consequently, portguard actions should be avoided on these switches.
• System timeout congestion-drop and no-credit-drop should also be considered
Port-monitor portguard
Port-monitor Alerting and Action
port-monitor name AllPorts
port-type all
no monitor counter link-loss
no monitor counter sync-loss
no monitor counter signal-loss
no monitor counter invalid-words
no monitor counter invalid-crc
counter tx-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter timeout-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
counter credit-loss-reco poll-interval 60 delta rising-threshold 1 event 4 falling-threshold 0 event 4
counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
no monitor counter rx-datarate
no monitor counter tx-datarate
no monitor counter err-pkt-from-port
no monitor counter err-pkt-to-xbar
no monitor counter err-pkt-from-xbar
Policy applies to Access(F) and Trunk(E) ports
These counters are not monitored
Note: The above monitors 6 slow drain counters and does not monitor 10 others
Port-monitor alerting – Sample all ports policy
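How a rising and falling threshold pair turns per-interval counter deltas into alerts can be sketched as below. The hysteresis behaviour (a rising alert is not repeated until the falling threshold has been crossed) is an assumption about how such threshold pairs are usually evaluated, not something stated on the slide, and the function name is hypothetical.

```python
def threshold_events(deltas, rising, falling):
    """Return (interval_index, 'rising' or 'falling') alerts with hysteresis."""
    events, armed = [], True        # armed: ready to raise a rising alert
    for i, delta in enumerate(deltas):
        if armed and delta >= rising:
            events.append((i, "rising"))
            armed = False
        elif not armed and delta <= falling:
            events.append((i, "falling"))
            armed = True
    return events


# timeout-discards per 60 second interval, thresholds from the policy above
alerts = threshold_events([0, 55, 70, 30, 8, 60], rising=50, falling=10)
```

With rising-threshold 50 and falling-threshold 10, the sustained burst in intervals 1 to 3 produces a single rising alert rather than three.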
Port-monitor
9513(config)# port-monitor activate AllPorts
9513(config)# show port-monitor active
Policy Name : AllPorts
Admin status : Active
Oper status : Active
Port type : All Ports
---------------------------------------------------------------------------------------------------------
Counter Threshold Interval Rising Threshold event Falling Threshold event PMON Portguard
------- --------- -------- ---------------- ----- ------------------ ----- --------------
TX Discards Delta 60 50 4 10 4 Not enabled
LR RX Delta 60 5 4 1 4 Not enabled
LR TX Delta 60 5 4 1 4 Not enabled
Timeout Discards Delta 60 50 4 10 4 Not enabled
Credit Loss Reco Delta 60 1 4 0 4 Not enabled
TX Credit Not Available Delta 1 10 4 0 4 Not enabled
----------------------------------------------------------------------------------------------------------
All Ports port policy
Port-monitor alerting – Sample all ports policy
• The following adds portguard to timeout-discards and credit-loss-reco and raises their rising-thresholds slightly:
port-monitor name AccessPorts
port-type access
no monitor counter link-loss
no monitor counter sync-loss
no monitor counter signal-loss
no monitor counter invalid-words
no monitor counter invalid-crc
counter tx-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter timeout-discards poll-interval 60 delta rising-threshold 60 event 4 falling-threshold 10 event 4 portguard errordisable
counter credit-loss-reco poll-interval 60 delta rising-threshold 4 event 4 falling-threshold 0 event 4 portguard errordisable
counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
no monitor counter rx-datarate
no monitor counter tx-datarate
no monitor counter err-pkt-from-port
no monitor counter err-pkt-to-xbar
no monitor counter err-pkt-from-xbar
Error disable the port when 60 timeout-discards happen in 60 seconds
Error disable the port when 4 credit loss recovery events occur in 60 seconds
Access(F) port policy
Port-monitor portguard – Sample access (F) port policy
port-monitor name ISLPorts
port-type trunks
no monitor counter link-loss
no monitor counter sync-loss
no monitor counter signal-loss
no monitor counter invalid-words
no monitor counter invalid-crc
counter tx-discards poll-interval 60 delta rising-threshold 100 event 4 falling-threshold 10 event 4
counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
counter timeout-discards poll-interval 60 delta rising-threshold 100 event 4 falling-threshold 10 event 4
counter credit-loss-reco poll-interval 60 delta rising-threshold 1 event 4 falling-threshold 0 event 4
counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
no monitor counter rx-datarate
no monitor counter tx-datarate
no monitor counter err-pkt-from-port
no monitor counter err-pkt-to-xbar
no monitor counter err-pkt-from-xbar
Trunk (E) port policy
Port-monitor portguard – Sample trunk (E) port policy
MDS9513# show port-monitor active
Policy Name : ISLPorts
Admin status : Active
Oper status : Active
Port type : All Trunk Ports
---------------------------------------------------------------------------------------------------------
Counter Threshold Interval Rising Threshold event Falling Threshold event PMON Portguard
------- --------- -------- ---------------- ----- ------------------ ----- --------------
TX Discards Delta 60 100 4 10 4 Not enabled
LR RX Delta 60 5 4 1 4 Not enabled
LR TX Delta 60 5 4 1 4 Not enabled
Timeout Discards Delta 60 100 4 10 4 Not enabled
Credit Loss Reco Delta 60 1 4 0 4 Not enabled
TX Credit Not Available Delta 1 10 4 0 4 Not enabled
----------------------------------------------------------------------------------------------------------
Policy Name : AccessPorts
Admin status : Active
Oper status : Active
Port type : All Access Ports
---------------------------------------------------------------------------------------------------------
Counter Threshold Interval Rising Threshold event Falling Threshold event PMON Portguard
------- --------- -------- ---------------- ----- ------------------ ----- --------------
TX Discards Delta 60 50 4 10 4 Not enabled
LR RX Delta 60 5 4 1 4 Not enabled
LR TX Delta 60 5 4 1 4 Not enabled
Timeout Discards Delta 60 60 4 10 4 Error Disable
Credit Loss Reco Delta 60 4 4 0 4 Error Disable
TX Credit Not Available Delta 1 10 4 0 4 Not enabled
----------------------------------------------------------------------------------------------------------
Both policies active
Port-monitor portguard – both policies when activated
DCNM event log
• Creditmon is a process that runs periodically in each linecard
• It checks for transmit credits at zero
• F Port at 0 Tx credits for 1 second
• E Port at 0 Tx credits for 1.5 seconds
• Credit loss recovery invoked
• Can occur due to faulty hardware in the connection to the device
• Frames dropped due to errors(CRC, etc.)
• No credits returned for corrupted frames – this eventually causes repeated credit loss
[Figure: timeline. At 0 sec the port has no credits (stuck); at 1/1.5 sec LR is sent and LRR is returned; by +60 ms the credits are restored and the port resumes normal operation.]
Stuck Port / Credit Loss Due to Bad Physical Connection
Case Study 1
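Why a marginal physical connection ends in repeated credit loss can be shown with a small count. The loss pattern (every n-th credit never returned) is a hypothetical simplification of random corruption, and the function name is made up for illustration.

```python
def frames_until_stuck(agreed_credits: int, lost_every_n: int) -> int:
    """Frames sent before remaining credits reach zero when every
    `lost_every_n`-th credit is never returned."""
    remaining, sent = agreed_credits, 0
    while remaining > 0:
        remaining -= 1                 # frame sent, one credit consumed
        sent += 1
        if sent % lost_every_n != 0:
            remaining += 1             # R_RDY came back normally
    return sent


frames = frames_until_stuck(agreed_credits=3, lost_every_n=1000)
```

Each lost credit is gone until a Link Reset restores the agreed value, so even a low error rate eventually leaves the port at zero credits, triggers credit loss recovery, and then repeats.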
• Counters are polled every 20 seconds
• A counter is included in the log only when its value has changed
• Several different counters are included in error-stats:
• Timeout drops
• Credit loss recovery
• Tx/Rx credit not available(100ms)
• Force timeout on/off
mds9710-2# show logging onboard error-stats
----------------------------
Module: 1
----------------------------
--------------------------------------------------------------------------------
ERROR STATISTICS INFORMATION FOR DEVICE DEVICE: FCMAC
--------------------------------------------------------------------------------
Interface | | | Time Stamp
Range | Error Stat Counter Name | Count |MM/DD/YY HH:MM:SS
| | |
--------------------------------------------------------------------------------
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |242618 |04/14/14 12:17:58
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |124 |04/14/14 12:17:58
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |124 |04/14/14 12:17:58
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |201650 |04/14/14 12:17:38
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |108 |04/14/14 12:17:38
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |107 |04/14/14 12:17:38
Show logging onboard starttime <mm-dd-yy-00:00:00> error-stats
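Because OBFL logs cumulative counts at each change, the increase between two entries is often more telling than the raw value. The sketch below computes it from rows transcribed from the sample output above; the row format and function name are assumptions made for illustration.

```python
def counter_deltas(rows):
    """Given (interface, counter, count, timestamp) rows, newest first,
    return the increase of each counter between its two latest entries."""
    newest, deltas = {}, {}
    for interface, counter, count, _stamp in rows:
        key = (interface, counter)
        if key in newest and key not in deltas:
            deltas[key] = newest[key] - count   # latest minus previous entry
        newest.setdefault(key, count)
    return deltas


rows = [
    ("fc1/13", "F16_TMM_TOLB_TIMEOUT_DROP_CNT", 242618, "04/14/14 12:17:58"),
    ("fc1/13", "FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO", 124, "04/14/14 12:17:58"),
    ("fc1/13", "FCP_SW_CNTR_CREDIT_LOSS", 124, "04/14/14 12:17:58"),
    ("fc1/13", "F16_TMM_TOLB_TIMEOUT_DROP_CNT", 201650, "04/14/14 12:17:38"),
    ("fc1/13", "FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO", 108, "04/14/14 12:17:38"),
    ("fc1/13", "FCP_SW_CNTR_CREDIT_LOSS", 107, "04/14/14 12:17:38"),
]
deltas = counter_deltas(rows)
```

For the sample, fc1/13 dropped 40968 frames on timeout and went through 17 credit loss recoveries in the 20 seconds between the two entries.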
• Hosts are reporting latency/errors
• First notice timeout drops(Tx) occurring on storage edge switch
• Use show logging onboard starttime <date-time> error-stats
• Follow Tx congestion to core switches
• Follow Tx congestion to Host edge switch
• Follow Tx congestion to offending host
[Figure: path from the storage edge switch through Core#1 and Core#2 to the host edge switch and the host, with timeout drops marked along the path.]
ISLs on edge switch dropping frames
Case Study 2
Summary
Cisco MDS architecture advantage
• Consistent, predictable throughput and latency at different traffic loads and types
• Drops corrupt frames through CRC checking at all stages
• Never drops good frames under congestion
• Non-blocking arbitrated crossbar architecture
MDS, Nexus & DCNM Slow Drain Advantage
• Detection
  • Slow Port
  • Stuck Port
  • Slow Port Monitoring
• Troubleshooting
  • Credit transition to zero
  • Credit and remaining credit
  • Info of dropped frames
  • See frames in ingress queue
  • OBFL logging
  • Port Monitoring
• Automatic Recovery
  • Virtual Output Queues
  • Stuck Port Recovery
  • LR Rcvd B2B
  • Congestion drop
  • No-credit-drop
• DCNM
  • Fabric wide visibility
  • Automatic collection and graphical display of counters
  • Reduced false positives
• HW Assisted
  • Detection: 1 ms
  • Action: Immediate
Key Takeaways
Cisco MDS, Nexus & DCNM build Self Healing Fabrics
Resources
• Cisco Live! San Diego, June 7 – 11, 2015: BRKSAN-3446, “SAN Congestion: Understanding, Troubleshooting, Mitigating in a Cisco Fabric” by Ed Mazurek
• Understanding Slow Drain: Detection, Troubleshooting & Automatic Recovery: https://www.youtube.com/watch?v=wEz3z6NLaBU&list=PL_ju2fKFbFzVMZgXAHV9kZ6FT93BuG0eB
• White Paper on “Slow Drain Device Detection and Congestion Avoidance” at http://www.cisco.com/c/en/us/products/collateral/storage-networking/mds-9700-series-multilayer-directors/white_paper_c11-729444.html
• Cisco Live Session: BRKSAN-3446 by Ed Mazurek on “MDS 9500 9710 Understanding Detecting Troubleshooting Mitigating Slow Drain in a Cisco Fabric” https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=78677&backBtn=true
• Generation 4 Slow Drain Counters commands and troubleshooting: http://www.cisco.com/c/en/us/support/docs/storage-networking/mds-9509-multilayer-director/116098-trouble-gen4-00.html
Slow Drain Reference