Solving Congestion Problems in Storage Area Networks (SAN)

Paresh Gupta, Technical Marketing Engineer

Ed Mazurek, Technical Leader, Services

Feb, 2015

The world that we live in

• 2x the number of apps in 2014 over last year
• 10x data growth by 2020
• 1.5x IT professionals


Highlight: HW Enhanced Slow Drain

• Detection: 1 ms
• Troubleshooting: 1 ms
• Automatic Recovery: Immediate

Understanding Slow Drain

Fibre Channel Flow Control: B2B Credits

• B2B credits are not negotiated; each side simply states its own value and the other side accepts it
• Each side informs the other side of the number of buffer credits it has

[Figure: a storage disk (N-port) tells the Fibre Channel switch (F-port) "I have 1 Rx B2B credit"; the switch answers "OK. I have 3 B2B credits". The F-port has three credits; the N-port has one credit.]

Fibre Channel Flow Control: Traffic Flow

• The MDS Rx buffer queue is decremented by 1 B2B credit for each received frame
• R_RDY is sent to the sender when the frame occupying the buffer has been handled
• For each frame sent, one R_RDY (B2B credit) should be returned
• R_RDYs are not sent reliably; they can be corrupted or lost

[Figure: a storage disk (N-port) sends Frame1, Frame2 and Frame3 to the Fibre Channel switch (F-port); the switch returns an R_RDY for each buffer it frees.]
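The credit accounting described above can be sketched from the transmitter's point of view. This is a minimal illustrative model, not MDS code: the class name `B2BTransmitter` and its methods are hypothetical, and the cap on returned credits is an assumption made to keep the counter sane.

```python
class B2BTransmitter:
    """Transmit side of one Fibre Channel link (illustrative model only)."""

    def __init__(self, peer_rx_credits: int):
        # Value advertised by the receiver at login: agreed to, not negotiated.
        self.agreed_credits = peer_rx_credits
        self.remaining_credits = peer_rx_credits

    def can_send(self) -> bool:
        return self.remaining_credits > 0

    def send_frame(self) -> None:
        if not self.can_send():
            raise RuntimeError("no B2B credits remaining; transmitter must wait")
        self.remaining_credits -= 1

    def receive_r_rdy(self) -> None:
        # Each R_RDY returns one credit (assumption: never above the agreed value).
        self.remaining_credits = min(self.remaining_credits + 1, self.agreed_credits)


tx = B2BTransmitter(peer_rx_credits=3)
for _ in range(3):
    tx.send_frame()
blocked = not tx.can_send()   # all three credits are in use
tx.receive_r_rdy()            # receiver freed one buffer
resumed = tx.can_send()
```

A lost R_RDY in this model simply never calls `receive_r_rdy()`, which is why the remaining credit count can drift downward over time on a faulty link.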

Lossless Fibre Channel fabric

• Disk 1 sends a frame to Server 1
• Switch 1 sends R_RDY after it transmits the frame to Switch 2
• Switch 2 sends R_RDY after it transmits the frame to Server 1
• Server 1 sends R_RDY after the frame is consumed by the HBA

[Figure: Disk 1, Switch 1, Switch 2 and Server 1 in a chain; a frame moves hop by hop toward Server 1 and an R_RDY is returned on each link.]

Lossless Fibre Channel fabric

• Server 1 cannot process frames, so it does not return R_RDY
• No B2B credits are available on the ports connected to Server 1 and Disk 1
• No B2B credits are available on the ISL ports
• Disk 1 stops transmitting; no frame is dropped, so the fabric stays lossless

[Figure: Disk 1, Switch 1, Switch 2 and Server 1 in a chain; every buffer is full of frames, R_RDYs are withheld, and back pressure propagates from Server 1 back to Disk 1.]

Slow Drain situation

• B2B credits are exhausted on the ISL
• No R_RDY is sent to Disk 1, nor to Disk 2
• The slow Server 1 therefore affects the Disk 2 to Server 2 flow

[Figure: Disk 1 and Disk 2 attach to Switch 1, Server 1 and Server 2 attach to Switch 2. Frames for Server 1 fill the ISL buffers; back pressure reaches both disks even though Server 2 still returns R_RDY.]

Slow Drain situation

• One slow device impacts all other devices sharing the same switches and ISLs

[Figure: same topology as the previous slide; Server 1 is labeled "Slow Node" and the remaining devices are labeled "Impacted Nodes".]
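The effect of one slow node on an unrelated flow can be reproduced with a toy simulation. This is a sketch under strong assumptions that are not in the original slides: the ISL is modeled as a single shared FIFO of `isl_credits` buffers, one frame can enter and one can leave per tick, the two disks alternate, and Server 1 accepts a frame only every `server1_period` ticks. All names are hypothetical.

```python
from collections import deque


def simulate(ticks: int, server1_period: int, isl_credits: int = 4) -> dict:
    """Count frames delivered to each server across one shared ISL."""
    isl_queue = deque()                      # frames buffered on the ISL (FIFO)
    delivered = {"server1": 0, "server2": 0}
    destinations = ("server1", "server2")    # Disk 1 -> Server 1, Disk 2 -> Server 2
    next_source = 0
    for tick in range(ticks):
        # Egress: only the head-of-line frame can leave the ISL buffer.
        if isl_queue:
            dest = isl_queue[0]
            ready = dest == "server2" or tick % server1_period == 0
            if ready:
                isl_queue.popleft()
                delivered[dest] += 1
        # Ingress: the disks take turns, but only if an ISL credit is free.
        if len(isl_queue) < isl_credits:
            isl_queue.append(destinations[next_source])
            next_source = 1 - next_source
    return delivered


healthy = simulate(ticks=1000, server1_period=1)
slow = simulate(ticks=1000, server1_period=10)
```

In this model Server 2 is always ready to receive, yet its delivered frame count collapses in the `slow` run because its frames wait behind frames destined for Server 1.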

Reasons for Slow Drain

• Edge devices
  • Server performance problems: application or OS
  • Host bus adapter (HBA) problems: driver or physical failure
  • Speed mismatches: one fast device and one slow device
  • Non-graceful virtual machine exit on a virtualized server, resulting in packets held in HBA buffers
  • Storage subsystem performance problems, including overload
• Inter-Switch Links (ISLs)
  • Lack of B2B credits for the distance the ISL is traversing
    • Example: 4 credits per km at 8 Gbps
  • The existence of slow drain edge devices
  • Edge devices with faster speeds than the ISLs, even when the ISLs are port-channeled
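The distance rule of thumb quoted above (4 credits per km at 8 Gbps) can be turned into a quick estimate. This is a rough sketch only: the function name is hypothetical, and linear scaling with link speed is an assumption rather than something the slides state.

```python
import math

CREDITS_PER_KM_AT_8G = 4  # rule of thumb quoted in the slide


def isl_credits_needed(distance_km: float, speed_gbps: float = 8.0) -> int:
    """Estimate the B2B credits needed to keep a long ISL busy."""
    # Assumption: the requirement scales linearly with link speed.
    per_km = CREDITS_PER_KM_AT_8G * (speed_gbps / 8.0)
    return math.ceil(distance_km * per_km)


credits_10km_8g = isl_credits_needed(10)        # 10 km at 8 Gbps
credits_10km_16g = isl_credits_needed(10, 16)   # same distance at 16 Gbps
```

An ISL configured with fewer credits than this estimate behaves like a slow drain device even when both switches are healthy, because the transmitter idles while waiting for R_RDYs to travel back.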

Cisco MDS Architecture

[Figure: Line Card 1 and Line Card 2, each with ports, a VOQ and an XBAR interface, connected through two Fabric Modules (XBAR); the Arbiter runs on the active supervisor.]

Frame & credit processing in MDS switch

1. Initiator sends an FC frame
2. MDS receives the frame in its entirety and stores it
3. Frame is transmitted to the VOQ
4. XBAR interface requests a grant from the Arbiter to transmit the frame to the egress port via the XBAR
5. Arbiter grants the request to the XBAR interface to forward the frame; the grant is only sent when the egress port has buffer space available
6. FC frame is forwarded to the XBAR, then R_RDY is sent back since the ingress buffer is now free
7. FC frame is forwarded to the egress line card
8. ASIC forwards the frame to the target
9. Credit is returned to the Arbiter

[Figure: Cisco MDS with the request/grant exchange between XBAR interface and Arbiter, the frame path through the XBAR, the R_RDY returned to the initiator, and the credit returned to the Arbiter.]
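The request/grant steps above can be sketched as a small bookkeeping model. This is an illustration of the idea only, not the MDS arbiter implementation; the `Arbiter` class, its methods and the port name are hypothetical.

```python
class Arbiter:
    """Grants crossbar access only when the egress port has a free buffer."""

    def __init__(self, egress_buffers: dict):
        # Free egress buffers per port, e.g. {"fc2/1": 1} (hypothetical values).
        self.free = dict(egress_buffers)

    def request(self, egress_port: str) -> bool:
        """Steps 4-5: a grant is issued only if egress buffer space exists."""
        if self.free[egress_port] > 0:
            self.free[egress_port] -= 1
            return True
        return False  # frame keeps waiting in the ingress VOQ

    def return_credit(self, egress_port: str) -> None:
        """Step 9: the egress side reports that a buffer is free again."""
        self.free[egress_port] += 1


arbiter = Arbiter({"fc2/1": 1})
first = arbiter.request("fc2/1")    # granted, frame crosses the XBAR
second = arbiter.request("fc2/1")   # denied, egress buffer is occupied
arbiter.return_credit("fc2/1")      # egress ASIC sent the first frame to the target
third = arbiter.request("fc2/1")    # granted again
```

Because a denied request leaves the frame in the ingress buffer, no R_RDY goes back to the sender, which is how egress congestion turns into back pressure on the ingress link.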



Cisco MDS architecture advantage

• Consistent throughput & latency
• Predictable performance at different traffic loads & types
• Drops corrupt frames by CRC checking at all stages
• Never drops good frames under congestion
• Non-blocking arbitrated crossbar architecture

[Figure: Cisco MDS with ports, VOQ, XBAR interface and two Fabric Modules (XBAR).]

Slow Drain Detection

Slow & Stuck Port

• Slow port
  • Credits are returned slowly
  • Traffic does not flow at line rate
  • A counter is maintained if credits are unavailable for 100 ms
• Stuck port
  • Credits are unavailable on the port for an extended duration
  • Traffic does not flow at all
  • A separate counter is maintained for stuck ports

[Figure: an MDS port with frames queued; the slow port receives R_RDYs late, the stuck port receives none.]
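The slow versus stuck distinction can be expressed as a simple classification on how long a port has been at zero Tx credits. This is a sketch: the function is hypothetical, the 100 ms figure comes from the slide above, and the 1 s (F port) and 1.5 s (E port) stuck thresholds are taken from the stuck-port recovery timers given later in this deck.

```python
def classify_port(zero_credit_ms: float, port_type: str = "F") -> str:
    """Classify a port by the time it has spent with zero Tx credits."""
    # Stuck thresholds: 1 second for F ports, 1.5 seconds for E ports.
    stuck_after_ms = 1000 if port_type == "F" else 1500
    if zero_credit_ms >= stuck_after_ms:
        return "stuck"
    if zero_credit_ms >= 100:      # credits unavailable for 100 ms
        return "slow"
    return "ok"


states = [classify_port(ms) for ms in (50, 150, 1200)]
e_port_state = classify_port(1200, port_type="E")
```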

Slow Drain Troubleshooting

Credit availability

• Credit and remaining credit
  • Credits: the number agreed initially
  • Remaining credit: a dynamic counter; this many frames can still be sent
• Credit transition to zero
  • A counter increments whenever the port hits zero credits
  • Maintained as a hardware statistic

[Figure: an MDS port with queued frames and returning R_RDYs.]

Frame information

• Display frames in the ingress queue
  • Real-time display of frames in ingress queues
• Display dropped frame info
  • Displays key information about frames dropped due to timeout, such as Source FCID (SID) and Destination FCID (DID)

[Figure: an MDS port with queued frames and returning R_RDYs.]

On Board Failure Logging (OBFL)

• Each line card logs events to an NVRAM buffer
• Events are timestamped
• OBFL categories kept in NVRAM on each line card: error-stats, flow-control, timeouts, request-timeouts

[Figure: an MDS with Line Card 1 and Line Card 2, each holding its own OBFL NVRAM log.]

DCNM Enhancements

DCNM Slow Drain Enhancements

• Automates troubleshooting: collects data from the whole fabric at once
• Automates collection: from hours of collection to minutes
• Reduces false positives: prioritizes the ports with the highest-severity counters
• Shows fluctuations in counters: graphs counters and enables the user to zero in on specific counters

Slow Drain Automatic Recovery

Stuck Port Recovery

Transmitting port
• Credits unavailable for 1 second (F port) or 1.5 seconds (E port)
• Transmits Link Reset (LR)
• If a Link Reset Response (LRR) is received, credits are replenished
• If no LRR is received, the port fails
• A counter is incremented

Receiving port
• On receiving LR, checks whether its input buffers are empty
• If the input buffers are not empty within 90 ms, the “LR Rcvd B2B” condition occurs and the link fails with reason “Link failure Link Reset failed nonempty Recv queue”
• This is an indication of upstream congestion

[Figure: MDS1 (transmitting port) and MDS2 (receiving port) connected by a link, with frames queued on both switches and no R_RDYs returning.]
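The two halves of stuck port recovery can be summarized as two small decision functions. This is a sketch of the logic as described on the slide; the function names and returned strings are hypothetical, and the "send LRR" outcome on the receiving side is inferred from the transmitter expecting an LRR.

```python
from typing import Optional


def transmitter_recovery(port_type: str, zero_credit_s: float, lrr_received: bool) -> str:
    """Transmitting port: act once credits have been at zero long enough."""
    threshold_s = 1.0 if port_type == "F" else 1.5
    if zero_credit_s < threshold_s:
        return "no action"
    # Link Reset (LR) is transmitted; the outcome depends on the response.
    return "credits replenished" if lrr_received else "port failed"


def receiver_on_lr(ms_until_buffers_empty: Optional[float]) -> str:
    """Receiving port: None means the input buffers never emptied."""
    if ms_until_buffers_empty is not None and ms_until_buffers_empty <= 90:
        return "send LRR"          # assumption: an LRR is the success response
    return "link failure: LR Rcvd B2B"


f_port = transmitter_recovery("F", zero_credit_s=1.0, lrr_received=True)
e_port = transmitter_recovery("E", zero_credit_s=1.0, lrr_received=True)
congested_receiver = receiver_on_lr(None)
```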

Congestion Drop

• MDS timestamps each received frame
• A frame is dropped if it cannot be delivered to the egress port within the timeout
• The drop is logged
• The timeout can be configured from 100 ms to 500 ms (default: 500 ms)
• Lowering the timeout drops frames sooner and reduces the effect of slow drain devices

[Figure: an MDS with frames queued toward an egress port that has no credits; aged frames are dropped.]
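The congestion-drop check amounts to comparing a frame's age against the configured timeout. This is a minimal sketch: the function name is hypothetical, and treating a frame whose age equals the timeout as dropped is an assumption.

```python
def congestion_drop(arrival_ms: float, now_ms: float, timeout_ms: int = 500) -> bool:
    """Return True if a frame has waited long enough to be dropped."""
    if not 100 <= timeout_ms <= 500:
        raise ValueError("congestion-drop timeout must be between 100 and 500 ms")
    # The switch timestamps each frame on arrival; age is measured from there.
    return now_ms - arrival_ms >= timeout_ms


kept_with_default = congestion_drop(arrival_ms=0, now_ms=250)                  # 250 ms old, 500 ms timeout
dropped_with_lower = congestion_drop(arrival_ms=0, now_ms=250, timeout_ms=200)  # same frame, 200 ms timeout
```

The same 250 ms old frame survives under the default but is dropped with a 200 ms timeout, which is the trade-off behind lowering the value.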

no-credit-drop

• Frames are dropped in the egress queue if credits are unavailable for the no-credit-drop timeout

[Figure: Disk 1 and Disk 2 attach to MDS 1, Server 1 and Server 2 attach to MDS 2. Frames are dropped from the egress queue of the slow port (Server 1); back pressure is released on the ISL and toward both disks, and R_RDYs flow again.]

HW Enhancements

Supported Software and Hardware

HW + SW Slow Drain Support (6.2(9) onwards)
• MDS 9148S
• MDS 9250i
• MDS 9710 and MDS 9706 with the 48 Port 16G FC Line Card (DS-X9448-768K9)

SW (Only) Slow Drain Support
• MDS 9222i, MDS 9148
• MDS 9513, MDS 9509, MDS 9506 with the 32/48 Port 8G FC Line Card

HW Assistance Explained

[Figure: control plane versus data plane. "Software Based" shows detection and action driven by 100 ms polling; "HW Assistance" shows detection and action without the polling step.]

no-credit-drop: HW Assistance

• Detection range: 1-500 ms instead of 100-500 ms
  • Devices slower than 100 ms handled
  • Reduced traffic drop at high speeds
• Granularity: reduced from 100 ms to 1 ms
  • Enhanced precision
  • Any value from 1 to 500 ms (earlier: 100, 200, etc.; now: 101, 102, etc.)
  • No missed transient conditions
• no-credit-drop action: immediate (ns)
  • Up to 99 ms of early action
• Recovery from the no-credit-drop condition: immediate (ns)
  • Up to 99 ms of early recovery
• At least 60% incremental performance
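The "up to 99 ms" figure can be illustrated by comparing a software check that only runs on 100 ms poll boundaries with a hardware check that fires as soon as the threshold expires. This is a simplified timing model, not a description of the actual implementation; both function names are hypothetical and times are whole milliseconds.

```python
import math


def software_action_time(start_ms: int, threshold_ms: int, poll_ms: int = 100) -> int:
    """Software acts at the first poll at or after the threshold expires."""
    expiry = start_ms + threshold_ms
    return math.ceil(expiry / poll_ms) * poll_ms


def hardware_action_time(start_ms: int, threshold_ms: int) -> int:
    """Hardware acts as soon as the threshold expires."""
    return start_ms + threshold_ms


# Credits drop to zero at every possible offset inside one poll interval.
gains = [
    software_action_time(start, 100) - hardware_action_time(start, 100)
    for start in range(100)
]
max_gain_ms = max(gains)
```

In this model the largest gain occurs when credits run out 1 ms after a poll: the software check waits for the next poll boundary while the hardware check does not.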

Slow Port Monitoring

• Shows the real-time delay of R_RDY
• Monitoring is done at 1 ms granularity

Mds9706# show process creditmon slowport-monitor-events

Module: 01 Slowport Detected: YES
=====================================================================
Interface = fc1/18
------------------------------------------------------------
| admin | slowport  | oper  | Timestamp
| delay | detection | delay |
| (ms)  | count     | (ms)  |
------------------------------------------------------------
| 1     | 0         | 9     | Wed Jul 2 19:47:35.038 2014
| 1     | 128       | 9     | Wed Jul 2 19:47:19.922 2014
| 1     | 127       | 4     | Wed Jul 2 19:47:19.618 2014
| 1     | 119       | 10    | Wed Jul 2 19:47:19.518 2014
| 1     | 109       | 10    | Wed Jul 2 19:47:19.418 2014
| 1     | 101       | 10    | Wed Jul 2 19:47:19.318 2014
| 1     | 100       | 4     | Wed Jul 2 19:47:19.118 2014
| 1     | 93        | 10    | Wed Jul 2 19:47:19.017 2014
| 1     | 83        | 10    | Wed Jul 2 19:47:18.917 2014
| 1     | 74        | 12    | Wed Jul 2 19:47:18.818 2014

• admin delay: the delay configured via slow-port-monitor
• slowport detection count: the number of times the delay was detected
• oper delay: the actual delay seen by the port
• Timestamp: the last 10 times the delay was observed

Done in hardware, with no overhead on the CPU.

Recommendation: always turn it on!
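When this output is collected from many ports, the event rows can be pulled out with a short parser. This is a sketch that assumes the row layout shown in the sample above ("| admin | count | oper | timestamp"); the function name and dictionary keys are hypothetical, and other software releases may format the output differently.

```python
import re
from datetime import datetime

# One data row: three integer columns followed by a timestamp.
ROW = re.compile(r"^\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(.+?)\s*$")


def parse_slowport_events(output: str) -> list:
    """Extract slow port events from slowport-monitor-events text."""
    events = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header, separator or blank line
        admin, count, oper, stamp = match.groups()
        events.append({
            "admin_delay_ms": int(admin),
            "detection_count": int(count),
            "oper_delay_ms": int(oper),
            "when": datetime.strptime(stamp, "%a %b %d %H:%M:%S.%f %Y"),
        })
    return events


sample = """| admin | slowport | oper | Timestamp
| 1 | 128 | 9 | Wed Jul 2 19:47:19.922 2014
| 1 | 74 | 12 | Wed Jul 2 19:47:18.818 2014"""
events = parse_slowport_events(sample)
worst_delay_ms = max(event["oper_delay_ms"] for event in events)
```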

Methodology

Cisco recommends troubleshooting slow drain in the following order:

• Level 3: Extreme Delay
• Level 2: Retransmission
• Level 1: Latency

Troubleshooting Slow Drain

Methodology – Follow Congestion to Source

• If Rx congestion is found, find the ports communicating with this port that have Tx congestion
  • Zoning defines which devices communicate with this port
  • Understand the topology
• If the port communicating with the port showing Rx congestion is FCIP
  • Check for TCP retransmits
  • Check for overutilization of FCIP

[Figure: a switch with an F port showing "Rx Credits: 0 Remaining" and an E port showing "Tx Credits: 0 Remaining"; congestion is marked on the Tx side.]


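Methodology – Follow Congestion to Source

• If Tx congestion is found
  • If it is on an F port, the attached device is the slow drain device
  • If it is on an E port, go to the adjacent switch and continue troubleshooting
• Continue to track through the fabric until the destination F port is discovered

[Figure: two switches joined by E ports, each with an F port; the ingress side shows "Rx Credits: 0 Remaining" and the egress side shows "Tx Credits: 0 Remaining", with congestion marked toward the destination.]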
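The hop-by-hop procedure above is effectively a walk across the fabric that ends at an F port. This is a sketch over a hypothetical data model: the port names, dictionary keys and the `next_tx_port` link (the congested Tx port found on the adjacent switch) are all illustrative and would in practice come from per-switch counters and topology knowledge.

```python
def find_slow_drain_device(start_port: str, ports: dict, max_hops: int = 16):
    """Follow Tx congestion from port to port until an F port is reached."""
    port = start_port
    for _ in range(max_hops):                 # guard against loops in bad data
        info = ports[port]
        if not info["tx_congested"]:
            return None                       # trail went cold
        if info["type"] == "F":
            return info["attached"]           # attached device is the slow drain device
        port = info["next_tx_port"]           # E port: continue on the adjacent switch
    return None


fabric = {
    "storage-edge:fc1/1": {"type": "E", "tx_congested": True, "next_tx_port": "core:fc2/1"},
    "core:fc2/1": {"type": "E", "tx_congested": True, "next_tx_port": "host-edge:fc1/5"},
    "host-edge:fc1/5": {"type": "F", "tx_congested": True, "attached": "Server 1"},
}
culprit = find_slow_drain_device("storage-edge:fc1/1", fabric)
```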


Port monitoring

On an event, the MDS can:
• Generate alarms
• Flap the port
• Error-disable the port

Port-monitor Alerting and Action

Port-monitor portguard

• Port-monitor sends SNMP alerts and can also take portguard action
• Adding portguard to error-disable or flap a port can help the switch recover from problems automatically
  • Should be done on access (F) ports only
  • Use separate access (F) and trunk (E) policies
  • Warning: currently, access (F) ports include F port-channels and trunks. Consequently, portguard actions should be avoided on these switches.
• System timeout congestion-drop and no-credit-drop should also be considered

Port-monitor alerting – Sample all ports policy

port-monitor name AllPorts
  port-type all
  no monitor counter link-loss
  no monitor counter sync-loss
  no monitor counter signal-loss
  no monitor counter invalid-words
  no monitor counter invalid-crc
  counter tx-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
  counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
  counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
  counter timeout-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
  counter credit-loss-reco poll-interval 60 delta rising-threshold 1 event 4 falling-threshold 0 event 4
  counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
  no monitor counter rx-datarate
  no monitor counter tx-datarate
  no monitor counter err-pkt-from-port
  no monitor counter err-pkt-to-xbar
  no monitor counter err-pkt-from-xbar

• port-type all: the policy applies to access (F) and trunk (E) ports
• no monitor counter: these counters are not monitored

Note: The above monitors 6 slow drain counters and does not monitor 10 others.

Port-monitor alerting – Sample all ports policy

9513(config)# port-monitor activate AllPorts
9513(config)# show port-monitor active
Policy Name  : AllPorts
Admin status : Active
Oper status  : Active
Port type    : All Ports
---------------------------------------------------------------------------------------------------------
Counter                  Threshold  Interval  Rising Threshold  event  Falling Threshold  event  PMON Portguard
-------                  ---------  --------  ----------------  -----  -----------------  -----  --------------
TX Discards              Delta      60        50                4      10                 4      Not enabled
LR RX                    Delta      60        5                 4      1                  4      Not enabled
LR TX                    Delta      60        5                 4      1                  4      Not enabled
Timeout Discards         Delta      60        50                4      10                 4      Not enabled
Credit Loss Reco         Delta      60        1                 4      0                  4      Not enabled
TX Credit Not Available  Delta      1         10                4      0                  4      Not enabled
---------------------------------------------------------------------------------------------------------

Port-monitor portguard – Sample access (F) port policy

• The following adds portguard to timeout-discards and credit-loss-reco and raises the rising-threshold slightly:

port-monitor name AccessPorts
  port-type access
  counter timeout-discards poll-interval 60 delta rising-threshold 60 event 4 falling-threshold 10 event 4 portguard errordisable
  counter credit-loss-reco poll-interval 60 delta rising-threshold 4 event 4 falling-threshold 0 event 4 portguard errordisable

• timeout-discards: error-disable the port when 60 timeout discards happen in 60 seconds
• credit-loss-reco: error-disable the port when 4 credit loss recovery events occur in 60 seconds
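How a delta counter with rising and falling thresholds produces events can be sketched as follows. This is an illustration only: it assumes RMON-style behavior (a rising event fires once, then re-arms after the delta falls to the falling threshold), which the slides do not spell out, and the function name is hypothetical.

```python
def evaluate_delta_counter(samples: list, rising: int, falling: int) -> list:
    """Return one event (or None) per poll interval for a cumulative counter."""
    events = []
    armed = True
    for previous, current in zip(samples, samples[1:]):
        delta = current - previous            # "delta" threshold type
        if armed and delta >= rising:
            events.append("rising")           # alert; portguard action if configured
            armed = False
        elif not armed and delta <= falling:
            events.append("falling")
            armed = True
        else:
            events.append(None)
    return events


# Cumulative timeout-discards read every 60 seconds (made-up values).
readings = [0, 20, 90, 160, 165]
events = evaluate_delta_counter(readings, rising=60, falling=10)
```

With the access-port policy above, the "rising" event in the second interval (70 discards in 60 seconds) is the point at which portguard would error-disable the port.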

Port-monitor portguard – Sample trunk (E) port policy

port-monitor name ISLPorts
  port-type trunks
  counter timeout-discards poll-interval 60 delta rising-threshold 100 event 4 falling-threshold 10 event 4
  counter credit-loss-reco poll-interval 60 delta rising-threshold 1 event 4 falling-threshold 0 event 4

Port-monitor portguard – both policies when activated

MDS9513# show port-monitor active
Policy Name  : ISLPorts
Port type    : All Trunk Ports
---------------------------------------------------------------------------------------------------------
Counter                  Threshold  Interval  Rising Threshold  event  Falling Threshold  event  PMON Portguard
-------                  ---------  --------  ----------------  -----  -----------------  -----  --------------
Timeout Discards         Delta      60        100               4      10                 4      Not enabled
Credit Loss Reco         Delta      60        1                 4      0                  4      Not enabled
---------------------------------------------------------------------------------------------------------
Policy Name  : AccessPorts
Port type    : All Access Ports
---------------------------------------------------------------------------------------------------------
Counter                  Threshold  Interval  Rising Threshold  event  Falling Threshold  event  PMON Portguard
-------                  ---------  --------  ----------------  -----  -----------------  -----  --------------
Timeout Discards         Delta      60        60                4      10                 4      Error Disable
Credit Loss Reco         Delta      60        4                 4      0                  4      Error Disable
---------------------------------------------------------------------------------------------------------

Port-monitor: DCNM event log

[Figure: DCNM event log.]

Case Study 1: Stuck Port / Credit Loss Due to Bad Physical Connection

• Creditmon is a process that runs periodically in each line card
• It checks for transmit credits at zero
  • F port at 0 Tx credits for 1 second
  • E port at 0 Tx credits for 1.5 seconds
• Credit loss recovery is then invoked
• Credit loss can occur due to faulty hardware in the connection to the device
  • Frames are dropped due to errors (CRC, etc.)
  • No credits are returned for corrupted frames; this eventually causes repeated credit loss

[Figure: timeline. At 0 sec the port has no credits (stuck); at 1/1.5 sec LR is sent and LRR is returned; about 60 ms later credit is restored and the port resumes normal operation.]

Troubleshooting: show logging onboard starttime <mm-dd-yy-00:00:00> error-stats

• Counters are polled every 20 seconds
• A counter is logged only when its value has changed
• Several different counters are included in error-stats:
  • Timeout drops
  • Credit loss recovery
  • Tx/Rx credit not available (100 ms)
  • Force timeout on/off

mds9710-2# show logging onboard error-stats
----------------------------
Module: 1
----------------------------
--------------------------------------------------------------------------------
ERROR STATISTICS INFORMATION FOR DEVICE DEVICE: FCMAC
--------------------------------------------------------------------------------
Interface |                                |        | Time Stamp
Range     | Error Stat Counter Name        | Count  |MM/DD/YY HH:MM:SS
          |                                |        |
--------------------------------------------------------------------------------
fc1/13    |F16_TMM_TOLB_TIMEOUT_DROP_CNT   |242618  |04/14/14 12:17:58
fc1/13    |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO  |124     |04/14/14 12:17:58
fc1/13    |FCP_SW_CNTR_CREDIT_LOSS         |124     |04/14/14 12:17:58
fc1/13    |F16_TMM_TOLB_TIMEOUT_DROP_CNT   |201650  |04/14/14 12:17:38
fc1/13    |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO  |108     |04/14/14 12:17:38
fc1/13    |FCP_SW_CNTR_CREDIT_LOSS         |107     |04/14/14 12:17:38
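Because the logged counts are cumulative, the useful number is the change between two consecutive entries for the same counter. The following sketch parses rows in the layout shown above and computes those deltas; the function names are hypothetical, and the parser assumes pipe-separated rows listed newest first, as in the sample.

```python
def parse_error_stats(output: str) -> list:
    """Return (interface, counter, count, timestamp) tuples from error-stats rows."""
    records = []
    for line in output.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 4 and parts[2].isdigit():   # skips headers and separators
            records.append((parts[0], parts[1], int(parts[2]), parts[3]))
    return records


def latest_deltas(records: list) -> dict:
    """Difference between the two most recent entries of each counter."""
    newest, deltas = {}, {}
    for interface, counter, count, _stamp in records:   # newest entries come first
        key = (interface, counter)
        if key in newest and key not in deltas:
            deltas[key] = newest[key] - count
        newest.setdefault(key, count)
    return deltas


sample = """fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |242618 |04/14/14 12:17:58
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |124 |04/14/14 12:17:58
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |124 |04/14/14 12:17:58
fc1/13 |F16_TMM_TOLB_TIMEOUT_DROP_CNT |201650 |04/14/14 12:17:38
fc1/13 |FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO |108 |04/14/14 12:17:38
fc1/13 |FCP_SW_CNTR_CREDIT_LOSS |107 |04/14/14 12:17:38"""
deltas = latest_deltas(parse_error_stats(sample))
timeout_drops_in_20s = deltas[("fc1/13", "F16_TMM_TOLB_TIMEOUT_DROP_CNT")]
```

For the sample above, fc1/13 dropped 242618 - 201650 = 40968 frames on timeout and went through 17 credit loss recoveries within one 20-second polling interval.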

Case Study 2: ISLs on edge switch dropping frames

• Hosts are reporting latency/errors
• First, notice timeout drops (Tx) occurring on the storage edge switch
  • Use show logging onboard starttime <date-time> error-stats
• Follow the Tx congestion to the core switches
• Follow the Tx congestion to the host edge switch
• Follow the Tx congestion to the offending host

[Figure: storage edge switch connected through Core#1 and Core#2 to the host edge switch and the host; timeout drops are marked on the storage edge ISLs.]

Summary



Cisco MDS architecture advantage

• Consistent throughput & latency
• Predictable performance at different traffic loads & types
• Drops corrupt frames by CRC checking at all stages
• Never drops good frames under congestion
• Non-blocking arbitrated crossbar architecture

MDS, Nexus & DCNM Slow Drain Advantage

• Detection
  • Slow port
  • Stuck port
  • Slow port monitoring
• Troubleshooting
  • Credit transition to zero
  • Credit and remaining credit
  • Info of dropped frames
  • See frames in ingress queue
  • OBFL logging
  • Port monitoring
• Automatic Recovery
  • Virtual output queues
  • Stuck port recovery
  • LR Rcvd B2B
  • Congestion drop
  • No-credit-drop
• DCNM
  • Fabric-wide visibility
  • Automatic collection and graphical display of counters
  • Reduced false positives
• HW assisted: detection in 1 ms, action immediate

Key Takeaways

Cisco MDS, Nexus & DCNM build Self-Healing Fabrics

Resources

Cisco Live! San Diego, June 7 – 11, 2015: BRKSAN-3446, "SAN Congestion: Understanding, Troubleshooting, Mitigating in a Cisco Fabric" by Ed Mazurek

Slow Drain Reference

• Video, "Understanding Slow Drain: Detection, Troubleshooting & Automatic Recovery": https://www.youtube.com/watch?v=wEz3z6NLaBU&list=PL_ju2fKFbFzVMZgXAHV9kZ6FT93BuG0eB

• White paper, "Slow Drain Device Detection and Congestion Avoidance": http://www.cisco.com/c/en/us/products/collateral/storage-networking/mds-9700-series-multilayer-directors/white_paper_c11-729444.html

• Cisco Live session BRKSAN-3446 by Ed Mazurek, "MDS 9500 9710 Understanding Detecting Troubleshooting Mitigating Slow Drain in a Cisco Fabric": https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=78677&backBtn=true

• Generation 4 slow drain counters, commands and troubleshooting: http://www.cisco.com/c/en/us/support/docs/storage-networking/mds-9509-multilayer-director/116098-trouble-gen4-00.html

