IBM SAN c-type update
Integrated analytics, Investment protection, High performance
Paresh Gupta, Technical Marketing Engineer, Cisco
September 2019


Page 1: IBM SAN c-type update · IBM SAN c-type directors –Performance • Industry’s first 1.5 Tbps per slot switching capacity directors • Non-blocking, Non-Oversubscribed architecture


Page 2

Washington Systems Center - Storage

© Copyright IBM Corporation 2019 Accelerate with IBM Storage

Accelerate with IBM Storage Webinars

The Free IBM Storage Technical Webinar Series Continues in 2019...

Washington Systems Center – Storage experts cover a variety of technical topics.

Audience: Clients who have or are considering acquiring IBM Storage solutions. Business Partners and IBMers are also welcome.

To automatically receive announcements of upcoming Accelerate with IBM Storage webinars, Clients, Business Partners and IBMers are welcome to send an email request to accelerate-[email protected].

Located in the Accelerate with IBM Storage Blog:

https://www.ibm.com/developerworks/mydeveloperworks/blogs/accelerate/?lang=en

Also, check out the WSC YouTube Channel here:

https://www.youtube.com/channel/UCNuks0go01_ZrVVF1jgOD6Q

2019 Upcoming Webinars:

September 24 – IBM Storage SAN c-type Update

Register Here: https://ibm.webex.com/ibm/onstage/g.php?MTID=e13d04c15a11db836f299d2c1c6f19898

November 14 - IBM DS8900F R9.0 Update+

Register Here: https://ibm.webex.com/ibm/onstage/g.php?MTID=e589f9c2605331da2bba1df64d09aca57

Page 3

IBM SAN c-type Technology Leadership

Timeline: 2013 to 2019. Not an exhaustive list. Only major and/or first-in-the-industry features mentioned.

Visibility and Analytics
• Inline visibility & analytics
• NVMe/FC visibility & analytics
• SAN Insights

Operations and Efficiency
• Congestion detection at 2.5 µs
• Congestion recovery at 1 ms
• Congestion recovery – Virtual Links
• Automatic zoning via Autozone
• HBA & Link diagnostics

Management
• UCS FI visibility in DCNM
• VM visibility in DCNM
• HTML5 GUI on DCNM
• Fabric-wide slow drain analysis
• Create/assign storage – DCNM Connect
• Slow Drain topology analysis

Simplicity and Automation
• Switch native RESTful API
• On-switch Python
• Quick Provisioning via POAP
• Quick Provisioning via USB
• Storage Ansible modules

Speed and Performance
• 16G FC
• 40 Gbps FCoE
• 32G FC
• 32G ready directors
• 64G ready directors

Page 4

IBM Storage Networking c-type Overview

SAN Directors
• SAN192C-6: 9RU, 4 slots
• SAN384C-6: 14RU, 8 slots
• SAN768C-6: 26RU, 16 slots

Director Modules
• 24 x 16G FC, 8 x 1/10 GE & 2 x 40 GE
• 48 x 32G FC

Fabric Switches
• SAN50C-R
• SAN32C-6
• SAN48C-6
• SAN96C-6

Features
• SAN Analytics
• FCIP
• NVMe/FC
• FICON

Line-rate, Non-blocking, Non-oversubscribed, since 2013

Page 5

IBM SAN c-type Directors

• SAN192C-6: Up to 192 line-rate ports
• SAN384C-6: Up to 384 line-rate ports
• SAN768C-6: Up to 768 line-rate ports

• All three models share a similar architecture

• Port-modules and power supplies can be shared

• XBARs and fan-trays can’t be shared because their form factors differ

• Supervisors can be shared between SAN384C-6 and SAN192C-6

Page 6

IBM SAN c-type directors

Page 7

IBM SAN c-type directors – Performance

• Industry’s first 1.5 Tbps per slot switching capacity directors

• Non-blocking, Non-Oversubscribed architecture

• All ports are line rate, without dependency & restrictions of local-switching

• Consistent and Predictable performance across all ports

• Fabric modules (Crossbar or XBAR) provide data switching between two ports (inter & intra slot)

• XBAR are inserted from rear of chassis

| Number of Fabric Cards | Front Panel FC Bandwidth/Slot | Front Panel FCoE Bandwidth/Slot |
| 1                      | 256 Gbps                      | 220 Gbps                        |
| 2                      | 512 Gbps                      | 440 Gbps                        |
| 3                      | 768 Gbps                      | 660 Gbps                        |
| 4                      | 1024 Gbps                     | 880 Gbps                        |
| 5                      | 1280 Gbps                     | 1100 Gbps                       |
| 6                      | 1536 Gbps                     | 1320 Gbps                       |
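The table scales linearly: each fabric card adds 256 Gbps of FC and 220 Gbps of FCoE front-panel bandwidth per slot. A minimal Python sketch of that relationship follows; the function and constant names are illustrative and not part of any switch software.

```python
# Per-fabric-card contribution per slot, read off the table above (Gbps).
FC_GBPS_PER_FABRIC_CARD = 256
FCOE_GBPS_PER_FABRIC_CARD = 220


def slot_bandwidth_gbps(fabric_cards: int) -> tuple[int, int]:
    """Return (FC, FCoE) front-panel bandwidth per slot for 1 to 6 fabric cards."""
    if not 1 <= fabric_cards <= 6:
        raise ValueError("the table covers 1 to 6 fabric cards")
    return (fabric_cards * FC_GBPS_PER_FABRIC_CARD,
            fabric_cards * FCOE_GBPS_PER_FABRIC_CARD)


if __name__ == "__main__":
    for n in range(1, 7):
        fc, fcoe = slot_bandwidth_gbps(n)
        print(f"{n} fabric card(s): {fc} Gbps FC, {fcoe} Gbps FCoE per slot")
```

With all six fabric cards installed this gives the 1536 Gbps (about 1.5 Tbps) per slot quoted in the slide title.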

Rear view of SAN192C-6

Page 8

Industry’s 1st 64G directors

• Invest in the future today

• 16G and 32G speeds already delivered in the same chassis. Getting ready for next-gen

• Next-gen XBAR and Supervisors (Sup-4 and Fab-3)

• All upgrades are non-disruptive without any forklift

| Number of Fab3 | Front Panel FC Bandwidth/Slot |
| 1              | 512 Gbps                      |
| 2              | 1024 Gbps                     |
| 3              | 1536 Gbps                     |
| 4              | 2048 Gbps                     |
| 5              | 2560 Gbps                     |
| 6              | 3072 Gbps                     |
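One way to read the two bandwidth tables is to ask how many fabric cards a fully populated module needs to stay line rate. The sketch below does that arithmetic in Python using nominal speeds (32G FC counted as 32 Gbps, as the tables do). The 48-port 64G module is a hypothetical input for illustration; the slides only state that the directors are 64G ready.

```python
import math

# Per-card, per-slot FC bandwidth taken from the two tables in this deck (Gbps).
CURRENT_FABRIC_GBPS = 256   # fabric cards from the Performance slide
FAB3_GBPS = 512             # next-gen Fab3 cards from the table above


def fabric_cards_needed(ports: int, port_speed_gbps: int, per_card_gbps: int) -> int:
    """Smallest number of fabric cards whose per-slot bandwidth covers all ports."""
    return math.ceil(ports * port_speed_gbps / per_card_gbps)


if __name__ == "__main__":
    print(fabric_cards_needed(48, 32, CURRENT_FABRIC_GBPS))  # 48 x 32G FC module
    print(fabric_cards_needed(48, 32, FAB3_GBPS))
    print(fabric_cards_needed(48, 64, FAB3_GBPS))            # hypothetical 48 x 64G module
```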

Rear view of SAN192C-6

Page 9

32G FC Module

• On-board engine for storage traffic visibility

• All ports are quad-rate – 4/8/16/32G FC

• 500 B2B credits per port by default. Up to 8191 B2B credits per port (with enterprise license)

• Optics are tri-rate

• 32G optics for 8/16/32G FC

• 16G optics for 4/8/16G FC

• 8G optics for 4/8G FC

• Intelligent features like VSAN, IVR, FC Redirect, etc.

Port-Group 1 Port-Group 2 Port-Group 3

Page 10

24/10 SAN Extension Module

• Front end ports – 24 x 2/4/8/10/16G FC, 8 x 1/10 GE IPS, 2 x 40GE IPS

• Hardware based encryption and compression of FCIP traffic

• All FCIP ports are line rate

• Two internal FCIP engines, each serves 4 x 1/10 GbE and 1 x 40 GbE ports

• No additional license for FCIP. All ports and FCIP engines are enabled by default

[Figure: module front panel. Labels: 24 x 16G FC ports, 8 x 1/10 GbE, 2 x 40 GbE*, two Shared FCIP Engines]

* Future support

Page 11

IBM SAN c-type – Designed for non-stop operations

Mission Critical Directors

• Non-disruptive upgrades: Upgrade software and hardware without any impact to switch operations

• XBAR redundancy: Optional N+1 and N+N XBAR redundancy

• Redundant components: N+N power grid redundancy. Redundant fan trays, redundant fans inside a single fan tray

• Supervisor redundancy: Standby supervisor takes over if active fails, without any impact to switch operations

Page 12

IBM SAN c-type – Fabric reliability with port-channels

Port-Channels on IBM c-type directors

• High bandwidth and increased resiliency

• No restrictions on port-channel members: Member interfaces can be spread across port-groups and slots

• High scale: up to 16 members in a single port-channel

• No extra license for port-channels: Port-channels included in base license

Page 13

SAN384C-6 Port-channel member links example

[Figure: 32G FC module for IBM c-type directors, with port-channel member links numbered 1 to 16 spread across Port-Group 1, Port-Group 2 and Port-Group 3]

• 32G FC module has 3 port groups

• Each port-group has 16 ports

• Recommendation - Distribute links uniformly
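The recommendation to distribute links uniformly can be sketched as a simple even split of port-channel members across the three port groups of one module. This is only an illustration in Python with made-up names; it does not configure anything on a switch, and members may equally be spread across slots.

```python
def spread_members(member_count: int, port_groups: int = 3) -> list[int]:
    """Return how many port-channel members to place in each port group.

    Members are split as evenly as possible; any remainder goes to the
    first port groups, so counts differ by at most one.
    """
    if not 1 <= member_count <= 16:
        raise ValueError("a port-channel holds 1 to 16 members")
    base, extra = divmod(member_count, port_groups)
    return [base + (1 if group < extra else 0) for group in range(port_groups)]


if __name__ == "__main__":
    for members in (6, 8, 16):
        print(members, "members ->", spread_members(members))
```

For a 16-member port-channel this yields 6, 5 and 5 members in the three port groups.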

Page 14

IBM SAN768C-6 for consolidation and expansion

[Figure: SAN768C-6 consolidation example. Labels: 224 + 224 + 224 ports (Total = 672); 96 ports (Total = 96); 32 ISLs each @ 32 Gbps; ratios 1:1, 7:1 and 7:1; Total = 768]

• 1 x SAN768C-6 can connect as many end-devices as 4 (or more) of any other director

• Fewer directors ➔ Less operational overhead

• More backplane switching, smaller networks
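The numbers in the figure can be cross-checked with a few lines of arithmetic. The Python sketch below reflects one reading of the figure labels (three groups of 224 end-device ports, each served by 32 ISLs); the variable names are illustrative.

```python
# Values copied from the figure labels on this slide.
PORT_GROUPS = 3
END_DEVICE_PORTS_PER_GROUP = 224
ISLS_PER_GROUP = 32            # "32 ISLs each @ 32 Gbps"

end_device_ports = PORT_GROUPS * END_DEVICE_PORTS_PER_GROUP
isl_ports = PORT_GROUPS * ISLS_PER_GROUP
total_ports = end_device_ports + isl_ports
fan_in_ratio = END_DEVICE_PORTS_PER_GROUP / ISLS_PER_GROUP

if __name__ == "__main__":
    print("end-device ports:", end_device_ports)
    print("ISL ports:", isl_ports)
    print("total ports:", total_ports)
    print(f"fan-in ratio: {fan_in_ratio:.0f}:1")
```

Under that reading the totals match the slide: 672 plus 96 gives the 768 line-rate ports of a SAN768C-6, at a 7:1 ratio.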

Page 15

IBM SAN50C-R

• Up to 40 ports @ 2/4/8/16G FC, 8 ports @ 1/10GbE FCoE and 2 ports @ 1/10GbE IPS

• Base license includes 20 x FC ports. FCIP included in base license

• Line-rate FCIP Performance

• Hardware based encryption and compression of FCIP traffic

• Additional intelligent features like IO Acceleration (IOA) and Data Mobility Manager (DMM)

40 x 2/4/8/16G FC 8 x 1/10GbE FCoE

2 x 1/10GbE IPS

Page 16

IBM c-type fabric switches

• Designed for all Flash and NVMe workloads: Full-duplex 32G line rate ports

• Proactive resolution of problems: Integrated Analytics & Telemetry

• Investment Protection: NVMe/FC qualified

• Non-stop operations: Enterprise-class redundancy

Port on demand options

• SAN32C-6: 8, 16, 24 or 32 ports

• SAN48C-6: 24, 32, 40 or 48 ports

• SAN96C-6: 48, 64, 80 or 96 ports

Page 17

IBM SAN c-type fabric switches port-groups

• SAN32C-6: Port-Group 1, Port-Group 2

• SAN48C-6: Port-Group 1, Port-Group 2, Port-Group 3

• SAN96C-6: Port-Group 1, Port-Group 2, Port-Group 3, Port-Group 4, Port-Group 5, Port-Group 6

Page 18

Forward Error Correction & Internal CRC checking

• Faulty equipment, loose SFPs, and dirty or damaged cables can corrupt packets

• IBM SAN c-type prevents flooding of corrupt frames by dropping them: CRC checking at 3 stages

• FEC may correct frames corrupted in-flight

[Figure: frame path through the switch. Labels: Forward Error Correction; Ingress CRC Checking (shown twice); "When FEC can’t recover corrupted frame"; Drop frame (shown twice)]

Page 19

NVMe/FC

Page 20

Fibre Channel Architecture & NVMe

• NVMe/FC extends benefits of NVMe over Fibre Channel fabric

• Utilize FC benefits like plug-n-play, fabric server, name server, zone server, etc.

• Deploy in existing infra using latest HBAs, switches, & management software

• NVMe/FC, SCSI-FCP & FICON can be transported concurrently in the same fabric

[Figure: Fibre Channel layer stack]

• Application (upper layer protocols): SCSI, FICON, IP, NVMe

• FC-4: ULP Mapping

• FC-3: Generic Services

• FC-2: Framing & Flow Control

• FC-1: Encoding

• FC-0: Physical Interface (1-32G FC)

Page 21

NVMe/FC – Phased & Seamless transition

• Dual-stack end-devices – concurrent support of NVMe & SCSI transport

• Multiprotocol switching in IBM c-type – simultaneous switching of NVMe & SCSI transport encapsulated in Fibre Channel frames

• SCSI-only or NVMe capability of end-devices is auto-detected and advertised

• Similar to the existing plug-and-play architecture of Fibre Channel

• NVMe/FC is independent of FC speed. Higher speeds recommended.

[Figure: a traditional FC-SCSI capable initiator and an NVMe/FC capable initiator (Cisco C-series Rack Servers with HBAs) connect through IBM SAN c-type to a traditional FC-SCSI capable target and an NVMe/FC capable target. SCSI and NVMe are both carried over FC]

Page 22

NVMe/FC – Phased & Seamless transition

• End-devices register Upper Layer Protocol (ULP) with FCNS database, to be advertised to other end-devices in the same zone

[Figure: same topology as the previous slide (FC-SCSI and NVMe/FC capable initiators and targets connected through IBM SAN c-type), with the FCNS database shown on the switch]

IBM_c_type# show fcns database vsan 160
VSAN 160:
--------------------------------------------------------------------------
FCID        TYPE  PWWN                    (VENDOR)    FC4-TYPE:FEATURE
--------------------------------------------------------------------------
0x590020    N     10:00:00:90:fa:e0:08:5d (Emulex)    scsi-fcp:init NVMe:init
0x590140    N     21:00:00:24:ff:7f:06:39 (Qlogic)    scsi-fcp:init NVMe:init

(showing entries only for dual-stack NVMe capable initiators. Other devices will look similar)
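To illustrate how the advertised FC4 features identify NVMe capable end-devices, here is a small Python sketch that parses name-server entries like the ones on this slide. It assumes each entry sits on one line with the NVMe:init feature following scsi-fcp:init; the regular expression and function names are illustrative, not an official parser.

```python
import re

# Sample entries modelled on the slide output (one entry per line is an assumption).
SAMPLE = """\
0x590020 N 10:00:00:90:fa:e0:08:5d (Emulex) scsi-fcp:init NVMe:init
0x590140 N 21:00:00:24:ff:7f:06:39 (Qlogic) scsi-fcp:init NVMe:init
"""

ENTRY = re.compile(
    r"^(?P<fcid>0x[0-9a-fA-F]{6})\s+(?P<type>\S+)\s+"
    r"(?P<pwwn>(?:[0-9a-fA-F]{2}:){7}[0-9a-fA-F]{2})\s+"
    r"(?:\((?P<vendor>[^)]*)\)\s+)?(?P<fc4>.*)$"
)


def parse_fcns(text: str) -> list[dict]:
    """Turn FCNS entry lines into dictionaries; non-matching lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entry = match.groupdict()
            entry["fc4"] = entry["fc4"].split()
            entries.append(entry)
    return entries


def nvme_capable(entries: list[dict]) -> list[str]:
    """Return the PWWNs whose FC4 features include an NVMe registration."""
    return [e["pwwn"] for e in entries
            if any(f.lower().startswith("nvme:") for f in e["fc4"])]


if __name__ == "__main__":
    print(nvme_capable(parse_fcns(SAMPLE)))
```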


Page 24

NVMe over Fabrics – What changes on wire?

SCSI & HDD based storage arrays
• Large inter-frame gap
• Large inter-IO gap
• Occasional line-rate utilization

SCSI & SSD based storage arrays
• Smaller inter-frame gap
• Smaller inter-IO gaps
• Frequent line-rate bursts

NVMe & upcoming NVM based storage arrays
• Minimum inter-frame gap
• Minimum inter-IO gap
• Sustained line-rate utilization

Page 25

Storage network possibilities with NVMe workloads

• Dedicated underlying networks

• Shared underlying network: Traffic segregation via VSAN and Virtual Links

[Figure: each option shows NVMe initiators, non-NVMe initiators, NVMe targets and non-NVMe targets]

Page 26

IBM SAN c-type for NVMe/FC

• NVMe analytics: Fully integrated visibility into thousands of flows and real-time analytics

• Seamless Insertion: No extra config on c-type to connect NVMe/FC end-devices

• Superior Architecture: Consistent & predictable frame switching with hardware based congestion detection & avoidance

• Investment Protection: 3 generations of speeds within the same c-type director without any forklift upgrade

• Multiprotocol Flexibility: Co-existence of SCSI and NVMe workloads over FC or FCoE

Page 27

SAN Analytics and Telemetry

Page 28

Are you ready for All Flash and NVMe storage?

IBM FlashSystem 9100: 10 million IOPS, 136 GB/s throughput

BUT

• Is your environment ready to derive that performance today? and

• Maintain that performance for 6/12/18 months?

Page 29

You think you can go fast?

Think again

Page 30

The problem statement

[Figure: end-to-end I/O path with Reads and Writes crossing the SAN. Compute & Applications (Database Server, Web Server, Video Streaming Server, OLTP) with the host stack Application, File System, Block, SCSI, FC Driver, HBA (firmware). Storage (All Flash Arrays, Spinning Disk Arrays) with the stack HBA (firmware), FC Driver, Storage Processor, Backend connect, Drive enclosure. Each team responds to application issues with "Not an App/Host issue", "Not a SAN issue", "Not a Storage issue"]

• Too many components involved

• Every role limited by own view

• Virtualization adds complexity

• Hybrid-shared environments

  • Bare-metals & virtualized servers

  • Spinning disks & All flash arrays

  • Multiple speed (2/4/8/16/32G FC)

When application user complains, where do you start troubleshooting?

Page 31

Complete I/O visibility using IBM SAN c-type

[Figure: same end-to-end I/O path as on the problem statement slide (Compute & Applications, SAN, Storage, with Reads and Writes), plus a frame made of FC header, SCSI/NVMe header and Data being inspected in the SAN]

• Deep packet visibility

• FC & SCSI/NVMe headers only

• Monitor in real time

• Vendor neutral monitoring

Monitor the wire – Know I/O traffic pattern – Solve problems proactively

Page 32

Complete I/O visibility using IBM SAN c-type

[Figure: same end-to-end I/O path as on the problem statement slide (Compute & Applications, SAN, Storage, with Reads and Writes), plus a frame made of FC header, SCSI/NVMe header and Data being inspected in the SAN]

Monitor the wire – Know I/O traffic pattern – Solve problems proactively

• I/O metric streaming: I/O metrics available to external receivers in open format

• Automatic baseline & deviation calculations for all monitored end-devices on DCNM

DCNM = Data Center Network Manager

Page 33

IBM SAN c-type – Analytics deployment

Analytics enabled port(s), any one of:

• Storage Ports: Closest to storage

• ISL Ports: High capacity 32G ISLs

• Host Ports: Closest to apps

• Inspection of traffic at least once in the end-to-end data path is enough

• Rip and replace of existing switches or modules not required

Enable analytics where you want, when you want

Page 34

SAN Analytics on IBM SAN c-type

• Simple: Enabled by single command on switch ports leading to automatic learning of flows

• Scalable: Scales with the size of your fabric, from a few to thousands of ports

• Flexible: Deploy when you want, where you want

• Affordable: No expensive traffic inspection devices

• Open and Programmable: Metrics available in open format for easy 3rd party integration

SAN Insights – An integrated analytics engine within DCNM for end-to-end visibility, automatic baseline and deviation calculations and more…

Page 35

Slow Drain Detection, Troubleshooting and Recovery

Page 36

SAN Congestion – Overview on IBM SAN c-type

• Detection: 2.5 µs granularity

• Troubleshooting

• Automatic Recovery: 1 ms granularity

Page 37

SAN Congestion – Overview on IBM SAN c-type

Detection

• TxWait period for frames: Real-time credit unavailability duration at microsecond granularity

• Slowport-monitor: Real-time credit unavailability duration at millisecond granularity

• Credit unavailability at 100 ms: Real-time credit unavailability duration at 100 millisecond granularity

• LR Rcvd B2B: Could not respond to Link Reset due to non-empty receive queue

• Credits and remaining credits: Number of Tx B2B credits agreed initially & instantaneous available value

• Credit transition to zero: Remaining Tx B2B credit count went to zero

• Credit Loss: Remaining Tx B2B credits were zero for a longer duration (1 s for F ports, 1.5 s for E ports)

• DCNM: Fabric-wide single-pane-of-glass visibility for pin-pointing within minutes


Page 39

SAN Congestion – Overview on IBM SAN c-type

Troubleshooting

• Dropped frame information: Key information of dropped frames due to timeout

• Display frames in ingress Q: Real-time display of frames in ingress queue

• Arbitration timeout: Request denied by arbiter to send frame from ingress to egress via XBAR

• Timeout discards: Frames not being switched out of switch within timeout

• OBFL logging: History of events over weeks or months along with time stamp

• History Graph: Graphical representation of TxWait for last 60 sec, 60 min & 72 hr

• DCNM: Automated collection of counters with end-to-end troubleshooting


Page 41

SAN Congestion – Overview on IBM SAN c-type

Automatic Recovery

• Virtual Output Queues (VOQ): Prevent head-of-line blocking

• SNMP Traps: Alert only (9 configurable counters) – Manual recovery

• Congestion-drop timeout: Frame in switch > congestion-drop timeout? Drop it.

• No-credit-drop timeout: Frames not being switched out of switch within timeout

• Credit-loss recovery: Send Link Reset primitive if Tx credits unavailable for longer duration

• Port-flap, Error-disable, Isolation to Virtual Links: Flexibility of port flap, shutdown or isolation to a slow virtual link by an automated policy

Page 42

SAN Congestion – Overview on IBM SAN c-type

Detection
• TxWait period for frames
• Slowport-monitor
• Credit unavailability at 100 ms
• LR Rcvd B2B
• Credits and remaining credits
• Credit transition to zero
• Credit Loss
• DCNM

Troubleshooting
• Dropped frame information
• Display frames in ingress Q
• Arbitration timeout
• Timeout discards
• OBFL logging
• History Graph

Automatic Recovery
• Virtual Output Queues (VOQ)
• SNMP Traps
• Congestion-drop timeout
• No-credit-drop timeout
• Credit-loss recovery
• Port-flap
• Error-disable
• Isolation to Virtual Links

Page 43

TxWait – Congestion detection at 2.5µs

• TxWait is a hardware counter with nanosecond (ns) visibility

• Increments 2-3 ns if port is at 0 Tx B2B credits & frames are queued for transmit. Reported in units of 2.5 microseconds (µs) because

  • FICON requirements

  • Nanosecond is too fast to interpret

• Time in seconds a port was unable to transmit a queued frame due to Tx B2B credit unavailability = (TxWait * 2.5) / 1000000

  • 5642973696 * 2.5/1000000 = 14107 seconds

  • FC1/1 was unable to transmit for 14107 secs since the last counter clear

SAN384C-6# show interface fc1/1 counters | include wait
5642973696 2.5us Tx waits due to lack of transmit credits

Other convenient approaches are available to monitor TxWait
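The conversion from the raw TxWait counter to seconds can be written as a one-line helper. The Python sketch below uses the 2.5 µs tick size and the counter value from this slide; the function name is illustrative.

```python
TXWAIT_TICK_US = 2.5  # each TxWait increment represents 2.5 microseconds


def txwait_seconds(ticks: int) -> float:
    """Seconds a port could not transmit because it had zero Tx B2B credits."""
    return ticks * TXWAIT_TICK_US / 1_000_000


if __name__ == "__main__":
    counter = 5642973696  # value shown for fc1/1 on this slide
    print(f"{txwait_seconds(counter):.0f} seconds without Tx credits")
```

For the counter value shown, this gives roughly 14107 seconds, in line with the worked example in the bullets.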

Page 44

TxWait – Monitoring percentage

• An intuitive way of reporting how long frames could not be transmitted

• In the output below, frames could not be transmitted out of port fc1/13 for 1% of the last 1 second, 5% of the last 1 minute, and so on, due to lack of transmit B2B credits

SAN192C-6# show interface fc1/13 counters
fc1/13
<snip>
5 Transmit B2B credit transitions to zero
2 Receive B2B credit transitions to zero
0 2.5us TxWait due to lack of transmit credits
Percentage Tx credits not available for last 1s/1m/1h/72h: 1%/5%/3%/2%
32 receive B2B credit remaining
128 transmit B2B credit remaining
128 low priority transmit B2B credit remaining
<snip>
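For scripted monitoring, the percentage line can be split into one value per time window. The following Python sketch parses the line exactly as it appears in the output on this slide; the function name and regular expression are illustrative, and other software releases may format the line differently.

```python
import re

LINE = "Percentage Tx credits not available for last 1s/1m/1h/72h: 1%/5%/3%/2%"


def parse_txwait_percent(line: str) -> dict[str, int]:
    """Map each window label (1s, 1m, 1h, 72h) to its percentage value."""
    match = re.search(r"for last ([\w/]+):\s*([\d%/]+)", line)
    if not match:
        raise ValueError("not a TxWait percentage line")
    windows = match.group(1).split("/")
    values = [int(value.rstrip("%")) for value in match.group(2).split("/")]
    if len(windows) != len(values):
        raise ValueError("window and value counts differ")
    return dict(zip(windows, values))


if __name__ == "__main__":
    for window, percent in parse_txwait_percent(LINE).items():
        print(f"last {window}: Tx credits unavailable {percent}% of the time")
```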

Page 45

TxWait – Health report of port

• Graphical display of time when credits were not available

• 3 graphs per port

  • Last 60 seconds

  • Last 60 minutes

  • Last 72 hours

• Top 3 rows (read vertically): actual TxWait in ms

• Middle 10 rows: graph plot using #

• Bottom 2 rows: time axis (last 60 seconds)

• Example: at the 15th second, TxWait = 989 ms; at the 35th second, TxWait = 752 ms

SAN768C-6# show process creditmon txwait-history
TxWait history for port fc1/13:
==============================
                  79998                 79993              999999
                  08887                 58882             9899999
      000000000000299870000000000000000029994000000000000362999500
1000               ###                   ###               ######
900                ####                  ###               ######
800                ####                 ####               ######
700               #####                 ####               ######
600               #####                 ####               ######
500               #####                 ####               ######
400               #####                 ####               ######
300               #####                 #####              ######
200               #####                 #####              ######
100               #####                 #####             #######
     0....5....1....1....2....2....3....3....4....4....5....5....6
               0    5    0    5    0    5    0    5    0    5    0

Credit Not Available per second (last 60 seconds)
# = TxWait (ms)
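Reading the top three rows vertically means taking the hundreds, tens and ones digit from the same column. The Python sketch below rebuilds those rows with an assumed column alignment (the digit groups start at seconds 13, 35 and 52 to 54), chosen so that it reproduces the two examples quoted on this slide. The helper names are illustrative.

```python
def place(segments: dict[int, str], width: int = 60, fill: str = " ") -> str:
    """Build one graph row; keys are 1-indexed seconds where a digit run starts."""
    row = [fill] * width
    for start, digits in segments.items():
        for offset, char in enumerate(digits):
            row[start - 1 + offset] = char
    return "".join(row)


# Digit runs copied from the slide; the starting columns are an assumption.
HUNDREDS = place({13: "79998", 35: "79993", 54: "999999"})
TENS = place({13: "08887", 35: "58882", 53: "9899999"})
ONES = place({13: "29987", 35: "29994", 52: "362999500"}, fill="0")


def txwait_ms(second: int) -> int:
    """Read one column top to bottom and return the TxWait value in ms."""
    column = HUNDREDS[second - 1] + TENS[second - 1] + ONES[second - 1]
    return int(column.replace(" ", "") or "0")


if __name__ == "__main__":
    for second in (15, 35):
        print(f"second {second}: TxWait = {txwait_ms(second)} ms")
```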

Page 46

TxWait – Granular & long duration reporting

switch# show logging onboard txwait
Notes:
- sampling period is 20 seconds
- only txwait delta value >= 100 ms are logged
---------------------------------
Module: 1 txwait count
---------------------------------
-----------------------------------------------------------------------------
| Interface | Delta TxWait Time     | Congestion | Timestamp                |
|           | 2.5us ticks | seconds |            |                          |
-----------------------------------------------------------------------------
| fc1/11    | 3435973     | 08      | 42%        | Sun Sep 30 05:23:05 2001 |
| fc1/11    | 6871947     | 17      | 85%        | Sun Sep 30 05:22:25 2001 |

• TxWait delta value is logged periodically (every 20 seconds) into OBFL, if delta value >= 100 ms.

• Displays TxWait time in 2.5us ticks as well as in seconds.

• Congestion value is displayed in percentage over period of 20 seconds.

• Timestamp of event occurrence also recorded.

• OBFL = On-board Failure Logging (Buffer)
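The seconds and congestion columns follow from the tick count and the 20-second sampling period. The Python sketch below recomputes them for the two logged rows; the displayed values appear to be truncated rather than rounded, which is an observation from this sample and not a documented rule.

```python
SAMPLING_PERIOD_S = 20   # from the Notes in the command output
TICK_US = 2.5            # one TxWait tick


def congestion(delta_ticks: int) -> tuple[float, float]:
    """Return (seconds without Tx credits, percent of the sampling period)."""
    seconds = delta_ticks * TICK_US / 1_000_000
    percent = 100 * seconds / SAMPLING_PERIOD_S
    return seconds, percent


if __name__ == "__main__":
    for ticks in (3435973, 6871947):   # delta values logged for fc1/11
        seconds, percent = congestion(ticks)
        print(f"{ticks} ticks = {seconds:.2f} s = {percent:.1f}% of 20 s")
```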

Page 47

Slowport Monitor

SAN384C-6(config)# system timeout slowport-monitor ?

<1-500> Configure number of milliseconds

default Default timeout value for HW slowport monitoring

SAN384C-6(config)# system timeout slowport-monitor 1 ?

logical-type Enter the port mode

SAN384C-6(config)# system timeout slowport-monitor 1 logical-type ?

core E mode

edge F mode

• Shows real time delay of data traffic on all ports

• Duration of Tx B2B credit unavailability on a port (and hence, no transmit of frames)

• Monitoring at as low as 1ms

• Hardware assisted! No overhead on CPU

• Recommendation: Always Turn it on!

Page 48

Understanding Slowport Monitor output

SAN384C-6# show process creditmon slowport-monitor-events

Module: 01 Slowport Detected: YES
===========================================================
Interface = fc1/18
------------------------------------------------------------
| admin | slowport  | oper  | Timestamp
| delay | detection | delay |
| (ms)  | count     | (ms)  |
------------------------------------------------------------
| 1     | 128       | 9     | Wed Jul 2 19:47:19.922 2014
| 1     | 127       | 4     | Wed Jul 2 19:47:19.618 2014
| 1     | 119       | 10    | Wed Jul 2 19:47:19.518 2014
| 1     | 109       | 10    | Wed Jul 2 19:47:19.418 2014
| 1     | 101       | 10    | Wed Jul 2 19:47:19.318 2014
| 1     | 100       | 4     | Wed Jul 2 19:47:19.118 2014
| 1     | 93        | 10    | Wed Jul 2 19:47:19.017 2014
| 1     | 83        | 10    | Wed Jul 2 19:47:18.917 2014
| 1     | 74        | 12    | Wed Jul 2 19:47:18.818 2014

Column notes:

• admin delay (ms): delay configured via the CLI. All delay values larger than this value will be logged.

• slowport detection count: number of times the delay was detected. Subtract from the previous value for the recent change.

• oper delay (ms): duration of Tx B2B credit unavailability on the port.

• Timestamp: when the delay was observed.

Output is also stored in OBFL for longer duration
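
The "subtract from previous value" step can be sketched in Python. This is an illustrative parser for pasted command output, not a Cisco tool: the names `parse_events` and `new_detections` are hypothetical, and the sample text is the first four rows of the slide's output, which lists the newest event first.

```python
import re

SAMPLE = """\
| 1 | 128 | 9 | Wed Jul 2 19:47:19.922 2014
| 1 | 127 | 4 | Wed Jul 2 19:47:19.618 2014
| 1 | 119 | 10 | Wed Jul 2 19:47:19.518 2014
| 1 | 109 | 10 | Wed Jul 2 19:47:19.418 2014
"""

# admin delay | detection count | oper delay | timestamp
ROW = re.compile(r"^\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(.+?)\s*$")


def parse_events(text):
    """Return (admin_ms, detection_count, oper_ms, timestamp) per data row."""
    events = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header and separator lines do not match and are skipped
            events.append(
                (int(match[1]), int(match[2]), int(match[3]), match[4])
            )
    return events


def new_detections(events):
    """Pair each row with the next older row and subtract the counts."""
    return [
        (newer[3], newer[1] - older[1])
        for newer, older in zip(events, events[1:])
    ]


for timestamp, delta in new_detections(parse_events(SAMPLE)):
    print(f"{timestamp}: {delta} new detection(s)")
```

The oldest row has no older neighbor, so it produces no delta.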


Slow Drain due to Tx B2B credit starvation

[Figure: Array 1, Array 2, Server 1, and Server 2 attached to a fabric of Switch 1 and Switch 2. Labels on the links: Frame, R_RDY, B, and BackPressure (shown twice). One end device is labeled Culprit and three are labeled Impacted.]

• A single misbehaving host that does not return R_RDY fast enough is a slow drain device and causes congestion

• Multiple end devices sharing the same pair of switches and ISLs are impacted

• The switch port connected to a slow drain device is starved of Tx B2B credits

• The resolution depends on how long Tx B2B credits are unavailable on the switch port connected to the slow drain device


Automatic recovery via no-credit-drop timeout

[Figure: the same topology of Array 1, Array 2, Server 1, and Server 2 attached to Switch 1 and Switch 2, now with fewer queued frames. Labels on the links: Frame, R_RDY, B, and BackPressure. One end device is labeled Culprit and three are labeled Impacted.]

• Recovery of traffic to healthy edge devices depends on how efficiently frames destined for the slow drain device are dropped

• The lower the no-credit-drop timeout, the more efficient the drop and hence the better the traffic recovery

• Turned on and off automatically and natively by the port ASIC. Minimum granularity of 1 millisecond (ms)

Figure callout: Drop frame at edge port connected to slow drain device
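
The effect of the timeout can be illustrated with a toy model. The sketch below is not how the port ASIC works: it assumes 1 ms ticks, one frame arriving per tick for the attached device, one R_RDY returned every `rrdy_every` ticks, and a drop of everything queued once the port has been at zero Tx B2B credits for `no_credit_drop_ms` consecutive ticks. All names and numbers are hypothetical.

```python
def simulate_edge_port(ticks, credits, rrdy_every, no_credit_drop_ms=None):
    """Toy model of one edge port. Returns (sent, dropped, max_queue_depth)."""
    queue = sent = dropped = max_queue = starved = 0
    available = credits
    for t in range(1, ticks + 1):
        queue += 1                            # a frame arrives from upstream
        if t % rrdy_every == 0 and available < credits:
            available += 1                    # slow device returns one R_RDY
        if available > 0:
            available -= 1                    # transmit, consuming a credit
            queue -= 1
            sent += 1
            starved = 0
        else:
            starved += 1                      # one more ms at zero credits
            if no_credit_drop_ms is not None and starved >= no_credit_drop_ms:
                dropped += queue              # timeout hit: drop queued frames
                queue = 0
        max_queue = max(max_queue, queue)     # depth left at the end of the tick
    return sent, dropped, max_queue


for timeout in (None, 5, 1):
    sent, dropped, depth = simulate_edge_port(100, 2, 10, timeout)
    print(f"timeout={timeout}: sent={sent} dropped={dropped} max queue={depth}")
```

In this model the slow device receives the same number of frames in every run. What changes is how many frames pile up behind the port: the queue depth stands in for the back pressure that would otherwise spread to the ISL and to the healthy devices, and it shrinks as the timeout is lowered.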


Automatic recovery via Isolation to Virtual Links

[Figure: Array 1, Array 2, Server 1, and Server 2 attached to Switch 1 and Switch 2, with Frame, R_RDY, and B labels on the links.]

• Virtual links have dedicated B2B credits and flow-control mechanism

Figure caption: The ISL carries four virtual links (VL0, VL1, VL2, VL3). Without congestion, all frames use VL3.


Automatic recovery via Isolation to Virtual Links

[Figure: Array 1, Array 2, Server 1, and Server 2 attached to Switch 1 and Switch 2, with Frame, R_RDY, and B labels on the links.]

• Automatic Isolation of port connected to slow drain device

• Traffic going to slow drain device is automatically moved to VL2

• Congestion is isolated to VL2. Other traffic in VL3 remains unaffected

Figure caption: The ISL carries four virtual links (VL0, VL1, VL2, VL3). Under congestion, traffic to the slow drain device is isolated to VL2. Other traffic remains on VL3. One end device is labeled Culprit and one is labeled Impacted.
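
The isolation rule itself is small enough to sketch. The snippet below is only a model of the decision described on this slide (VL3 for normal traffic, VL2 for traffic to a device flagged as slow drain). The function names are hypothetical, and which server is treated as the slow one is an assumption made for the example.

```python
from collections import defaultdict

NORMAL_VL = "VL3"  # per the slide: without congestion, all frames use VL3
SLOW_VL = "VL2"    # per the slide: traffic to a slow drain device moves to VL2


def virtual_link_for(destination, slow_devices):
    """Pick the virtual link for a frame based on its destination."""
    return SLOW_VL if destination in slow_devices else NORMAL_VL


def split_by_virtual_link(frames, slow_devices):
    """Group (frame_id, destination) pairs by the virtual link they would use."""
    by_vl = defaultdict(list)
    for frame_id, destination in frames:
        by_vl[virtual_link_for(destination, slow_devices)].append(frame_id)
    return dict(by_vl)


frames = [(1, "Server 1"), (2, "Server 2"), (3, "Server 1")]
# Assumption for illustration only: Server 1 has been detected as slow drain.
print(split_by_virtual_link(frames, slow_devices={"Server 1"}))
print(split_by_virtual_link(frames, slow_devices=set()))
```

Because each virtual link has its own B2B credits, frames grouped under VL2 can stall without consuming the credits that the VL3 frames depend on.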


Slow Drain Recovery Approaches – Usage

[Chart: recovery features plotted against the duration of continuous Tx B2B credit unavailability on a port (ms), with axis ticks at 100, 200, 300, 400, 500, and 1000.]

Features shown:

• no-credit-drop timeout

• congestion-drop timeout

• Port-flap

• Port-shutdown (Error Disable)

• Port Isolation (Virtual Links)

Enable all the features together – one for every duration


Port-monitor – Automated Alerting and Recovery

• Use pre-built templates or save your own custom template

• Push to the selected switches by port type


IBM SAN c-type summary

1. Integrated Analytics for Deep Visibility

• Industry’s first & only switch-integrated visibility and analytics

• Open and Programmable architecture

2. Investment Protection

• No forklift upgrade for higher speeds & newer Protocols (NVMe)

• Industry’s 1st 64G-ready directors

3. Performance & Scalability

• Industry’s first 1.5/3 Tbps per slot capacity for All Flash Arrays

• Industry’s only director with 768 ports for CapEx & OpEx savings

4. Exceptional Reliability & Availability

• Dual Supervisors, Grid power redundancy, Optional fabric redundancy

• Hardware-based SAN Congestion detection & automatic recovery

5. Operational Simplicity

• Programmability via switch native Python, TCL & RESTful APIs

• Quick provisioning & simple management via DCNM
