2014 Storage Developer Conference. © Chelsio Communications, Inc. All Rights Reserved.
NFS/RDMA over 40Gbps iWARP
Wael Noureddine, Chelsio Communications
Outline
• RDMA
  • Motivating trends
  • iWARP
• NFS over RDMA
  • Overview
  • Chelsio T5 support
  • Performance results
Adoption Rate of 40GbE
Source: Crehan Research - 2Q14 CREHAN Quarterly Market Share Tables
Software Defined Everything
Source: European Telecommunications Standards Institute, http://portal.etsi.org/nfv/nfv_white_paper.pdf, October 2012
Motivating Trends
• Unprecedented curve in 40GbE growth (and pricing)
• Consolidation and virtualization
• Software defined storage (everything) using commodity hardware
• Rise of the data center
  • Power efficiency
• High speed, ultra low latency SSDs
  • Need for high performance, high efficiency fabric
• Ethernet remains the preferred technology
  • TCP/IP for scalability, reliability, robustness and reach
• iWARP RDMA over Ethernet
RDMA Overview
• Direct memory-to-memory transfer
• All protocol processing handled by the NIC
  • Must be in hardware
• Protection handled by the NIC
  • User space access requires both local and remote enforcement
• Asynchronous communication model
  • Reduced host involvement
• Performance
  • Latency – polling
  • Throughput
• Efficiency
  • Zero copy
  • Kernel bypass (user space I/O)
  • CPU bypass
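[Diagram: two hosts, each with memory and a completion queue (CQ), attached to T5 adapters that handle protocol processing and protection. Payload moves directly between the two memories as packets across a wireless/LAN/datacenter/WAN network, and notifications reach each host through its CQ.]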
Performance and efficiency in return for new communication paradigm
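The asynchronous model named on this slide can be illustrated with a toy simulation: the host posts a work request and returns immediately, the adapter moves the payload, and the host later polls a completion queue. This is only a sketch. `ToyAdapter`, `post_write`, `process` and `poll_cq` are hypothetical names invented for illustration and are not the verbs API or any Chelsio interface.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Completion:
    wr_id: int      # identifier the application attached to the request
    status: str     # always "ok" in this toy model
    nbytes: int     # bytes moved by the adapter


class ToyAdapter:
    """Hypothetical stand-in for an RDMA NIC; not a real verbs binding."""

    def __init__(self):
        self.send_queue = deque()        # work requests posted by the host
        self.completion_queue = deque()  # completions written by the adapter

    def post_write(self, wr_id, src, dst, offset=0):
        """Queue a transfer and return immediately; no data moves yet."""
        self.send_queue.append((wr_id, src, dst, offset))

    def process(self):
        """Adapter side: copy payload memory-to-memory, then notify via the CQ."""
        while self.send_queue:
            wr_id, src, dst, offset = self.send_queue.popleft()
            dst[offset:offset + len(src)] = src   # same-length slice assignment
            self.completion_queue.append(Completion(wr_id, "ok", len(src)))

    def poll_cq(self, max_entries=16):
        """Host side: non-blocking poll returning up to max_entries completions."""
        done = []
        while self.completion_queue and len(done) < max_entries:
            done.append(self.completion_queue.popleft())
        return done


local = bytearray(b"payload from host A")
remote = bytearray(64)            # stands in for a registered remote buffer
nic = ToyAdapter()

nic.post_write(wr_id=1, src=local, dst=remote)
early = nic.poll_cq()             # empty: the request has not completed yet
nic.process()                     # the adapter does the transfer
completions = nic.poll_cq()
print(early, completions[0].wr_id, completions[0].nbytes)
```

Polling the completion queue instead of waiting for an interrupt is what the "Latency – polling" bullet refers to; a real application would trade that against the CPU time spent spinning.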
iWARP RDMA over Ethernet
• IETF RFCs in 2007
  • Open standard
  • Multiple vendors
• Ongoing standardization
  • Extensions to maintain API uniformity with InfiniBand
  • Recent RFC 7306 by Broadcom, Chelsio and Intel
• Mature stack
  • 3rd generation hardware
• RDMA over TCP/IP/Ethernet
  • TCP reliability, scalability, congestion and flow control
  • IP routability
  • Ethernet ubiquity
• Wireless ready
  • Near 10Gbps, low latency
• Cloud ready
  • Standard TCP/IP foundation
  • No network restrictions
• Full featured implementation
  • All RDMA benefits
• High performance
  • High packet rate
  • Low latency (1.5usec user-to-user)
  • Line rate 40Gb with single connection
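To relate the 40Gb line rate to the MB/sec axis used in the throughput charts later in the deck, the arithmetic can be sketched as follows. The header sizes are the standard Ethernet, IPv4 and TCP values; the MTU choices are assumptions, and the sketch ignores TCP options, VLAN tags and the iWARP (MPA/DDP/RDMAP) headers, so real goodput would be somewhat lower.

```python
LINK_GBPS = 40                      # from the slide: 40Gb line rate

# Standard per-frame sizes in bytes (no VLAN tag, no TCP options).
PREAMBLE_SFD = 8
ETH_HEADER = 14
FCS = 4
INTERFRAME_GAP = 12
IPV4_HEADER = 20
TCP_HEADER = 20


def tcp_goodput_mb_per_s(link_gbps, mtu):
    """Upper bound on TCP payload rate in MB/s (1 MB = 10**6 bytes)."""
    raw_mb_per_s = link_gbps * 1e9 / 8 / 1e6
    payload = mtu - IPV4_HEADER - TCP_HEADER
    on_wire = mtu + ETH_HEADER + FCS + PREAMBLE_SFD + INTERFRAME_GAP
    return raw_mb_per_s * payload / on_wire


raw = LINK_GBPS * 1e9 / 8 / 1e6     # 40 Gb/s expressed in MB/s
print(f"raw link rate: {raw:.0f} MB/s")
for mtu in (1500, 9000):            # assumed MTUs: standard and jumbo frames
    bound = tcp_goodput_mb_per_s(LINK_GBPS, mtu)
    print(f"MTU {mtu}: at most {bound:.0f} MB/s of TCP payload")
```

The raw figure works out to 5000 MB/s, which is the top of the throughput axis in the charts; framing overhead keeps the achievable payload rate below that.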
iWARP Benefits
• Convergence
  • Coexists with all other traffic on same port
  • No special treatment needed
• Familiar protocol stack
  • Standard tools for monitoring/debugging
  • Standard network function appliances (security, load balancing…)
• Plug-and-play
  • No need for lossless network operation
  • Leverages existing infrastructure
  • Less expensive network hardware
  • Easy to deploy and manage
• Leverages decades of TCP/IP experience
  • Congestion avoidance and control
    • Critical for network stability
  • Reliability at hardware speeds
    • Retransmission and re-ordering
• Routable
  • Goes wherever IP is spoken
• Scalable across
  • Network size
  • Network architecture
  • Distance
Reliable, robust, scalable
Linux RDMA Architecture
[Diagram: OpenFabrics software stack on Linux, with each component keyed as common, InfiniBand-specific, or iWARP-specific. Layers from bottom to top:]
• Hardware: InfiniBand HCA, iWARP R-NIC
• Provider: hardware specific drivers
• Mid-layer (kernel space): OpenFabrics kernel level verbs / API, Connection Manager Abstraction (CMA), connection managers, MAD, SA client, SMA
• Upper layer protocols (kernel space): SDP, IPoIB, SRP, iSER, RDS, NFS-RDMA RPC, cluster file systems
• User APIs (user space): OpenFabrics user level verbs / API, UDAPL, SDP Lib, user level MAD API, with kernel bypass paths
• Application level: OpenSM, diag tools
• Apps and access methods for using the OF stack: IP based app access, sockets based access, various MPIs, block storage access, clustered DB access, access to file systems
Key to abbreviations:
• R-NIC: RDMA NIC
• HCA: Host Channel Adapter
• UDAPL: User Direct Access Programming Lib
• RDS: Reliable Datagram Service
• iSER: iSCSI RDMA Protocol (Initiator)
• SRP: SCSI RDMA Protocol (Initiator)
• SDP: Sockets Direct Protocol
• IPoIB: IP over InfiniBand
• PMA: Performance Manager Agent
• SMA: Subnet Manager Agent
• MAD: Management Datagram
• SA: Subnet Administrator
NFS over RDMA Timeline
• NetApp/Sun 2007
• IETF RFCs
  • RFC 5532 problem statement in 2009
  • RFC 5666 RDMA for RPC in 2010
  • RFC 5667 NFS DDP in 2010
• Renewed effort with rise in RDMA interest
  • Under active development – mostly client side
  • Chelsio, Emulex, Intel, LANL, Mellanox, NASA, NetApp, Oracle…
NFS over RDMA Overview
• NFS extensions to use RDMA fabric (for NFSv2,3,4)
  • Client sends RPC in RDMA messages
  • Server initiates RDMA data transfer transactions
• Reduces client side CPU utilization
• Eliminates client side data copies
• Leverages low latency fabric
• Requires NIC with RDMA offload at both server and client ends
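The division of labor in the bullets above (client sends the RPC, server initiates the data transfer) can be sketched with a toy model. Following the RPC-over-RDMA design in RFC 5666, the client advertises a memory region in its call; the server then uses RDMA Read to pull NFS WRITE payload from client memory, or RDMA Write to push NFS READ payload into it. `Chunk` and `ToyServer` are hypothetical names for illustration, and plain byte copies stand in for the RDMA operations.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A client memory region advertised to the server inside the RPC call."""
    buffer: bytearray


class ToyServer:
    """Hypothetical NFS/RDMA server; the file store is a plain dict."""

    def __init__(self):
        self.files = {}
        self.log = []   # records which RDMA operation the server initiated

    def nfs_write(self, name, read_chunk):
        # NFS WRITE: the server pulls the payload out of client memory.
        self.log.append("RDMA Read")
        self.files[name] = bytes(read_chunk.buffer)
        return len(self.files[name])

    def nfs_read(self, name, write_chunk):
        # NFS READ: the server pushes the payload into client memory.
        data = self.files[name]
        if len(data) > len(write_chunk.buffer):
            raise ValueError("advertised buffer is too small")
        self.log.append("RDMA Write")
        write_chunk.buffer[:len(data)] = data
        return len(data)


server = ToyServer()
server.nfs_write("f", Chunk(bytearray(b"hello rdma")))

dest = Chunk(bytearray(16))        # client buffer the reply data lands in
n = server.nfs_read("f", dest)
print(server.log, bytes(dest.buffer[:n]))
```

In both directions the client only posts the request and supplies the buffer, which is why the slide credits the scheme with lower client side CPU utilization and no client side data copies.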
NFS Client Stack
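[Diagram: NFS client stack. In the kernel, the NFS client sits on the RPC transport switch, which selects either the TCP/IP or UDP/IP RPC transport or the RDMA RPC transport. The TCP/IP path runs through the host stack and TCP offload module to the network device driver. The RDMA path runs through RDMA verbs and the RDMA CM (IB CM, IW CM) to the RDMA driver. Below the kernel, the T5 NIC provides RDMA offload and TCP offload.]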
Chelsio T5 Ethernet Controller ASIC
• Single processor data-flow pipelined architecture
• Up to 1M connections
• Concurrent multi-protocol operation
• Full TCP/IPv4|IPv6 offload in 4CLK @500MHz
[Block diagram: two 1G/10G/40G MACs and two 100M/1G/10G MACs; embedded Layer 2 Ethernet switch; lookup, filtering and firewall; cut-through RX memory and cut-through TX memory; data-flow protocol engine; traffic manager; application co-processors (TX and RX); DMA engine; PCI-e x8 Gen 3; general purpose processor; on-chip DRAM; memory controller; optional external DDR3 memory.]
T5 Storage Protocol Support
[Diagram: storage protocol support on the T5 network controller.]
• Protocols: NFS (over RPC), SMB (over SMB Direct and NDK), iSCSI, iSER, FCoE, NVMe
• Lower layer drivers: network driver, RDMA driver, iSCSI driver, FCoE driver
• T5 network controller functions: NIC, TCP offload, RDMA offload, iSCSI offload, FCoE offload, T10-DIX
Test Configuration
Clients (x4)
  OS: RHEL6.5
  Kernel: 3.16.0, NFSv4 + latest NFSRDMA fixes
  Processor: Intel(R) Xeon(R) CPU E5-2687W @ 3.10GHz
  No of Processors: 2
  No of Cores: Total 16 (HT Disabled)
  RAM: 64 GB
  Card Type: T580-CR
  Card Core Clock: 500MHz
Server
  OS: RHEL6.1
  Kernel: 3.16.0, NFSv4 + latest NFSRDMA fixes
  Processor: Intel(R) Xeon(R) CPU E5-2687W @ 3.10GHz
  No of Processors: 2
  No of Cores: Total 16 (HT Disabled)
  RAM: 64 GB
  Card Type: T580-CR
  Card Core Clock: 500MHz
  Share: 32GB ramdisk w/ ext2 filesystem
• Clients connected through switch to server with all 40Gbps links
• Sequential I/O direct (no buffer caching)
• Need OFED 3.12+ for 40G iWARP support
[Diagram: four clients, each on a 40 Gb link, connected through a 40 Gb switch to the NFS/RDMA server over a 40 Gb link.]
NFS Write – iWARP vs. L2 NIC
[Chart "Write": throughput in MB/sec (axis 0 to 5000) versus I/O size in KB (4 to 4096). Series: RDMA, NIC.]
NFS Write Client Ints/sec – iWARP vs. L2 NIC
[Chart "Write Ints/Sec": client interrupts/sec (axis 0 to 50000) versus I/O size in KB (4 to 4096). Series: RDMA-Clis, NIC-Clis.]
NFS Read – iWARP vs. L2 NIC
[Chart "Read": throughput in MB/sec (axis 0 to 5000) versus I/O size in KB (4 to 4096). Series: RDMA, NIC.]
NFS Read Client Ints/sec – iWARP vs. L2 NIC
[Chart "Read Ints/Sec": client interrupts/sec (axis 0 to 50000) versus I/O size in KB (4 to 4096). Series: RDMA-Clis, NIC-Clis.]
NFS Write – iWARP vs. InfiniBand
[Chart "Write Throughput": throughput in MB/sec (axis 0 to 5000) versus I/O size in KB (4 to 4096). Series: IW, IB.]
Test setup:
• OS: RHEL6.4
• NFS share: 40GB ramdisk, ext2 file system
• Kernel: 3.16.0 + NFSv4 + latest NFSRDMA/cxgb4 fixes, default settings
• CPU: Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz, 2 CPUs, 16 cores total, no HT
• RAM: 64GB
• IW HW: Chelsio Communications Inc T580-LP-CR Unified Wire Ethernet Controller
• IB HW: Mellanox Technologies MT27500 Family [ConnectX-3] FDR
NFS Write – iWARP vs. FDR InfiniBand
[Chart "Write Idle CPU": % CPU (axis 0 to 100) versus I/O size in KB (4 to 4096). Series: IW-Srv, IB-Srv, IW-Clis, IB-Clis.]
NFS Read – iWARP vs. InfiniBand
[Chart "Read Throughput": throughput in MB/sec (axis 0 to 5000) versus I/O size in KB (4 to 4096). Series: IW, IB.]
NFS Read – iWARP vs. InfiniBand
[Chart "Read Idle CPU": % CPU (axis 0 to 100) versus I/O size in KB (4 to 4096). Series: IW-Srv, IB-Srv, IW-Clis, IB-Clis.]
Conclusions
• RDMA fabric offers potential for improved efficiency
  • SMB v3.0 RDMA transport demonstrated significant gains
• Renewed interest in NFS/RDMA
  • Work in progress
  • Performance benefits compared to NIC
• iWARP RDMA is shipping at 40Gbps
  • High performance Ethernet alternative to InfiniBand
• Chelsio adapter enables simultaneous operation of RDMA, NIC, TOE, iSCSI, FCoE…
  • TCP/IP for Wireless, LAN, Datacenter and Cloud networking
  • Remains “a great all-in-one adapter”*
• Call to action
  • Contribute to RDMA and NFS/RDMA in Linux
  • Mailing lists linux-rdma and linux-nfs on vger.kernel.org
* Helen Chen et al. OFA NFS/RDMA Presentation 2007
Thank You
Please visit www.chelsio.com for more info