2014 Storage Developer Conference. © Chelsio Communications, Inc. All Rights Reserved.
NFS/RDMA over 40Gbps iWARP
Wael Noureddine
Chelsio Communications
Outline
RDMA
Motivating trends
iWARP
NFS over RDMA
Overview
Chelsio T5 support
Performance results
Adoption Rate of 40GbE
Source: Crehan Research, 2Q14 CREHAN Quarterly Market Share Tables
Software Defined Everything
Source: European Telecommunications Standards Institute, http://portal.etsi.org/nfv/nfv_white_paper.pdf, October 2012
Motivating Trends
Unprecedented curve in 40GbE growth (and pricing)
Consolidation and virtualization
Software-defined storage (and everything else) on commodity hardware
Rise of the data center
Power efficiency
High-speed, ultra-low-latency SSDs
Need for a high-performance, high-efficiency fabric
Ethernet remains the preferred technology
TCP/IP for scalability, reliability, robustness and reach
iWARP RDMA over Ethernet
RDMA Overview
Direct memory-to-memory transfer
All protocol processing handled by the NIC: must be in hardware
Protection handled by the NIC: user-space access requires both local and remote enforcement
Asynchronous communication model: reduced host involvement
Performance: latency (polling), throughput
Efficiency: zero copy, kernel bypass (user-space I/O), CPU bypass
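[Figure: RDMA data path between two hosts, each with a T5 adapter handling protocol processing and protection. Packets cross a wireless, LAN, datacenter or WAN network; payload moves directly between host memories, and notifications are delivered to each host's completion queue (CQ).]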
Performance and efficiency in return for a new communication paradigm
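The asynchronous post-and-poll model on this slide can be illustrated with a toy, pure-Python simulation. It is a hypothetical sketch, not the libibverbs API: the names `ToyQueuePair`, `post_write`, `process` and `poll_cq` are invented, a `bytearray` stands in for registered remote memory, and a simple bounds check stands in for the NIC's protection checks. Posting a request returns immediately; the simulated NIC later places the payload directly into remote memory and queues a completion, which the application discovers by polling the completion queue (CQ).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Completion:
    wr_id: int      # caller-chosen work request identifier
    status: str     # "success" or an error description
    byte_len: int   # bytes actually transferred


class ToyQueuePair:
    """Hypothetical model of an RDMA queue pair plus its completion queue."""

    def __init__(self, remote_memory: bytearray):
        self.remote_memory = remote_memory  # stands in for a registered remote buffer
        self.send_queue = deque()           # posted, not yet executed requests
        self.cq = deque()                   # completion notifications

    def post_write(self, wr_id: int, local_buf: bytes, remote_offset: int) -> None:
        # Posting only enqueues the request; the caller is never blocked.
        self.send_queue.append((wr_id, bytes(local_buf), remote_offset))

    def process(self) -> None:
        # Plays the role of the NIC: place payload directly into remote memory.
        # The bounds check is a stand-in for NIC-enforced memory protection.
        while self.send_queue:
            wr_id, data, off = self.send_queue.popleft()
            if off < 0 or off + len(data) > len(self.remote_memory):
                self.cq.append(Completion(wr_id, "remote access error", 0))
                continue
            self.remote_memory[off:off + len(data)] = data
            self.cq.append(Completion(wr_id, "success", len(data)))

    def poll_cq(self, max_entries: int = 16) -> list:
        # Non-blocking: return whatever completions are ready, possibly none.
        ready = []
        while self.cq and len(ready) < max_entries:
            ready.append(self.cq.popleft())
        return ready


remote = bytearray(16)
qp = ToyQueuePair(remote)
qp.post_write(1, b"hello", 0)
qp.post_write(2, b"x" * 32, 8)   # larger than the remote buffer allows
print(qp.poll_cq())              # [] : nothing has completed yet
qp.process()
for c in qp.poll_cq():
    print(c.wr_id, c.status, c.byte_len)
print(bytes(remote[:5]))
```

In real hardware the `process` step runs on the adapter without host CPU involvement, which is where the zero-copy and CPU-bypass benefits listed above come from.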
iWARP RDMA over Ethernet
IETF RFCs in 2007 (RFC 5040 to RFC 5044)
Open standard
Multiple vendors
Ongoing standardization
Extensions to maintain API uniformity with InfiniBand
Recent RFC 7306 by Broadcom, Chelsio and Intel
Mature stack
3rd generation hardware
RDMA over TCP/IP/Ethernet
TCP reliability, scalability, congestion and flow control
IP routability
Ethernet ubiquity
Wireless ready
Near 10Gbps, low latency
Cloud ready
Standard TCP/IP foundation
No network restrictions
Full-featured implementation
All RDMA benefits
High performance
High packet rate
Low latency (1.5 µs user-to-user)
40Gbps line rate with a single connection
iWARP Benefits
Convergence: coexists with all other traffic on the same port; no special treatment needed
Familiar protocol stack: standard tools for monitoring and debugging; standard network function appliances (security, load balancing…)
Plug-and-play: no need for lossless network operation; leverages existing infrastructure; less expensive network hardware; easy to deploy and manage
Leverages decades of TCP/IP experience: congestion avoidance and control, critical for network stability
Reliability at hardware speeds: retransmission and reordering
Routable: goes wherever IP is spoken
Scalable across network size, network architecture and distance
Reliable, robust, scalable
Linux RDMA Architecture
[Figure: OpenFabrics Linux RDMA stack. Hardware: InfiniBand HCA (host channel adapter) and iWARP R-NIC (RDMA NIC), each with a hardware-specific driver. Kernel mid-layer: OpenFabrics kernel-level verbs/API, connection manager and Connection Manager Abstraction (CMA), plus InfiniBand management components (Management Datagram (MAD), Subnet Administrator (SA) client, Subnet Manager Agent (SMA), Performance Manager Agent (PMA)). Kernel upper-layer protocols: IP over InfiniBand (IPoIB), Sockets Direct Protocol (SDP), SCSI RDMA Protocol initiator (SRP), iSCSI RDMA Protocol initiator (iSER), Reliable Datagram Service (RDS) and NFS-RDMA RPC. User space: OpenFabrics user-level verbs/API, User Direct Access Programming Library (uDAPL), SDP library, user-level MAD API, OpenSM and diagnostic tools, with kernel-bypass paths to the hardware. Applications and access methods: various MPIs, clustered DB access, sockets-based access, access to file systems (cluster file systems), block storage access and IP-based application access. A key marks each component as common, InfiniBand-only or iWARP.]
NFS over RDMA Timeline
NetApp/Sun, 2007
IETF RFCs
RFC 5532 problem statement in 2009
RFC 5666 RDMA transport for RPC in 2010
RFC 5667 NFS direct data placement in 2010
Renewed effort with the rise in RDMA interest
Under active development – mostly client side
Chelsio, Emulex, Intel, LANL, Mellanox, NASA, NetApp, Oracle…
NFS over RDMA Overview
NFS extensions to use an RDMA fabric (NFSv2, NFSv3 and NFSv4)
Client sends RPC calls in RDMA messages
Server initiates RDMA data transfer transactions
Reduces client-side CPU utilization
Eliminates client-side data copies
Leverages a low-latency fabric
Requires a NIC with RDMA offload at both the server and client ends
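In the RPC-over-RDMA transport (RFC 5666), the client advertises its buffers as "chunks" in the RPC call, and the server moves bulk data with RDMA operations it initiates itself: for an NFS READ the server RDMA-Writes the file data into a client write chunk, and for an NFS WRITE it RDMA-Reads the data out of a client read chunk. Small messages simply travel inline in RDMA Sends. The sketch below models only that decision; the 1024-byte inline threshold and the names `plan_transfer` and `Step` are hypothetical, chosen for illustration, and reply chunks and other protocol details are omitted.

```python
from dataclasses import dataclass
from typing import List

INLINE_THRESHOLD = 1024  # hypothetical: largest payload sent inline in an RDMA Send


@dataclass
class Step:
    initiator: str  # "client" or "server"
    operation: str  # "Send", "RDMA Read" or "RDMA Write"
    what: str


def plan_transfer(nfs_op: str, data_len: int) -> List[Step]:
    """Simplified RPC-over-RDMA message flow for one NFS READ or WRITE."""
    if nfs_op not in ("READ", "WRITE"):
        raise ValueError("only READ and WRITE are modeled")
    if data_len <= INLINE_THRESHOLD:
        # Small payloads ride inside the RPC call (WRITE) or reply (READ).
        if nfs_op == "WRITE":
            return [Step("client", "Send", "RPC call with inline data"),
                    Step("server", "Send", "RPC reply")]
        return [Step("client", "Send", "RPC call"),
                Step("server", "Send", "RPC reply with inline data")]
    if nfs_op == "READ":
        steps = [Step("client", "Send", "RPC call advertising a write chunk"),
                 Step("server", "RDMA Write", "file data into the write chunk")]
    else:
        steps = [Step("client", "Send", "RPC call advertising a read chunk"),
                 Step("server", "RDMA Read", "file data from the read chunk")]
    steps.append(Step("server", "Send", "RPC reply"))
    return steps


for op in ("READ", "WRITE"):
    for s in plan_transfer(op, 64 * 1024):
        print(f"{op:5} {s.initiator:6} {s.operation:10} {s.what}")
```

Every bulk RDMA operation in this model is issued by the server, so the client only posts its call and receives the reply, which is consistent with the reduced client-side CPU utilization and eliminated client-side copies noted above.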
NFS Client Stack
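[Figure: NFS client stack in the kernel. The NFS client sits on the RPC transport switch, which selects either the TCP/IP or UDP/IP RPC transport (host stack or TCP offload module, over the network device driver) or the RDMA RPC transport (RDMA verbs, RDMA CM with IB CM and iWARP CM, over the RDMA driver). Both paths reach the T5 NIC, which provides TCP offload and RDMA offload.]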
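The RPC transport switch lets the same NFS client code run over different transports, chosen when the share is mounted. The sketch below is a hypothetical, simplified Python model of that dispatch, not the kernel's SUNRPC code: transports register by name, and a protocol name (as a `proto=` style mount option would supply) selects one. The port defaults reflect the conventional NFS port 2049 and the IANA-registered NFS/RDMA port 20049; the class and function names are invented for illustration.

```python
from typing import Dict


class Transport:
    name = "base"
    default_port = 2049  # conventional NFS port

    def __init__(self, server: str, port: int):
        self.server, self.port = server, port

    def send_call(self, payload: bytes) -> str:
        raise NotImplementedError


# Hypothetical registry playing the role of the RPC transport switch.
TRANSPORTS: Dict[str, type] = {}


def register(cls: type) -> type:
    TRANSPORTS[cls.name] = cls
    return cls


@register
class TcpTransport(Transport):
    name = "tcp"

    def send_call(self, payload: bytes) -> str:
        return f"TCP stream to {self.server}:{self.port}, {len(payload)} B"


@register
class RdmaTransport(Transport):
    name = "rdma"
    default_port = 20049  # IANA-registered NFS/RDMA port

    def send_call(self, payload: bytes) -> str:
        return f"RDMA Send to {self.server}:{self.port}, {len(payload)} B"


def connect(server: str, proto: str = "tcp", port: int = 0) -> Transport:
    """Pick a registered transport by name; port 0 means the transport default."""
    try:
        cls = TRANSPORTS[proto]
    except KeyError:
        raise ValueError(f"no RPC transport registered for {proto!r}") from None
    return cls(server, port or cls.default_port)


print(connect("server", "rdma").send_call(b"\0" * 128))
```

Keeping the transport behind a name-keyed interface is what allows RDMA to be added without changing the NFS client above it; only the transport below the switch differs.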
Chelsio T5 Ethernet Controller ASIC
Single processor, data-flow pipelined architecture
Up to 1M connections
Concurrent multi-protocol operation
Full TCP/IPv4 and IPv6 offload in 4 clock cycles at 500MHz
[Figure: T5 block diagram: two 1G/10G/40G MACs and two 100M/1G/10G MACs; embedded Layer 2 Ethernet switch; lookup, filtering and firewall; cut-through RX and TX memories; data-flow protocol engine; traffic manager; application co-processors (TX and RX); DMA engine; PCI-e x8 Gen 3 host interface; general-purpose processor; on-chip DRAM and memory controller with optional external DDR3 memory.]
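If the 4-clock figure at 500MHz above is read as the pipeline accepting one packet every four clock cycles (an assumption about what the slide means), its packet budget can be compared with the worst-case packet rate of 40GbE. A minimum 64-byte Ethernet frame also occupies 8 bytes of preamble and a 12-byte inter-frame gap on the wire. The function names below are illustrative only.

```python
def pipeline_packet_rate(clock_hz: float, clocks_per_packet: int) -> float:
    """Packets per second a pipeline sustains if it accepts one every N clocks."""
    return clock_hz / clocks_per_packet


def ethernet_max_packet_rate(line_rate_bps: float, frame_bytes: int = 64) -> float:
    """Packets per second at line rate: frame plus 8 B preamble and 12 B gap."""
    return line_rate_bps / ((frame_bytes + 8 + 12) * 8)


engine = pipeline_packet_rate(500e6, 4)    # assumed: one packet per 4 clocks
one_port = ethernet_max_packet_rate(40e9)  # one 40GbE port, minimum-size frames
print(f"pipeline: {engine / 1e6:.1f} Mpps, 40GbE: {one_port / 1e6:.2f} Mpps")
print(f"headroom: {engine / one_port:.2f}x")
```

Under this reading the pipeline (125 Mpps) would also cover both 40G MACs in the diagram running minimum-size frames at line rate (about 119 Mpps combined).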
T5 Storage Protocol Support
[Figure: storage protocols on the T5 network controller. Upper layers: NFS (over RPC), iSCSI, iSER, SMB Direct, NDK, SMB, FCoE and NVMe. Lower-layer drivers: network driver, RDMA driver, iSCSI driver and FCoE driver. T5 offloads: NIC, TCP offload, RDMA offload, iSCSI offload, FCoE offload and T10-DIX.]
Test Configuration
Clients (x4) and server (same settings unless noted)
OS: RHEL6.5 (clients), RHEL6.1 (server)
Kernel: 3.16.0, NFSv4 + latest NFSRDMA fixes
Processor: Intel(R) Xeon(R) CPU E5-2687W @ 3.10GHz
Number of processors: 2
Number of cores: 16 total (HT disabled)
RAM: 64 GB
Card type: T580-CR
Card core clock: 500MHz
Server share: 32GB ramdisk with ext2 file system
• Clients connected through a switch to the server, with all links at 40Gbps
• Sequential direct I/O (no buffer caching)
• OFED 3.12+ needed for 40G iWARP support
[Figure: clients connect to the NFS/RDMA server through a 40Gb switch; all links are 40Gb.]
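The throughput charts in this deck use MB/sec on a 0 to 5000 scale, which corresponds to the raw 40Gbps link rate. The short calculation below converts link rates and shows why, with four 40Gbps clients and one 40Gbps server link, the server link caps aggregate throughput. It assumes decimal megabytes (10^6 bytes), a single active server port, and ignores protocol overhead; the function names are illustrative.

```python
def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert a link rate in Gbit/s to decimal MB/s (1 MB = 10**6 bytes)."""
    return gbps * 1e9 / 8 / 1e6


def aggregate_ceiling_mb_per_s(client_links_gbps, server_links_gbps) -> float:
    """Raw throughput ceiling for several clients reaching one server via a switch."""
    clients = sum(gbps_to_mb_per_s(g) for g in client_links_gbps)
    server = sum(gbps_to_mb_per_s(g) for g in server_links_gbps)
    return min(clients, server)


print(gbps_to_mb_per_s(40))                        # 5000.0
print(aggregate_ceiling_mb_per_s([40] * 4, [40]))  # 5000.0, bounded by the server link
```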
NFS Write – iWARP vs. L2 NIC
[Chart: NFS write throughput in MB/sec (0 to 5000) vs. I/O size in KB (4 to 4096); series: RDMA, NIC]
NFS Write Client Ints/sec – iWARP vs. L2 NIC
[Chart: NFS write client interrupts/sec (0 to 50000) vs. I/O size in KB (4 to 4096); series: RDMA clients, NIC clients]
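Interrupt rates are easier to interpret next to the number of I/O operations required for a given throughput, since larger I/O sizes need fewer operations, and therefore fewer completions, to move the same data. The relation is operations per second = throughput / I/O size. The sketch below tabulates it for the chart's I/O sizes at a hypothetical 4000 MB/s; that figure is an illustrative assumption, not a value read from the charts, and the units (decimal MB for throughput, 1024-byte KB for I/O size) are also assumed.

```python
IO_SIZES_KB = [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]  # chart x-axis


def ops_per_second(throughput_mb_s: float, io_size_kb: int) -> float:
    """I/O operations per second needed to sustain a given throughput.

    Assumes decimal MB/s (10**6 bytes) and binary KB (1024 bytes).
    """
    return throughput_mb_s * 1e6 / (io_size_kb * 1024)


target = 4000.0  # hypothetical throughput in MB/s, for illustration only
for kb in IO_SIZES_KB:
    print(f"{kb:5} KB -> {ops_per_second(target, kb):10.0f} ops/s")
```

Each doubling of I/O size halves the operation rate at equal throughput, so per-operation costs such as interrupts and completion handling weigh most heavily at small I/O sizes.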
NFS Read – iWARP vs. L2 NIC
[Chart: NFS read throughput in MB/sec (0 to 5000) vs. I/O size in KB (4 to 4096); series: RDMA, NIC]
NFS Read Client Ints/sec – iWARP vs. L2 NIC
[Chart: NFS read client interrupts/sec (0 to 50000) vs. I/O size in KB (4 to 4096); series: RDMA clients, NIC clients]
NFS Write – iWARP vs. InfiniBand
[Chart: NFS write throughput in MB/sec (0 to 5000) vs. I/O size in KB (4 to 4096); series: iWARP (IW), InfiniBand (IB)]
Test setup: RHEL6.4; NFS share: 40GB ramdisk, ext2 file system
Kernel: 3.16.0 + NFSv4 + latest NFSRDMA/cxgb4 fixes, default settings
CPU: Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz; 64GB RAM; 2 CPUs, 16 cores total, no HT
iWARP (IW) hardware: Chelsio Communications Inc T580-LP-CR Unified Wire Ethernet Controller
InfiniBand (IB) hardware: Mellanox Technologies MT27500 Family [ConnectX-3] FDR
NFS Write – iWARP vs. FDR InfiniBand
[Chart: NFS write idle CPU in % (0 to 100) vs. I/O size in KB (4 to 4096); series: iWARP server (IW-Srv), InfiniBand server (IB-Srv), iWARP clients (IW-Clis), InfiniBand clients (IB-Clis)]
NFS Read – iWARP vs. InfiniBand
[Chart: NFS read throughput in MB/sec (0 to 5000) vs. I/O size in KB (4 to 4096); series: iWARP (IW), InfiniBand (IB)]
NFS Read – iWARP vs. InfiniBand
[Chart: NFS read idle CPU in % (0 to 100) vs. I/O size in KB (4 to 4096); series: iWARP server (IW-Srv), InfiniBand server (IB-Srv), iWARP clients (IW-Clis), InfiniBand clients (IB-Clis)]
Conclusions
RDMA fabric offers potential for improved efficiency
SMB v3.0 RDMA transport demonstrated significant gains
Renewed interest in NFS/RDMA
Work in progress
Performance benefits compared to a standard NIC
iWARP RDMA is shipping at 40Gbps
High-performance Ethernet alternative to InfiniBand
Chelsio adapter enables simultaneous operation of RDMA, NIC, TOE, iSCSI, FCoE…
TCP/IP for wireless, LAN, datacenter and cloud networking
Remains “a great all-in-one adapter”*
Call to action
Contribute to RDMA and NFS/RDMA in Linux
Mailing lists linux-rdma and linux-nfs on vger.kernel.org
* Helen Chen et al., OFA NFS/RDMA presentation, 2007
Thank You
Please visit www.chelsio.com for more info