virtual rdma networking for containerized clouds · freeflow: software-based virtual rdma...
TRANSCRIPT
FreeFlow: Software-based Virtual RDMA Networking for Containerized Clouds
Daehyeok Kim
Tianlong Yu1, Hongqiang Liu3, Yibo Zhu4, Jitu Padhye2, Shachar Raindel2
Chuanxiong Guo4, Vyas Sekar1, Srinivasan Seshan1
Carnegie Mellon University1, Microsoft2, Alibaba group3, Bytedance4
Two Trends in Cloud Applications
• Lightweight isolation
• Portability
1
• Higher networking performance
Containerization RDMA networking
Benefits of Containerization
2
NIC
Container 1
IP: 10.0.0.1
IP: 30.0.0.1
NetworkApp
Container 2
IP: 20.0.0.1
NetworkApp
Host 1 Host 2
NIC
Container 2
NetworkApp
IP: 20.0.0.1
IP: 40.0.0.1
Migration
Namespace Isolation
PortabilitySoftware Switch
Software Switch
Containerization and RDMA are in Conflict!
3
RDMA NIC
Container 1
IP: 10.0.0.1
IP: 10.0.0.1
RDMAApp
Container 2
IP: 10.0.0.1
RDMAApp
Host 1 Host 2
RDMA NIC
Container 2
RDMAApp
IP: 20.0.0.1
IP: 20.0.0.1
Migration
Namespace Isolation
Portability
Existing H/W based Virtualization Isn’t Working
4
Container 1
IP: 10.0.0.1
IP: 10.0.0.1
RDMAApp
Container 2
IP: 10.0.0.2
RDMAApp
Container 2
RDMAApp
IP: 20.0.0.1
Host 1 Host 2
VF 1 VF 2
NIC Switch
RDMA NIC
IP: 10.0.0.2
VF
NIC Switch
IP: 20.0.0.1
Migration
Using Single Root I/O Virtualization (SR-IOV)
VF Virtual Function
Namespace Isolation
Portability
Sub-optimal Performance of Containerized Apps
5
0
1000
2000
3000
Resnet-50 Inception-v3 AlexnetTr
ain
ing
Spee
d
(Im
ages
/sec
)Model
Native RDMA
Container+TCP
RDMA networking can improve the training speed of NN model by ~ 10x !
9.2x14.4x
Speech recognition RNN training Image classification CNN training
0
0.5
1
0 10 20 30 40
CD
F
Time per step (sec)
Native RDMA Container+TCP
Our Work: FreeFlow
• Enable high speed RDMA networking capabilities for containerized applications
• Compatible with existing RDMA applications
• Close to native RDMA performance• Evaluation with real-world data-intensive applications
6
Outline
• Motivation
• FreeFlow Design
• Implementation and Evaluation
7
FreeFlow Design Overview
8
RDMA App
RDMA NIC
Native RDMA FreeFlow
RDMA NIC
Container 1
IP: 10.0.0.1RDMA App
Container 2
IP: 20.0.0.1
FreeFlow
IP: 30.0.0.1
Host Host
Verbs library
Verbs library
Verbs APIVerbs API
RDMA App
Verbs API
NIC command
Background on RDMA
9
RDMA App
RDMA NIC
MEM-1RDMA CTX
RDMA App
RDMA NIC
MEM-2 RDMA CTX
1. Control path- Setup RDMA Context- Post work requests (e.g., write)
2. Data path- NIC processes work requests- NIC directly accesses memory
“Host 1 wants to write contents in MEM-1 to MEM-2 on Host 2”
Host 1 Host 2
Verbs library Verbs library
FreeFlow in the Scene
10
RDMA App
RDMA NIC
MEM-1RDMA CTX
“Container 1 wants to write contents in MEM-1 to MEM-2 on Container 2”
Container 1 Container 2
FreeFlow
S-RDMA CTX S-MEM-1
RDMA App
RDMA NIC
RDMA CTXMEM-2
FreeFlow
S-MEM-2 S-RDMA CTX
C1: How to forward verbs calls?
Verbs library Verbs library
C2: How to synchronize memory?
Challenge 1: Verbs forwarding in Control Path
11
RDMA App
RDMA NIC
FreeFlow
Verbs library
Container
NIC command
Verbs API
?
RDMA App
Shim
ibv_post_send (struct ibv_qp* qp, …)
Attempt 1: Forward “as it is”➔Incorrect
Attempt 2: “Serialize” and forward➔ Inefficient
struct ibv_qp {struct ibv_context *context;….
};
Internal Structure of Verbs Library
12
RDMA App
RDMA NIC
FreeFlow
Verbs library
Container
NIC command
Verbs API
?
RDMA App
Shim
ibv_post_send (struct ibv_qp* qp, …)
struct ibv_qp {struct ibv_context *context;….
};
Parameters are serialized by Verbs library!
FreeFlow Control Path Channel
13
RDMA App
FreeFlow library
Write (VNIC_fd, serialized parameters)
Parameters are forwarded correctly without manual serialization!
Idea: Leveraging the serialized output of verbs library
Verbs library
Shim
RDMA App
RDMA NIC
FreeFlow Router
Verbs library
Container
NIC command
Verbs API
VNIC
ibv_post_send (struct ibv_qp* qp, ….)
FreeFlow Router
VNIC
Challenge 2: Synchronizing Memory for Data Path
14
• Shadow memory in FreeFlow router• A copy of application’s memory region• Directly accessed by NICs
• S-MEM and MEM must be synchronized.
• How to synchronize S-MEM and MEM?
RDMA App
RDMA NIC
MEMRDMA CTX
FreeFlow Router
S-RDMA CTX S-MEM
Verbs library
VNIC
Container
Strawman Approach for Synchronization
15
“Container 1 wants to write contents in MEM-1 to MEM-2 on Container 2”
RDMA App
RDMA NIC
MEM-1RDMA CTX
FreeFlow Router
S-RDMA CTX S-MEM-1
Verbs library
VNIC
Container
RDMA App
RDMA NIC
RDMA CTXMEM-2
FreeFlow Router
S-MEM-2 S-RDMA CTX
Verbs library
VNIC
Container
DATA
?Explicit synchronizationHigh freq.➔ High overheadLow freq.➔Wrong data for app
Containers can Share Memory Regions
16
RDMA App
RDMA NIC
MEM-1RDMA CTX
FreeFlow Router
S-RDMA CTX S-MEM-1
Verbs library
VNIC
ContainerHost
Shared memory
MEM
MEM and S-MEM can be located onthe same physical memory region
• FreeFlow router is running in a container
Zero-copy Synchronization in Data Path
17
RDMA App
RDMA NIC
MEM-1RDMA CTX
FreeFlow Router
S-RDMA CTX S-MEM-1
Verbs library
VNIC
ContainerHost
Shared memory
MEM
Synchronization without explicit memory copy:Method1: Allocate shared buffers with FreeFlow APIsMethod2: Re-map app’s memory space to shadowmemory space
FreeFlow supports both!
How to allocated MEM-1 to shadow memory space?
FreeFlow Design Summary
18
FreeFlow control path channel
Zero-copy memory synchronization
FreeFlow provides near native RDMA performance for containers!
RDMA NIC
Container 1
IP: 10.0.0.1
RDMA App
Container 2
IP: 20.0.0.1
FreeFlow Router
IP: 30.0.0.1
Verbs library
RDMA App
VNIC VNIC
Outline
• Motivation
• FreeFlow Design
• Implementation and Evaluation
19
Implementation and Experimental Setup
• FreeFlow Library• Add 4000 lines in C to libibverbs and libmlx4.
• FreeFlow Router• 2000 lines in C++
• Testbed setup• Two Intel Xeon E5-2620 8-core CPUs, 64 GB RAM
• 56 Gbps Mellanox ConnectX-3 NICs
• Docker containers
20
Does FreeFlow Support Low Latency?
21
0
1
2
3
4
64 256 1K 4K
Late
ncy
(u
s)
Message size (B)
Native RDMA FreeFlow
0.38μs
Does FreeFlow Support High Throughput?
22
0
20
40
60
2K 8K 32K 128K 512K 1M
Thro
ugh
pu
t (G
bp
s)
Message size (B)
Native RDMA
FreeFlow
Bounded by control path channel performance
Do Applications Benefit from FreeFlow?
23
0
0.5
1
0 10 20 30 40
CD
F
Time per step (sec)
Container+TCP Native RDMA FreeFlow
8.7x
Summary
• Containerization today can’t benefit from speed of RDMA.
• Existing solutions for NIC virtualization don’t work (e.g., SR-IOV).
• FreeFlow enables containerized apps to use RDMA.
• Challenges and Key Ideas• Control path: Leveraging Verbs library structure for efficient Verbs forwarding
• Data path: Zero-copy memory synchronization
• Performance close to native RDMA
24github.com/microsoft/freeflow