S9709 Dynamic Sharing of GPUs and IO in a PCIe Network · 2019-03-29
S9709: Dynamic Sharing of GPUs and IO in a PCIe Network
Håkon Kvale Stensland
Senior Research Scientist / Associate Professor
Simula Research Laboratory / University of Oslo
Outline
• Motivation
• PCIe Overview
• Non-Transparent Bridges
• Dolphin SmartIO
• Example Application
• NVMe sharing
• SmartIO in Virtual Machines
Distributed applications may need to access and use IO resources that are physically located inside remote hosts
[Figure: a front-end and several compute nodes connected by an interconnect that carries control, signaling, and data]
Software abstractions simplify the use and allocation of resources in a cluster and facilitate development of distributed applications
• rCUDA
• CUDA-aware Open MPI
• Custom GPUDirect RDMA implementation
• . . .
[Figure: the front-end sees a logical view of resources; control, signaling, and data are handled in software]
[Figure: two software stacks side by side]
Local resource: Application → CUDA library + driver → PCIe IO bus
Remote resource using middleware: Application → CUDA – middleware integration → Middleware service → Interconnect transport (RDMA) → Interconnect → Interconnect transport (RDMA) → Middleware service/daemon → CUDA driver → PCIe IO bus
In PCIe clusters, the same fabric is used both as the local IO bus within a single node and as the interconnect between separate nodes
[Figure: nodes, each with CPU and chipset, RAM on the memory bus, and PCIe IO devices on the PCIe bus, connected through PCIe interconnect host adapters and external PCIe cables to a PCIe interconnect switch]
[Figure: two software stacks side by side]
Local resource: Application → CUDA library + driver → PCIe IO bus
Remote resource over native fabric: Application → CUDA library + driver → PCIe IO bus → PCIe-based interconnect → PCIe IO bus
PCIe Overview
PCI Express (PCIe) is the most widely adopted I/O interconnection technology used in computer systems today
[Figure: bar chart of bandwidth in gigabytes per second (GB/s), axis from 0 to 70, for PCIe x4, x8, and x16 links across Gen3 (most common today), Gen4 (current standard), and Gen5 (near future-ish)]
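The bar heights in a chart like this follow from the per-lane signaling rate. The following is a minimal sketch, assuming per-lane rates of 8, 16, and 32 GT/s for Gen3, Gen4, and Gen5 with 128b/130b line encoding. It ignores packet and protocol overhead, so achievable throughput is lower than these numbers. The function name is made up for illustration.

```python
# Theoretical one-direction PCIe bandwidth per generation and link width.
# Assumption: only the 128b/130b line encoding is accounted for; TLP headers,
# flow control, and other protocol overhead are ignored.

GT_PER_SEC = {"Gen3": 8, "Gen4": 16, "Gen5": 32}  # giga-transfers/s per lane
ENCODING_EFFICIENCY = 128 / 130


def bandwidth_gb_per_s(generation: str, lanes: int) -> float:
    """Return the theoretical bandwidth in GB/s for one direction of a link."""
    gigabits = GT_PER_SEC[generation] * lanes * ENCODING_EFFICIENCY
    return gigabits / 8  # bits -> bytes


if __name__ == "__main__":
    for gen in GT_PER_SEC:
        row = ", ".join(
            f"x{lanes}: {bandwidth_gb_per_s(gen, lanes):5.2f} GB/s"
            for lanes in (4, 8, 16)
        )
        print(f"{gen}  {row}")
```

Under these assumptions a Gen3 x16 link comes out at roughly 15.75 GB/s and a Gen5 x16 link at roughly 63 GB/s, which is consistent with a chart axis that tops out at 70.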
The PCIe fabric is structured as a tree, where devices form the leaf nodes (endpoints) and the CPU is on top of the root
[Figure: PCIe tree with a root port at the top, a switch below it, and devices (endpoints) as leaves]
On Linux, the tree can be inspected with lspci:
$ lspci -tv
Memory reads and writes are handled by PCIe as transactions that are packet-switched through the fabric depending on the address
[Figure: CPU and chipset with RAM and three PCIe devices, showing the three traffic directions]
• Upstream
• Downstream
• Peer-to-peer (shortest path)
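The three traffic directions can be illustrated with a toy tree. This is only a sketch: the node names are hypothetical, and real switches route memory transactions by matching the target address against the address windows of their ports. In a tree that yields the same path as walking up to the lowest common ancestor and back down, which is what the code does.

```python
# Toy model of path selection in a PCIe tree. Node names are hypothetical.
# Each node only knows its parent; a route goes up from the source to the
# lowest common ancestor and then down to the destination.

PARENT = {
    "root": None,
    "switch": "root",
    "gpu": "switch",
    "ssd": "switch",
    "nic": "root",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to and including the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def route(src, dst):
    """Return the hops from src to dst via their lowest common ancestor."""
    up = path_to_root(src)
    down = path_to_root(dst)
    common = next(n for n in up if n in down)
    return up[: up.index(common) + 1] + down[: down.index(common)][::-1]


if __name__ == "__main__":
    print("upstream:    ", route("gpu", "root"))
    print("downstream:  ", route("root", "gpu"))
    print("peer-to-peer:", route("gpu", "ssd"))
```

In this model the peer-to-peer route between `gpu` and `ssd` turns around at the shared switch and never reaches the root.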
IO devices and the CPU share the same physical address space, allowing devices to access system memory and other devices
[Figure: CPU and chipset with RAM and three PCIe devices, next to an address space running from 0x00000… to 0xFFFFF… that contains RAM, the IO devices, and interrupt vecs at 0xfee00xxx]
• Memory-mapped IO (MMIO / PIO)
• Direct Memory Access (DMA)
• Message-Signaled Interrupts (MSI-X)
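A small sketch of how a shared address space distinguishes these three mechanisms. The region names and every range except the interrupt region at 0xfee00000 are made-up values for illustration, and `decode` and `classify` are hypothetical helpers, not part of any driver API.

```python
# Toy physical address map. All ranges except the 0xfee00000 interrupt
# region are invented for illustration.
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    start: int
    size: int


ADDRESS_MAP = [
    Region("RAM", 0x0000_0000, 0x8000_0000),
    Region("GPU BAR", 0xC000_0000, 0x1000_0000),
    Region("NVMe BAR", 0xD000_0000, 0x0000_4000),
    Region("Interrupt vectors", 0xFEE0_0000, 0x0010_0000),
]


def decode(address: int) -> str:
    """Return the name of the region that claims this address."""
    for region in ADDRESS_MAP:
        if region.start <= address < region.start + region.size:
            return region.name
    return "unmapped"


def classify(initiator: str, address: int) -> str:
    """Name the mechanism in play when `initiator` writes to `address`."""
    target = decode(address)
    if target == "unmapped":
        return "unmapped"
    if target == "Interrupt vectors":
        return "MSI-X interrupt"
    if target == "RAM":
        return "memory access" if initiator == "CPU" else "DMA"
    return "MMIO" if initiator == "CPU" else "peer-to-peer"


if __name__ == "__main__":
    print(classify("CPU", 0xD000_1000))   # CPU writes a device register
    print(classify("NVMe", 0x0000_9000))  # device writes system memory
    print(classify("NVMe", 0xFEE0_0040))  # device signals an interrupt
```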
Non-Transparent Bridges
We can interconnect separate PCIe root complexes and translate addresses between them using a non-transparent bridge (NTB)
[Figure: two hosts joined by an external PCIe cable through a Non-Transparent Bridge (NTB)]
Remote address space can be mapped into local address space by using PCIe Non-Transparent Bridges (NTBs)
[Figure: a local host and a remote host, each with CPU and chipset, RAM, and a PCIe NTB adapter; the local address space contains local RAM and an NTB window that maps to the remote host's RAM]
NTB addr mapping:
Local    Remote
0xf000   0x9000
. . .    . . .
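The mapping table amounts to a base-plus-offset translation. Below is a minimal sketch using the 0xf000 to 0x9000 entry from the slide; the window size of 0x1000 and the `NtbWindow` class are assumptions for illustration, not how any particular NTB is programmed.

```python
# Base-plus-offset address translation through one NTB window.
# The 0xf000 -> 0x9000 pair comes from the slide; the size is an assumption.
from dataclasses import dataclass


@dataclass(frozen=True)
class NtbWindow:
    local_base: int   # where the window appears in the local address space
    remote_base: int  # where the window points in the remote address space
    size: int

    def translate(self, local_addr: int) -> int:
        """Map a local address inside the window to its remote address."""
        offset = local_addr - self.local_base
        if not 0 <= offset < self.size:
            raise ValueError(f"{local_addr:#x} is outside the NTB window")
        return self.remote_base + offset


if __name__ == "__main__":
    window = NtbWindow(local_base=0xF000, remote_base=0x9000, size=0x1000)
    print(hex(window.translate(0xF010)))
```

The translation happens in hardware on every transaction, which is why a mapped remote range can be accessed with ordinary loads, stores, and DMA.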
Using NTBs, each node in the cluster takes part in a shared address space and has its own “window” into the global address space
[Figure: nodes A, B, and C on an NTB-based interconnect. Each node's address space holds its local RAM and local IO devices plus windows onto the other nodes' address spaces; the ranges each node exports make up the global address space]
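One way to picture the global address space is as a directory of exported ranges. The sketch below is a bookkeeping model only: `ClusterMap`, its methods, and the back-to-back placement of exports are hypothetical choices for illustration and do not describe how any NTB driver lays out its windows.

```python
# Bookkeeping model of a global address space built from exported ranges.
# Class name, method names, and the layout policy are all hypothetical.


class ClusterMap:
    def __init__(self):
        self._exports = []   # (global_base, size, node, local_base)
        self._next_free = 0  # exports are placed back to back

    def export(self, node, local_base, size):
        """Publish a local range and return its base in the global space."""
        global_base = self._next_free
        self._exports.append((global_base, size, node, local_base))
        self._next_free += size
        return global_base

    def resolve(self, global_addr):
        """Return (node, local address) for an address in the global space."""
        for global_base, size, node, local_base in self._exports:
            if global_base <= global_addr < global_base + size:
                return node, local_base + (global_addr - global_base)
        raise LookupError(f"{global_addr:#x} is not exported by any node")


if __name__ == "__main__":
    cluster = ClusterMap()
    cluster.export("A", local_base=0x4000, size=0x1000)
    base_b = cluster.export("B", local_base=0x8000, size=0x2000)
    node, addr = cluster.resolve(base_b + 0x10)
    print(node, hex(addr))
```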
SmartIO
[Figure: software stack for a borrowed remote resource]
Borrowed remote resource: Application → CUDA library + driver → PCIe IO bus → PCIe NTB interconnect → PCIe IO bus
• Unmodified local driver (with hot-plug support)
• Resource appears local to OS, driver, and app
• Hardware mappings ensure fast data path
• Works with any PCIe device (even individual SR-IOV functions)
[Figure: two software stacks side by side]
Borrowed remote resource: Application → CUDA library + driver → PCIe IO bus → PCIe NTB interconnect → PCIe IO bus
Remote resource using middleware: Application → CUDA – middleware integration → Middleware service → Interconnect transport (RDMA) → Interconnect → Interconnect transport (RDMA) → Middleware service/daemon → CUDA driver → PCIe IO bus
Device to host transfers: Comparing local to borrowed GPU
[Figure: bandwidth in gigabytes per second (GB/s), axis from 0 to 14, versus transfer size from 4 KB to 16 MB, for three series: bandwidthTest (Local), bandwidthTest (Borrowed), and PXH830 DMA (GPUDirect RDMA)]
Using Device Lending, nodes in a PCIe cluster can share resources through a process of borrowing and giving back devices
[Figure: three hosts, each with CPU + chipset, RAM, and an NTB, contribute their GPUs, SSDs, NICs, and FPGA to a device pool. Task A, Task B, and Task C each borrow a different mix of devices from the pool, with peer-to-peer transfers between borrowed devices]
[Figure: the same device pool and three hosts, now with Task A running on every host and borrowing GPUs and SSDs from the pool]
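The borrow and give-back cycle can be sketched as simple pool bookkeeping. This models only who holds which device; `DevicePool` and its methods are hypothetical names and are not the SmartIO API, and the sketch does none of the address mapping that makes a borrowed device usable.

```python
# Bookkeeping model of a device pool with borrow / give-back semantics.
# Names are hypothetical; this is not the SmartIO API.


class DevicePool:
    def __init__(self, devices):
        # device name -> [lending host, current borrower or None]
        self._devices = {name: [host, None] for name, host in devices.items()}

    def borrow(self, borrower, kind):
        """Hand the first free device whose name starts with `kind` to borrower."""
        for name, entry in self._devices.items():
            if name.startswith(kind) and entry[1] is None:
                entry[1] = borrower
                return name
        raise RuntimeError(f"no free {kind} in the pool")

    def give_back(self, name):
        """Return a borrowed device to the pool."""
        self._devices[name][1] = None

    def borrowed_by(self, borrower):
        """List the devices currently held by borrower."""
        return sorted(n for n, (_, b) in self._devices.items() if b == borrower)


if __name__ == "__main__":
    pool = DevicePool({"gpu0": "host1", "gpu1": "host2", "ssd0": "host3"})
    pool.borrow("Task A", "gpu")
    pool.borrow("Task A", "ssd")
    pool.borrow("Task B", "gpu")
    print(pool.borrowed_by("Task A"))
    pool.give_back("gpu0")
```

Because a borrowed device appears local to the borrower, a task can combine devices from several lending hosts, as the slides show for the GPU and SSD pairs.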
Example Application
Processing of Medical Videos
P9258 - Efficient Processing of Medical Videos in a Multi-auditory Environment Using GPU Lending
Scenario: Real-time computer-aided polyp detection
• PCIe fiber cables can be up to 100 meters.
• Enables “thin clients” to use GPUs in a remote machine room.
Flexible sharing of GPU resources between multiple examination rooms
• System uses a combination of classic computer vision algorithms and machine learning.
• Research prototype since 2016.
Sharing of NVMe drives
For more details: S9563 - Efficient Distributed Storage I/O using NVMe and GPU Direct in a PCIe Network
or visit Dolphin Interconnect Solutions in booth 1520
Example: NVMe disk operation (simplified)
[Figure: CPU and chipset with RAM and an NVMe disk, next to an address space running from 0x00000… to 0xFFFFF…. RAM holds Command Queue0, Command Queue1, … Command QueueN and the data; disk memory (registers) holds the Queue0, Queue1, … QueueN doorbells; interrupt vectors sit in their own region. The NVMe driver issues "Read N blocks to address 0x9000" and the disk reports "Command complete"]
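The sequence in this simplified example (queue a command in memory, ring the doorbell, let the disk write the data to the given address, receive a completion) can be mimicked in a few lines. This is a toy simulation with hypothetical names; a dictionary stands in for RAM, and none of it reflects the actual NVMe command format.

```python
# Toy simulation of the simplified NVMe flow: command queue in memory,
# doorbell write, DMA of the data into RAM, completion notification.
# All names are hypothetical and the command layout is invented.
from collections import deque


class ToyNvmeDisk:
    def __init__(self, blocks, ram):
        self.blocks = blocks             # block number -> bytes
        self.ram = ram                   # address -> bytes, stands in for RAM
        self.submission_queue = deque()  # lives in RAM on a real system
        self.completion_queue = deque()

    def ring_doorbell(self):
        """Doorbell write: the disk fetches and executes queued commands."""
        while self.submission_queue:
            cmd = self.submission_queue.popleft()
            data = b"".join(
                self.blocks[cmd["start"] + i] for i in range(cmd["count"])
            )
            self.ram[cmd["address"]] = data          # DMA write into RAM
            self.completion_queue.append(cmd["id"])  # "command complete"


def read_blocks(disk, cmd_id, start, count, address):
    """Driver side: queue a read, ring the doorbell, wait for completion."""
    disk.submission_queue.append(
        {"id": cmd_id, "start": start, "count": count, "address": address}
    )
    disk.ring_doorbell()
    return disk.completion_queue.popleft()


if __name__ == "__main__":
    ram = {}
    disk = ToyNvmeDisk({0: b"ab", 1: b"cd"}, ram)
    done = read_blocks(disk, cmd_id=1, start=0, count=2, address=0x9000)
    print(done, ram[0x9000])
```

The point of the model is that every step is a plain memory access to some address, so queues, doorbells, and data buffers can each live wherever an address can be mapped.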
SmartIO enabled driver: NVMe on GPU
Userspace NVMe driver using GPUDirect: https://github.com/enfiskutensykkel/ssd-gpu-dma
[Figure: CPU and chipset with RAM, a GPU running a CUDA program, and an NVMe disk, next to an address space running from 0x00000… to 0xFFFFF…. GPU memory holds Command Queue0, Command Queue1, … Command QueueN; disk memory (registers) holds the Queue0, Queue1, … QueueN doorbells, which appear to the GPU as mapped doorbells]
buf = cudaMalloc(...);
addr = nvidia_p2p_get_pages(buf);
ptr = mmap(...);
devptr = cudaHostRegister(ptr);
[Figure label: peer-to-peer transfers between the NVMe disk and GPU memory]
Example: NVMe queues hosted remotely
[Figure: hosts A, B, and C on an NTB-based interconnect. The NVMe disk sits in C, whose address space contains an exported address range with the queue doorbells. A's address space holds Command Queue0 in RAM and a mapped doorbell reached through its NTB; B's address space holds Command Queue1 in RAM and a mapped doorbell reached through its NTB; the exported Queue0 is mapped into C through its NTB]
Read latency for reading blocks from an NVMe disk into a GPU: Local versus borrowed disk
SmartIO in Virtual Machines
SmartIO fully supports lending devices to virtual machines running in Linux KVM using the Virtual Function I/O (VFIO) API
Pass-through allows physical devices to be used by VMs with minimal overhead, but is not as flexible as resource virtualization
[Figure: spectrum from the physical view, through pass-through of physical resources (minimal virtualization overhead), to virtual or paravirtualized resources (dynamic provisioning & flexible composition)]
Passing through a remote NVMe disk to a VM only adds the latency of traversing the NTB and is comparable to a physical borrower
[Figure: latency and bandwidth comparison, annotated "Traversing the NTB" and "Almost same bandwidth"]
Guest OS: Ubuntu 17.04, Host OS: CentOS 7
VM: Qemu 2.17 using KVM
NVMe Disk: Intel 900P Optane (PCIe x4 Gen3)
Thank you!
Selected publications
• “Device Lending in PCI Express Networks”, ACM NOSSDAV 2016
• “Efficient Processing of Video in a Multi Auditory Environment using Device Lending of GPUs”, ACM Multimedia Systems 2016 (MMSys’16)
• “Flexible Device Sharing in PCIe Clusters using Device Lending”, International Conference on Parallel Processing Companion (ICPP'18 Comp)
SmartIO & Device Lending demo with GPUs, NVMe and more
Visit Dolphin in the exhibition area (booth 1520)