gso, pacing and zerocopy for content delivery: willemb
![Page 1: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/1.jpg)
Optimizing UDP for content delivery: GSO, pacing and zerocopy
Willem de Bruijn [email protected]
![Page 2: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/2.jpg)
Workload: QUIC
>35 % of Google egress
● stream multiplexing, low latency connection establishment, …
● 2x higher cycle/Byte than TCP
● serving ~10K concurrent 1 MBps flows per server
"The QUIC Transport Protocol", SIGCOMM 2017
"QUIC - Developing and Deploying a TCP Replacement for the Web", netdevconf 0x12
"Live encoder settings, bitrates, and resolutions", support.google.com/youtube/answer/2853702
![Page 3: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/3.jpg)
UDP
unreliable datagrams.. but also:
rapid experimentation & deployment
○ widely available
○ no superuser privileges
○ middlebox support
○ thin service, so highly extensible
![Page 4: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/4.jpg)
UDP cycle efficiency
| | calls/s | Mcycles/s | Speed-up (%) |
|---|---|---|---|
| TCP | 19040 | 618 | 487 |
| UDP | 812000 | 2801 | 100 |
tools/testing/selftests/net/udpgso_bench_tx
![Page 5: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/5.jpg)
UDP cycle efficiency
| | calls/s | Mcycles/s | Speed-up (%) |
|---|---|---|---|
| TCP no-segs | 19040 | 2800 | 100 |
| TCP gso | 19040 | 1856 | 162 |
| TCP tso | 19040 | 618 | 487 |
| UDP | 812000 | 2801 | 100 |
ethtool -k $DEV tso off gso on
![Page 6: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/6.jpg)
Optimizing the UDP serving stack
● UDP_SEGMENT
● GSO_PARTIAL
● MSG_ZEROCOPY
● SO_TXTIME
● UDP_GRO
![Page 7: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/7.jpg)
Work of many others
● Alexander Duyck
● Boris Pismenny
● Edward Cree
● Eric Dumazet
● Jesus Sanchez-Palencia
● Paolo Abeni
● Steffen Klassert
● ...
![Page 8: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/8.jpg)
GSO: fewer, larger packets
![Page 9: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/9.jpg)
UDP GSO
virtual high MTU link
~45x reduction in stack traversals
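One plausible reading of the ~45x figure, assuming IPv4 without options and a 1500-byte Ethernet MTU (neither is stated on the slide): a single maximum-size UDP send replaces about 45 MTU-sized datagrams, each of which would otherwise traverse the stack on its own.

$$
\frac{65535 - 20 - 8}{1500 - 20 - 8} = \frac{65507}{1472} \approx 44.5
$$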
![Page 10: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/10.jpg)
UDP GSO: stack traversal
![Page 11: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/11.jpg)
UDP GSO != UFO
![Page 12: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/12.jpg)
UDP GSO: interface
int gso_size = ETH_DATA_LEN - sizeof(struct ipv6hdr) - sizeof(struct udphdr);
if (setsockopt(fd, SOL_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size)))
error(1, errno, "setsockopt udp segment");
cm = CMSG_FIRSTHDR(&msg);
cm->cmsg_level = SOL_UDP;
cm->cmsg_type = UDP_SEGMENT;
cm->cmsg_len = CMSG_LEN(sizeof(uint16_t));
*((uint16_t *) CMSG_DATA(cm)) = gso_size;
ret = sendmsg(fd, &msg, 0);
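The per-call variant above leaves out the control buffer that backs `msg`. A minimal sketch of that missing setup follows, assuming a connected UDP socket; the helper names `set_gso_cmsg` and `send_gso` are hypothetical, and the fallback `#define` values are only there for libc headers that predate `UDP_SEGMENT`.

```c
#include <netinet/in.h>
#include <netinet/udp.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#ifndef SOL_UDP
#define SOL_UDP 17
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103 /* value from linux/udp.h, for older libc headers */
#endif

/* Attach a UDP_SEGMENT control message to msg, using ctrl as backing store.
 * ctrl must hold at least CMSG_SPACE(sizeof(uint16_t)) bytes and be aligned
 * for struct cmsghdr. Returns 0 on success, -1 if the buffer is too small. */
static int set_gso_cmsg(struct msghdr *msg, void *ctrl, size_t ctrl_len,
                        uint16_t gso_size)
{
    struct cmsghdr *cm;

    if (ctrl_len < CMSG_SPACE(sizeof(gso_size)))
        return -1;
    memset(ctrl, 0, ctrl_len);
    msg->msg_control = ctrl;
    msg->msg_controllen = CMSG_SPACE(sizeof(gso_size));

    cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = SOL_UDP;
    cm->cmsg_type = UDP_SEGMENT;
    cm->cmsg_len = CMSG_LEN(sizeof(gso_size));
    memcpy(CMSG_DATA(cm), &gso_size, sizeof(gso_size));
    return 0;
}

/* Hand one large buffer to the stack, which splits it into datagrams of
 * gso_size payload bytes each (the last one may be shorter). */
static ssize_t send_gso(int fd, const void *buf, size_t len, uint16_t gso_size)
{
    union {
        char buf[CMSG_SPACE(sizeof(uint16_t))];
        struct cmsghdr align; /* forces cmsghdr alignment */
    } ctrl;
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };

    if (set_gso_cmsg(&msg, ctrl.buf, sizeof(ctrl.buf), gso_size))
        return -1;
    return sendmsg(fd, &msg, 0);
}
```

The `setsockopt` form sets a default segment size for every send on the socket; the cmsg form chooses it per call, which is useful when different sends need different segment sizes.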
![Page 13: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/13.jpg)
UDP GSO: evaluation
| | calls/s | Mcycles/s | Speed-up (%) |
|---|---|---|---|
| TCP no-segs | 19040 | 2800 | 100 |
| TCP gso | 19040 | 1856 | 162 |
| TCP tso | 19040 | 618 | 487 |
| UDP | 812000 | 2801 | 100 |
| UDP gso | 18248 | 1726 | 174 |
![Page 14: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/14.jpg)
UDP GSO: evaluation
| | calls/s | Mcycles/s | Speed-up (%) |
|---|---|---|---|
| TCP no-segs | 19040 | 2800 | 100 |
| TCP gso | 19040 | 1856 | 162 |
| TCP tso | 19040 | 618 | 487 |
| UDP | 812000 | 2801 | 100 |
| UDP gso | 18248 | 1726 | 174 |
| UDP lso | .. | .. | .. |
"Udp segmentation offload", netdevconf 0x12
![Page 15: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/15.jpg)
UDP GSO: hardware
"Encapsulation Offloads: LCO, GSO_PARTIAL, [..]", netdevconf 1.2
![Page 16: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/16.jpg)
UDP GSO: hybrid
"Encapsulation Offloads: LCO, GSO_PARTIAL, [..]", netdevconf 1.2
GSO_PARTIAL
![Page 17: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/17.jpg)
UDP GSO: implementation details
● choosing gso_size
  ○ ETH_DATA_LEN
  ○ IP_MTU_DISCOVER
● choosing number of segments
  ○ fit in network layer
  ○ <= UDP_MAX_SEGMENTS
  ○ > gso_size
● checksum offload
  ○ csum_and_copy_from_user
tools/testing/selftests/net/udpgso
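The three segment-count constraints can be sketched as a user-space pre-check. This is an illustration under assumptions, not the kernel's logic: `MAX_SEGMENTS` mirrors the value 64 that `UDP_MAX_SEGMENTS` had when UDP GSO was introduced (it may differ on other kernels), the 65535-byte budget is applied the IPv4 way (conservative for IPv6), and the function names are hypothetical.

```c
#include <stddef.h>

#define UDP_HDR_BYTES   8
#define IPV4_HDR_BYTES  20
#define IPV6_HDR_BYTES  40
#define MAX_IP_DATAGRAM 65535 /* assumption: IPv4-style total length limit */
#define MAX_SEGMENTS    64    /* assumption: stand-in for UDP_MAX_SEGMENTS */

/* Number of datagrams a len-byte send is split into (last may be short). */
static size_t gso_segments(size_t len, size_t gso_size)
{
    return gso_size ? (len + gso_size - 1) / gso_size : 0;
}

/* Return 1 if a len-byte send would be segmented and passes all three
 * checks from the slide, 0 otherwise. */
static int gso_send_ok(size_t len, size_t gso_size, size_t ip_hdr)
{
    if (gso_size == 0 || len <= gso_size)  /* "> gso_size": else nothing to split */
        return 0;
    if (len + UDP_HDR_BYTES + ip_hdr > MAX_IP_DATAGRAM) /* fit in network layer */
        return 0;
    return gso_segments(len, gso_size) <= MAX_SEGMENTS; /* <= UDP_MAX_SEGMENTS */
}
```

For example, a 65507-byte send with a 1472-byte `gso_size` yields 45 segments and passes, while the same send with a 1000-byte `gso_size` would need 66 segments and is rejected by this sketch.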
![Page 18: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/18.jpg)
MSG_ZEROCOPY: tx copy avoidance
![Page 19: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/19.jpg)
MSG_ZEROCOPY
perf record netperf -t TCP_STREAM -H $host
Samples: 42K of event 'cycles', Event count (approx.): 21258597313
79.41% 33884 netperf [kernel.kallsyms] [k] copy_user_generic_string
3.27% 1396 netperf [kernel.kallsyms] [k] tcp_sendmsg
1.66% 694 netperf [kernel.kallsyms] [k] get_page_from_freelist
0.79% 325 netperf [kernel.kallsyms] [k] tcp_ack
0.43% 188 netperf [kernel.kallsyms] [k] __alloc_skb
"sendmsg copy avoidance with MSG_ZEROCOPY", netdevconf 2.1
![Page 20: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/20.jpg)
MSG_ZEROCOPY: evaluation
| | Copy: copy % | Copy: Mcyc/s |
|---|---|---|
| TCP | 26.7 | 618 |
| UDP | 3.11 | 2800 |
![Page 21: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/21.jpg)
MSG_ZEROCOPY: evaluation
| | Copy: copy % | Copy: Mcyc/s |
|---|---|---|
| TCP | 4.35 | 2800 |
| TCP gso | 10.3 | 1856 |
| TCP tso | 26.7 | 618 |
| UDP | 3.11 | 2800 |
| UDP gso | 13.4 | 1727 |
| UDP gso (CT) | 21.2 | 1916 |
![Page 22: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/22.jpg)
MSG_ZEROCOPY: evaluation
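| | Copy: copy % | Copy: Mcyc/s | Zerocopy: Mcyc/s | Speed-up (%) |
|---|---|---|---|---|
| TCP | 4.35 | 2800 | 2800 | 100 |
| TCP gso | 10.3 | 1856 | 1704 | 109 |
| TCP tso | 26.7 | 618 | 425 | 145 |
| UDP | 3.11 | 2800 | 2800 | 100 |
| UDP gso | 13.4 | 1727 | 1690 | 102 |
| UDP gso (CT) | 21.2 | 1916 | 1694 | 113 |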
![Page 23: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/23.jpg)
Pacing: avoid retransmits
![Page 24: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/24.jpg)
Pacing
● 10k clients at 1MBps
  ○ RR? 1MB in 100 usec
● Bursts
  ○ -> higher drops
  ○ -> higher retransmit
  ○ -> higher cyc/B
● Pace: send at 1 msec interval
● Pacing offload: reduce jitter, reduce cycle/B
  ■ SO_MAX_PACING_RATE
  ■ SCH_FQ
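The round-robin arithmetic can be checked with a small sketch, reading "1MB in 100 usec" as: with 10k clients each served 1 MB once per second, each client gets a 100 usec slot. The helper names are hypothetical, 1 MB is taken as 10^6 bytes, and `SO_MAX_PACING_RATE` only has an effect when a pacing-capable qdisc such as fq is attached to the device.

```c
#include <stdint.h>
#include <sys/socket.h>

#ifndef SO_MAX_PACING_RATE
#define SO_MAX_PACING_RATE 47 /* asm-generic value, for older libc headers */
#endif

/* Instantaneous rate, in bits per second, of sending `bytes` within `usec`. */
static uint64_t burst_rate_bps(uint64_t bytes, uint64_t usec)
{
    return usec ? bytes * 8 * 1000000ULL / usec : 0;
}

/* Cap the socket's transmit rate in bytes per second; the qdisc then spaces
 * the packets instead of the application timing each send itself. */
static int set_max_pacing_rate(int fd, unsigned int bytes_per_sec)
{
    return setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
                      &bytes_per_sec, sizeof(bytes_per_sec));
}
```

Under these assumptions an unpaced 1 MB burst in 100 usec corresponds to 80 Gbit/s on the wire, against 8 Mbit/s when the same flow sends 1 KB every millisecond.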
![Page 25: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/25.jpg)
Pacing: SO_TXTIME interface
/* SO_TXTIME takes a struct sock_txtime (linux/net_tstamp.h), not a bare int */
struct sock_txtime so_txtime = { .clockid = CLOCK_TAI, .flags = 0 };
setsockopt(fd, SOL_SOCKET, SO_TXTIME, &so_txtime, sizeof(so_txtime));
clock_gettime(CLOCK_TAI, &ts);
uint64_t txtime = ts.tv_sec * 1000000000ULL + ts.tv_nsec + txdelay_ns;
cmsg = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_TXTIME;
cmsg->cmsg_len = CMSG_LEN(sizeof(__u64));
*((__u64 *) CMSG_DATA(cmsg)) = txtime;
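A self-contained sketch of the two per-packet steps, computing the departure time and attaching it as a control message. The helper names `txtime_ns` and `set_txtime_cmsg` are hypothetical, and the fallback `#define` values assume the asm-generic socket option numbering.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

#ifndef SO_TXTIME
#define SO_TXTIME 61 /* asm-generic value, for older libc headers */
#endif
#ifndef SCM_TXTIME
#define SCM_TXTIME SO_TXTIME
#endif

/* Absolute departure time in nanoseconds: `now` plus a relative delay.
 * `now` is expected to come from the clock configured via SO_TXTIME. */
static uint64_t txtime_ns(const struct timespec *now, uint64_t txdelay_ns)
{
    return (uint64_t)now->tv_sec * 1000000000ULL
           + (uint64_t)now->tv_nsec + txdelay_ns;
}

/* Attach an SCM_TXTIME control message carrying `txtime` to msg.
 * ctrl must hold at least CMSG_SPACE(sizeof(uint64_t)) bytes and be aligned
 * for struct cmsghdr. Returns 0 on success, -1 if the buffer is too small. */
static int set_txtime_cmsg(struct msghdr *msg, void *ctrl, size_t ctrl_len,
                           uint64_t txtime)
{
    struct cmsghdr *cm;

    if (ctrl_len < CMSG_SPACE(sizeof(txtime)))
        return -1;
    memset(ctrl, 0, ctrl_len);
    msg->msg_control = ctrl;
    msg->msg_controllen = CMSG_SPACE(sizeof(txtime));

    cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_TXTIME;
    cm->cmsg_len = CMSG_LEN(sizeof(txtime));
    memcpy(CMSG_DATA(cm), &txtime, sizeof(txtime));
    return 0;
}
```

A caller would read `CLOCK_TAI` with `clock_gettime`, pass the result to `txtime_ns`, and call `sendmsg` on the prepared `msg`; the timestamp is only honored when a qdisc that understands it (fq or etf) is installed.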
"Scheduled packet transmission: Etf", https://lwn.net/Articles/758592/
![Page 26: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/26.jpg)
Pacing: larger bursts with GSO
![Page 27: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/27.jpg)
Pacing: larger bursts with GSO
![Page 28: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/28.jpg)
Pacing & GSO
● Pacing at millisecond granularity
  ○ 1 MBps*
  ○ 1KB per msec
  ○ < 1 MSS!
● Conflicting goals
  ○ maximize batching
  ○ send at msec interval
![Page 29: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/29.jpg)
Pacing & GSO: evaluation
| Pacing interval (msec) | CPU time % | Loss % |
|---|---|---|
| 1 | 100 | 100 |
| 2 | 92 | 103 |
| 4 | 88 | 110 |
| 8 | 84 | 117 |
![Page 30: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/30.jpg)
UDP_GRO: batch receive
![Page 31: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/31.jpg)
UDP GRO
● Inverse operation
  ○ larger, fewer packets
  ○ forwarding to GSO
  ○ local delivery
    ■ transparent
      ● segment
      ● frag list
      ● netfilter redirect
    ■ large packets
  ○ Listification
"udp: implement gro support", https://lwn.net/Articles/768995/
"Handle multiple received packets at each stage", http://patchwork.ozlabs.org/project/netdev/list/?series=53249&state=*
![Page 32: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/32.jpg)
UDP GRO: interface
setsockopt(fd, IPPROTO_UDP, UDP_GRO, &enable, sizeof(enable));
recvmsg(fd, &msg, 0);
for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm))
  if (cm->cmsg_level == SOL_UDP && cm->cmsg_type == UDP_GRO)
    gso_size = *(uint16_t *) CMSG_DATA(cm);
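Once `gso_size` is known, the application still has to cut the coalesced buffer back into the original datagrams. A minimal sketch of that step, with a hypothetical helper `for_each_gro_segment` and a sample callback; a `gso_size` of 0 is used here to mean that no UDP_GRO cmsg was present, which is a convention of this sketch only.

```c
#include <stddef.h>

typedef void (*segment_fn)(const char *seg, size_t len, void *arg);

/* Sample callback: accumulate the total payload length seen. */
static void sum_len(const char *seg, size_t len, void *arg)
{
    (void)seg;
    *(size_t *)arg += len;
}

/* Walk a coalesced receive buffer and hand each original datagram to fn.
 * All segments are gso_size bytes except possibly the last, which may be
 * shorter. Returns the number of segments visited. */
static size_t for_each_gro_segment(const char *buf, size_t len,
                                   size_t gso_size, segment_fn fn, void *arg)
{
    size_t n = 0;

    if (gso_size == 0)
        gso_size = len; /* not coalesced: the buffer is a single datagram */
    for (size_t off = 0; off < len; off += gso_size, n++) {
        size_t seg = len - off < gso_size ? len - off : gso_size;

        if (fn)
            fn(buf + off, seg, arg);
    }
    return n;
}
```

A 3000-byte read with a 1472-byte `gso_size` is delivered as two full segments followed by one 56-byte segment.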
![Page 33: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/33.jpg)
UDP GRO: evaluation
Caveat: are packet trains long enough across the WAN in practice?

| | Gbps | calls/s | Mcycles/s | Speed-up (%) |
|---|---|---|---|---|
| UDP | 798 | 568000 | 3564 | 100 |
| UDP GRO | 1022 | 40250 | 2498 | 182 |
![Page 34: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/34.jpg)
Summary
UDP_SEGMENT
GSO_PARTIAL
MSG_ZEROCOPY
SO_TXTIME
UDP_GRO
Questions?
![Page 35: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/35.jpg)
backup
![Page 36: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/36.jpg)
UDP GRO: configurable GRO
[show interface + cat /proc/sys/ipv4/gro_avail output]
![Page 37: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/37.jpg)
QUIC server architecture
![Page 38: GSO, pacing and zerocopy for content delivery: willemb](https://reader034.vdocument.in/reader034/viewer/2022042702/6265e2c0980572059e7b06e9/html5/thumbnails/38.jpg)
MSG_ZEROCOPY: interface (recap)
send(fd, buf, sizeof(buf), MSG_ZEROCOPY);
struct pollfd pfd = { .fd = fd, .events = 0 };  /* POLLERR is reported regardless of events */
poll(&pfd, 1, -1);
if (pfd.revents & POLLERR)
recvmsg(fd, &msg, MSG_ERRQUEUE);
"udp: zerocopy", http://patchwork.ozlabs.org/patch/899630/