Vyatta Network OS (vRouter) March 1, 2017 SV Linux Users Group, San Jose, CA Robyn Gutierrez Sven-Thorsten Dietrich Becca Nitzan


Page 1

Vyatta Network OS (vRouter)

March 1, 2017 SV Linux Users Group, San Jose, CA

Robyn Gutierrez, Sven-Thorsten Dietrich, Becca Nitzan

Page 2

© 2017 BROCADE COMMUNICATIONS SYSTEMS, INC.

Topics

• High Level Architecture Overview and Constraints
• Forwarding Performance Using Intel 2690 V2
  • Topo
  • Interface load distribution
    • No load, one flow
    • Multiple imbalanced flows
    • Multiple balanced flows
    • Multiple balanced flows with interface affinity configs
  • Hugepages
  • Know your HW - Limitations when you’re least expecting it!
    • then there’s inter versus intra nic
    • and Power
• Forwarding Performance Using Intel 2690 V3
  • Topo
  • Tuning comparisons
• Out of the zone, and into the hack

Page 3

Vyatta High Level Architecture

[Diagram: Vyatta High Performance User-Space Networking Architecture. Diagram labels: Control Plane (CLI, REST, Netconf, GUI, Script API, AAA, Routing Protocols, Hybrid DevOps Data Model, Shadow Interfaces, FIB, vPlaned, Session State); Data Plane (vPlane) (IPv4/IPv6 Unicast, Firewall, Encrypt / Decrypt, Tunnels (GRE, mGRE), Multicast, DPDK, QoS, NAT, Etc); Linux Kernel (Shadow Interfaces, UIO / VFIO, AF_PACKET); Hardware / Virtualization (Storage, Multi-Queue NICs (up to 40Gb), Console, USB, WAN).]

Page 4

Why User-Space / Kernel Offload

Page 5

Dataplane (basic) Packet Service Architecture

[Diagram: Vyatta High Performance User-Space Networking Architecture. Data Plane Packet Forwarder Threads ("pkt fwd" on CPU 1, CPU 2 and CPU 3) on DPDK, over UIO / VFIO and the Linux Kernel. Hardware / Virtualization shows CPU0 to CPU3 and NIC1 to NIC6.]

Page 6

Packet Service Timing
Packet arrival / transmit average periods

Link Speed  Frame Size  Time / Packet
1 G         64          640 ns
10 G        64          64 ns
40 G        64          16 ns
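The per-packet period follows from the link speed and the number of bytes each frame occupies on the wire. A minimal sketch, assuming the 84-byte wire footprint of a minimum-sized frame (64-byte frame plus 8-byte preamble and 12-byte inter-frame gap); the function name is illustrative. Under that assumption the exact periods come out slightly above the rounded values in the table.

```python
# Bytes a minimum-sized Ethernet frame occupies on the wire (assumption):
# 64-byte frame + 8-byte preamble + 12-byte inter-frame gap.
WIRE_BYTES_MIN_FRAME = 64 + 8 + 12


def packet_period_ns(link_gbps: float, wire_bytes: int = WIRE_BYTES_MIN_FRAME) -> float:
    """Average time between back-to-back packets at line rate, in nanoseconds."""
    bits = wire_bytes * 8
    # A 1 Gbit/s link moves exactly one bit per nanosecond.
    return bits / link_gbps


for gbps in (1, 10, 40):
    print(f"{gbps:>3} G: {packet_period_ns(gbps):7.1f} ns per 64-byte frame")
```

At 10G this leaves roughly 67 ns per packet, which is about 200 cycles on a 3.0 GHz core if a single core had to service the whole stream.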

Page 7

Host and Hardware Tuning for Optimal Forwarding Performance

Page 8

Nominal Forwarding Performance

Page 9

A simple topo for vRouter performance analysis, using Intel E5-2690 v2

host_u5> grep "model name" /proc/cpuinfo | uniq

model name : Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz

[Topology diagram: Spirent ports 11/3 and 11/2, each on a 10G link to the host. Host traffic interfaces p2p1 and p1p1 (sriov); mgmt interface em1 (bridged). Host OS: Ubuntu 14.04.5, Hyperv: KVM. vRouter: 8 vcpus, 8G RAM, interfaces dp0s2, dp0s5 and dp0s6.]

Page 10

CLI output of the vRouter interface-to-cpu mapping, with no load and with a one-flow load

No load:

Dataplane CPU activity

Core Interface RX Rate TX Rate
--------------------------------------------------------
1 dp0s5 0
  [crypt] 0
2 dp0s6 0
  dp0s5 0
3 dp0s6 0
4 dp0s6 0
5 dp0s2 6
6 dp0s2 2
7 dp0s5 0

One flow:

Dataplane CPU activity

Core Interface RX Rate TX Rate
--------------------------------------------------------
1 dp0s5 0
  [crypt] 0
2 dp0s6 7.4M
  dp0s5 7.4M
3 dp0s6 0
4 dp0s6 7.4M
5 dp0s2 5
6 dp0s2 2
7 dp0s5 7.4M

Page 11

With multiple flows per direction, two RX queues per interface are used, improving performance to 20.8 Mpps

Dataplane CPU activity

Core Interface RX Rate TX Rate
--------------------------------------------------------
1 dp0s5 3.5M
  [crypt] 0
2 dp0s6 6.9M
  dp0s5 10.4M
3 dp0s6 3.5M
4 dp0s6 10.4M
5 dp0s2 5
6 dp0s2 1
7 dp0s5 6.9M

However:
1) there are only 3 flows per direction, leading to a statistically imbalanced load
2) cpus 1 and 3 have 3.5 Mpps, whereas cpus 2 and 7 carry 6.9 Mpps
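The imbalance follows from simple counting: three equal flows cannot split evenly over two RX queues, so one queue ends up with at least two of them. A minimal sketch, assuming flows are mapped to queues by hashing the flow tuple; `zlib.crc32` is only a stand-in for the NIC's receive-side hash, and the addresses and ports are made-up illustration values. The per-flow rate of 10.4/3 Mpps is derived from the per-direction total in the table.

```python
import zlib
from collections import Counter


def queue_for_flow(flow: tuple, num_queues: int) -> int:
    """Pick an RX queue from a hash of the flow tuple (stand-in for NIC hashing)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_queues


def load_per_queue(flows: list, num_queues: int, mpps_per_flow: float) -> list:
    """Sum the offered load (Mpps) that lands on each RX queue."""
    counts = Counter(queue_for_flow(f, num_queues) for f in flows)
    return [counts.get(q, 0) * mpps_per_flow for q in range(num_queues)]


# Hypothetical flows: (src ip, dst ip, src port, dst port).
flows = [("10.0.0.1", "10.1.0.1", 1024 + i, 80) for i in range(3)]
loads = load_per_queue(flows, num_queues=2, mpps_per_flow=10.4 / 3)
print([round(x, 1) for x in loads])
```

Whatever the hash does, the busier of the two queues carries at least two of the three flows, which is the roughly 6.9 versus 3.5 Mpps split seen above. With many more flows, an even split becomes increasingly likely.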

Page 12

With a statistically balanced number of flows, cpu load is more evenly distributed - performance is up at 23.2 Mpps

Dataplane CPU activity

Core Interface RX Rate TX Rate
--------------------------------------------------------
1 dp0s5 5.8M
  [crypt] 0
2 dp0s6 5.8M
  dp0s5 11.6M
3 dp0s6 5.8M
4 dp0s6 11.6M
5 dp0s2 6
6 dp0s2 2
7 dp0s5 5.8M

While taking into account:
1) mgmt interface dp0s2 is assigned to 2 cores (only 1 needed for this test)
2) the crypto thread is sharing a cpu with a forwarding interface
3) these can be adjusted via configs for better performance

Page 13

A brief diversion: recall what’s in a minimum sized IP packet over 10G Ethernet

Field            Size
preamble         8 bytes
mac destination  6 bytes
mac source       6 bytes
ether type       2 bytes
IP minimum       46 bytes
CRC              4 bytes
interframe gap   12 bytes

Consider:
• 84 bytes total taken up on the wire per minimum sized IP packet
• theoretical max pps per direction is 14,880,952, ~29.76 Mpps total bidirectional
• ~70% overhead per packet, 26 bytes data
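The three figures above can be recomputed from the field sizes. A minimal sketch; the only assumption beyond the slide is that the 26 data bytes are the 46-byte minimum IP packet minus a 20-byte IPv4 header without options.

```python
# Per-field sizes in bytes for a minimum-sized IP packet on Ethernet,
# as listed on the slide.
FIELDS = {
    "preamble": 8,
    "mac destination": 6,
    "mac source": 6,
    "ether type": 2,
    "IP minimum": 46,
    "CRC": 4,
    "interframe gap": 12,
}

IPV4_HEADER = 20  # assumption: IPv4 header without options


def max_pps(link_bps: float, wire_bytes: int) -> float:
    """Theoretical packets per second at line rate for a given wire footprint."""
    return link_bps / (wire_bytes * 8)


wire_bytes = sum(FIELDS.values())                # bytes on the wire per packet
one_way = max_pps(10e9, wire_bytes)              # per direction on 10G
data_bytes = FIELDS["IP minimum"] - IPV4_HEADER  # payload left after the IP header
overhead = 1 - data_bytes / wire_bytes           # share of the wire that is not data

print(wire_bytes, int(one_way), round(2 * one_way / 1e6, 2), round(overhead, 2))
# prints: 84 14880952 29.76 0.69
```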

Page 14

For a deterministic interface/cpu mapping, it’s possible to configure affinity bits per interface; as a result, rates are up to near line rate at 28.4 Mpps

Dataplane CPU activity

Core Interface RX Rate TX Rate
--------------------------------------------------------
1 dp0s2 6
  dp0s2 1
  [crypt] 0
2 dp0s5 7.1M
3 dp0s5 7.1M
4 dp0s5 14.2M
5 dp0s6 7.1M
6 dp0s6 7.1M
7 dp0s6 14.2M

Notes:
• forwarding ints, dp0s5 and dp0s6, each use 3 distinct cpus (no longer overlapping cpu 1 or cpu 2)
• mgmt int dp0s2 now shares 1 cpu with the crypto thread
• cpu 0 is retained for the control plane
• see backup slides for vrouter config example
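Affinity settings of this kind are commonly expressed as a bitmask with one bit per CPU. A minimal sketch of that encoding, using the core assignment read off the table above; the helper names are illustrative, and this is not vRouter configuration syntax (the talk points to its backup slides for that).

```python
def cpus_to_mask(cpus) -> int:
    """Encode CPU ids as a bitmask (bit n set means CPU n is allowed)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def mask_to_cpus(mask: int) -> list:
    """Decode a bitmask back into a sorted list of CPU ids."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Core assignment from the table: dp0s2 (and crypto) on core 1,
# dp0s5 on cores 2-4, dp0s6 on cores 5-7, core 0 left to the control plane.
layout = {"dp0s2": [1], "dp0s5": [2, 3, 4], "dp0s6": [5, 6, 7]}
for ifname, cpus in layout.items():
    print(f"{ifname}: cpus {cpus} -> mask {cpus_to_mask(cpus):#04x}")
```

Because the three masks share no bits, no forwarding interface competes with another for a core, which is the point of the deterministic mapping.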

Page 15

Distributing the load evenly across multiple RX queues can more than double pps throughput

Page 16

Hugepages can impact performance by more than 50%

→ 11.3 Mpps no hugepages

→ 28.4 Mpps with hugepages

Host memory info:

u5_hm> cat /proc/meminfo | grep -i huge
AnonHugePages: 28672 kB
HugePages_Total: 120
HugePages_Free: 112 ← VM is using 8G
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB

u5_hm> free -g
total used free shared buffers cached
Mem: 157G 122G 34G 2.3M 73M 1.1G
-/+ buffers/cache: 121G 36G
Swap: 63G 0B 63G
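The "VM is using 8G" annotation can be checked from the counters: pages in use are total minus free, multiplied by the page size. A minimal sketch that parses `/proc/meminfo`-style text; the function name is illustrative and the sample is the output shown above with the annotation removed.

```python
def hugepage_usage_gib(meminfo_text: str) -> float:
    """Return GiB of huge pages in use, from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        name = name.strip()
        parts = rest.split()
        # Keep only the huge page counters and the page size (in kB).
        if name.startswith(("HugePages_", "Hugepagesize")) and parts:
            fields[name] = int(parts[0])
    used_pages = fields["HugePages_Total"] - fields["HugePages_Free"]
    return used_pages * fields["Hugepagesize"] / (1024 * 1024)  # kB -> GiB


sample = """\
AnonHugePages: 28672 kB
HugePages_Total: 120
HugePages_Free: 112
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
"""
print(hugepage_usage_gib(sample))  # 8.0
```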

Page 17

Inter-nic performance can be better than intra-nic

• Port-to-port on different nics can be up to 10% better than port-to-port on the same nic

[Diagram: server with NICs in slot 1, slot 2 and slot 3 (plus "..other stuff"), each NIC with port1 and port2, annotated "do this" and "don’t do that".]

Page 18

Weird hardware limitations present themselves when you’re least expecting them

Here’s one:
“Place low latency or high performing PCI-e card in slot 1,2,4,5 or 6 (depending on the type of secondary riser board that might be installed).”

Ok... I guess we should avoid slot 3 then

Yes indeed, we should avoid it:
→ bare metal, traffic bidirectional with min IPv4 packet size:
- using slot 1 and 3: ~20 Mpps
- using slot 1 and 2: ~29 Mpps (100% line rate)

Page 19

It’s really just hw dependent, but for this particular case:

[Diagram: server with NICs in slot 1, slot 2 and slot 3 (plus "..stuff"), each NIC with port1 and port2, annotated "do this" and "don’t do that".]

And, just to quote myself, we’re back to this:
“Ironically, knowledge of host HW is mandatory for SDN configs where forwarding performance is concerned”

Page 20

Power is another one, in particular, redundant power

• With some HW, redundant power is essential
• Without it, periodic drops seriously impact performance
• Adding redundant power improved performance by 50%
• Bios setting changes:
  • Dynamic Power Savings Mode → Static High Performance

And there are many other tweaks that can be made; only the ones with the biggest impact are included here.

Page 21

A few simple HW adjustments can double pps throughput

Page 22

Different performance tuning parameters matter when using Intel E5-2690 V3

[Topology diagram: two Spirent ports, each on a 10G link. Traffic interfaces dp0s7 and dp0s6; mgmt via em1 (bridged). Host OS: Debian 8.7, KVM/QEMU: KVM 2.1.2, Libvirt: 1.2.9. vRouter: 10 vcpus, 16G RAM.]

Page 23

The tuning items that mattered on the earlier platform (using the V2 chipset) are not as apparent on a more recent platform using V3

• The biggest impact, a ~18% hit, came from pinning cpus to hyperthreaded siblings (negative test)

• PCI passthrough versus SRIOV was a ~.05% hit

• With and without HugePages was in the noise level, a .0004% hit

Page 24

Tuning may or may not matter, depending on the host HW

PCI_PT => PCI Passthrough
SRIOV => Single Root I/O Virtualization

Page 25

Optimal Forwarding Performance

Page 26

Software Tuning for Optimal Forwarding Performance

Page 27

Vyatta High-Performance Architecture

• NUMA / Memory-bandwidth aware
• CPU topology aware
• Minimal TLB footprint / huge pages
• Tickless Kernel
• No system calls or context switches
• Zero-copy
• Lockless fast table lookup and updates
• Real-Time processing to avoid packet drops ???

Page 28

Vyatta Real-Time Network Packet Processing

• What’s Real Time?

A: The working definition of a real-time system is “the delay between an event and the program response is known and bounded”

• Perfect -- this is exactly what we need! What could possibly go wrong?

A: Programming is a skill best acquired by practice and example rather than from books. – A. Turing

Page 29

Vyatta Real-Time Network Packet Processing

• Becca: “when I drive 64 byte packets into the NICs at line-rate, my SSH session locks up and OSPF flaps”
• Sven: Excellent. This proves that the real-time scheduler is working exactly as designed: All CPU cycles are devoted to forwarding packets.
• Becca: grumble…

Page 30

Vyatta Real-Time Network Packet Processing

• Sven: “This is why we reserve CPU0 for control plane processes. That way the admin console always remains responsive.”
• Becca: “I configured an admin console and it locks up too. And my SSH session is on the admin network and that locks up.”
• Sven: “Are you sure you aren’t driving traffic at line rate on the admin network, causing packets to be dropped and TCP timeouts?”
• Becca: “Yes. Files bug: Router won't boot, reboot & console / ssh non-responsive if high rate traffic is running.”

Page 31

Vyatta Real-Time Debugging

[ 242.150195] INFO: task sshd:6404 blocked for more than 120 seconds.
[ 242.225355] Not tainted 3.14.51-1-amd64-vyatta #1
[ 242.288038] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
…
[ 242.382015] Call Trace:
[ 242.382022] [<ffffffff811ed61a>] ? wait_transaction_locked+0x7a/0xb0
[ 242.382025] [<ffffffff81092140>] ? finish_wait+0x90/0x90
[ 242.382029] [<ffffffff811ed961>] ? start_this_handle+0x261/0x560
[ 242.382032] [<ffffffff8114da6d>] ? __inode_permission+0x2d/0xb0
[ 242.382036] [<ffffffff811b129f>] ? ext4_file_open+0x6f/0x1b0
[ 242.382039] [<ffffffff8114da6d>] ? __inode_permission+0x2d/0xb0
[ 242.382043] [<ffffffff811edf38>] ? jbd2__journal_start+0x128/0x1c0
[ 242.382046] [<ffffffff811bc63c>] ? ext4_dirty_inode+0x2c/0x80
[ 242.382049] [<ffffffff8116b009>] ? __mark_inode_dirty+0x39/0x240
[ 242.382052] [<ffffffff8115cf09>] ? update_time+0x89/0xe0
[ 242.382055] [<ffffffff8115a1ea>] ? dput+0x1a/0x110
[ 242.382057] [<ffffffff8115cffd>] ? file_update_time+0x9d/0x100
[ 242.382059] [<ffffffff811516c0>] ? do_last+0x2d0/0xf10
[ 242.382063] [<ffffffff810e64ba>] ? __generic_file_aio_write+0x19a/0x3e0
[ 242.382065] [<ffffffff810e675e>] ? generic_file_aio_write+0x5e/0xe0
[ 242.382068] [<ffffffff811b1cae>] ? ext4_file_write+0xce/0x420
[ 242.382070] [<ffffffff8118bf40>] ? __posix_lock_file+0x210/0x530
[ 242.382073] [<ffffffff81142c8a>] ? do_sync_write+0x5a/0x90
[ 242.382075] [<ffffffff811437fd>] ? vfs_write+0xbd/0x1f0
[ 242.382077] [<ffffffff81143d0b>] ? SyS_write+0x4b/0xb0
[ 242.382080] [<ffffffff81517ee7>] ? tracesys+0xdd/0xe2

Page 32

Vyatta Real-Time Debugging

[ 241.917265] INFO: task vbash:6403 blocked for more than 120 seconds.
[ 241.993460] Not tainted 3.14.51-1-amd64-vyatta #1
[ 242.150118] Call Trace:
[ 242.150122] [<ffffffff81170c00>] ? do_thaw_one+0x60/0x60
[ 242.150125] [<ffffffff81513c18>] ? io_schedule+0x88/0xd0
[ 242.150127] [<ffffffff81170c09>] ? sleep_on_buffer+0x9/0x10
[ 242.150130] [<ffffffff81514292>] ? __wait_on_bit+0x52/0x80
[ 242.150133] [<ffffffff812271a8>] ? submit_bio+0x68/0x130
[ 242.150135] [<ffffffff81170c00>] ? do_thaw_one+0x60/0x60
[ 242.150138] [<ffffffff8151433c>] ? out_of_line_wait_on_bit+0x7c/0xa0
[ 242.150142] [<ffffffff810921a0>] ? wake_atomic_t_function+0x30/0x30
[ 242.150149] [<ffffffffa003a3a0>] ? squashfs_read_data+0x3a0/0x690 [squashfs]
[ 242.150153] [<ffffffffa003a7f3>] ? squashfs_cache_get+0x163/0x3a0 [squashfs]
[ 242.150156] [<ffffffffa003bb0c>] ? squashfs_readpage+0xac/0x8e0 [squashfs]
[ 242.150159] [<ffffffff810ee0c8>] ? __alloc_pages_nodemask+0x158/0xaa0
[ 242.150163] [<ffffffff810e5844>] ? add_to_page_cache_locked+0xc4/0x190
[ 242.150165] [<ffffffff810f1208>] ? __do_page_cache_readahead+0x198/0x200
[ 242.150168] [<ffffffff810f13ab>] ? ondemand_readahead+0x13b/0x2b0
[ 242.150170] [<ffffffff810e5983>] ? pagecache_get_page+0x33/0x1e0
[ 242.150173] [<ffffffff810e76e6>] ? generic_file_aio_read+0x4b6/0x6d0
[ 242.150176] [<ffffffff81512ed5>] ? schedule_timeout+0x1c5/0x230
[ 242.150178] [<ffffffff811428fa>] ? do_sync_read+0x5a/0x90
[ 242.150181] [<ffffffff811439d5>] ? vfs_read+0xa5/0x180
[ 242.150184] [<ffffffff81148a51>] ? kernel_read+0x41/0x60
[ 242.150187] [<ffffffff8114a96b>] ? do_execve_common.isra.35+0x45b/0x610
[ 242.150190] [<ffffffff8114ad27>] ? SyS_execve+0x27/0x40
[ 242.150193] [<ffffffff81518289>] ? stub_execve+0x69/0xa0

Page 33

Vyatta Real-Time Debugging

[ 241.442768] INFO: task auditd:4806 blocked for more than 120 seconds.
[ 241.519914] Not tainted 3.14.51-1-amd64-vyatta #1
[ 241.676300] Call Trace:
[ 241.676306] [<ffffffff811ed61a>] ? wait_transaction_locked+0x7a/0xb0
[ 241.676308] [<ffffffff81015b95>] ? sched_clock+0x5/0x10
[ 241.676312] [<ffffffff81092140>] ? finish_wait+0x90/0x90
[ 241.676314] [<ffffffff811ed961>] ? start_this_handle+0x261/0x560
[ 241.676316] [<ffffffff81015b95>] ? sched_clock+0x5/0x10
[ 241.676318] [<ffffffff810845f6>] ? get_vtime_delta+0x16/0x80
[ 241.676320] [<ffffffff81015b95>] ? sched_clock+0x5/0x10
[ 241.676322] [<ffffffff81015b3d>] ? native_sched_clock+0x2d/0x80
[ 241.676324] [<ffffffff81015b95>] ? sched_clock+0x5/0x10
[ 241.676326] [<ffffffff81084ede>] ? arch_vtime_task_switch+0x6e/0x90
[ 241.676328] [<ffffffff811edf38>] ? jbd2__journal_start+0x128/0x1c0
[ 241.676333] [<ffffffff811bc63c>] ? ext4_dirty_inode+0x2c/0x80
[ 241.676335] [<ffffffff8116b009>] ? __mark_inode_dirty+0x39/0x240
[ 241.676338] [<ffffffff8115cf09>] ? update_time+0x89/0xe0
[ 241.676340] [<ffffffff8115cffd>] ? file_update_time+0x9d/0x100
[ 241.676344] [<ffffffff810e64ba>] ? __generic_file_aio_write+0x19a/0x3e0
[ 241.676346] [<ffffffff810e675e>] ? generic_file_aio_write+0x5e/0xe0
[ 241.676349] [<ffffffff811b1cae>] ? ext4_file_write+0xce/0x420
[ 241.676354] [<ffffffff810b4a18>] ? do_futex+0x128/0xb10
[ 241.676357] [<ffffffff8126978c>] ? __percpu_counter_sum+0x6c/0x80
[ 241.676359] [<ffffffff811c5e12>] ? ext4_statfs+0x112/0x160
[ 241.676362] [<ffffffff81142c8a>] ? do_sync_write+0x5a/0x90
[ 241.676364] [<ffffffff811437fd>] ? vfs_write+0xbd/0x1f0
[ 241.676366] [<ffffffff81143d0b>] ? SyS_write+0x4b/0xb0
[ 241.676369] [<ffffffff81517ee7>] ? tracesys+0xdd/0xe2

Page 34

Control / Data Plane Task Scheduling (Typical)

[Diagram: Vyatta High Performance User-Space Networking Architecture. Control Plane tasks (audit, bash, bgp, rib, ospf, sshd) and Data Plane Packet Forwarder Threads ("pkt fwd" on CPU 1, CPU 2 and CPU 3), above the Linux Kernel (kworker/0 to kworker/3) and Hardware / Virtualization (CPU0 to CPU3).]

Page 35

Control Task Migrates, Starts I/O on DP CPU

[Diagram: Control Plane and Data Plane Packet Forwarder Threads ("pkt fwd" on CPU 1, CPU 2 and CPU 3), above the Linux Kernel (kworker/0 to kworker/3) and Hardware / Virtualization (CPU0 to CPU3). Task labels: sshd, bash, sshd, audit, bgp, rib, ospf.]

Page 36

Control Task Preempted and Indefinitely Blocked

[Diagram: Control Plane and Data Plane Packet Forwarder Threads ("pkt fwd" on CPU 1, CPU 2 and CPU 3), above the Linux Kernel (kworker/0 to kworker/3) and Hardware / Virtualization (CPU0 to CPU3). Task labels: sshd, audit, bash, bgp, rib, ospf.]

Page 37

Vyatta Real-Time Debugging

Priority Live Lock!
1. Data plane forwarder on CPU (1, 2 or 3) goes idle (traffic gap)
2. Scheduler moves control process from CPU 0 to 1, 2, or 3
3. Control plane task does I/O (e.g. write log entry), blocks.
4. Data plane forwarder goes active (traffic resumes)
5. Control process chain holding I/O locks, but Dataplane forwarder does no I/O. No opportunity for PI (priority inheritance) or migration: other processes pile up on the same lock.
6. Solution: Stop Traffic? Drop Real-Time?
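The sequence above can be reproduced with a toy model of one data plane CPU under strict priority scheduling. This is a minimal sketch, not the vRouter or Linux scheduler: the tick counts, priorities and task names are made up for illustration, and "holding the lock" is modelled simply as the control task still having work left.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int        # higher value wins the CPU
    work_left: int = 0   # ticks of CPU time still needed (control task only)
    runnable: bool = True


def run_cpu(ticks: int, forwarder_idle_until: int) -> dict:
    """Simulate one data plane CPU where the highest-priority runnable task always runs."""
    forwarder = Task("pkt fwd", priority=99)
    # Control task that migrated here; it holds an I/O lock until its work is done.
    control = Task("sshd", priority=0, work_left=3)
    progress = {"control_ticks": 0, "forwarder_ticks": 0}

    for tick in range(ticks):
        # The forwarder only sleeps during the traffic gap, then never again.
        forwarder.runnable = tick >= forwarder_idle_until
        candidates = [t for t in (forwarder, control)
                      if t.runnable and (t is forwarder or t.work_left > 0)]
        if not candidates:
            continue
        current = max(candidates, key=lambda t: t.priority)
        if current is control:
            control.work_left -= 1
            progress["control_ticks"] += 1
        else:
            progress["forwarder_ticks"] += 1

    progress["lock_released"] = control.work_left == 0
    return progress


print(run_cpu(ticks=1000, forwarder_idle_until=2))
```

With a two-tick traffic gap the control task gets two of the three ticks it needs and then never runs again, so the lock is never released. Anything waiting on that lock, on any CPU, waits with it.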

Page 38

Vyatta Packet Forwarder Performance

• RT scheduler requires special handling – see above
• So what happens if we just drop Real-Time?

Page 39

Under heavy 29 Mpps load, the system gets into the zone, that is, fewer sleep/wake cycles make it better

Which brings up a short diversion into IMIX distros and pps:

→ Some IMIXs have a higher percentage of large packets; this one is ~1.1 Mpps on a 10G
imix_dnload_l3_pktsize=[48, 128, 256, 576, 1500]
imix_dnload_l3_weight=[ 25, 5, 3, 2, 65]

→ And some IMIXs have a higher percentage of small packets; this one is ~3.1 Mpps on a 10G
imix_upload_l3_pktsize=[48, 128, 256, 576, 1500]
imix_upload_l3_weight=[ 70, 5, 3, 2, 20]
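The pps figure for an IMIX follows from its weighted average packet size. A minimal sketch, assuming the listed sizes are L3 sizes and that each packet carries 38 extra bytes on the wire (14-byte Ethernet header, 4-byte CRC, 8-byte preamble, 12-byte inter-frame gap); the function name and that overhead constant are illustrative, not taken from the test tooling.

```python
# Assumed per-packet bytes added around the L3 packet on the wire:
# 14 Ethernet header + 4 CRC + 8 preamble + 12 inter-frame gap.
L2_AND_WIRE_OVERHEAD = 14 + 4 + 8 + 12


def imix_pps(sizes, weights, link_bps=10e9, overhead=L2_AND_WIRE_OVERHEAD):
    """Line-rate packets per second for a weighted mix of L3 packet sizes."""
    total_weight = sum(weights)
    avg_wire_bytes = sum((s + overhead) * w for s, w in zip(sizes, weights)) / total_weight
    return link_bps / (avg_wire_bytes * 8)


pktsize = [48, 128, 256, 576, 1500]
dnload = imix_pps(pktsize, [25, 5, 3, 2, 65])   # mostly large packets
upload = imix_pps(pktsize, [70, 5, 3, 2, 20])   # mostly small packets
print(f"download mix: {dnload / 1e6:.2f} Mpps, upload mix: {upload / 1e6:.2f} Mpps")
```

Under these assumptions the two mixes come out at roughly 1.2 and 3.1 Mpps per direction on 10G, in line with the approximate figures quoted above.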

Page 40

So seeing drops happening on IMIX, alarms go off

We mostly run performance regressions with high pps rates and other loads, such as NAT, firewall, etc. The heavy lifters.

But it’s a mistake to ignore the light-weights. For example, one issue, seemingly out of the blue, showed drops while running IMIX (this is forbidden)

%rate  %drop 1_flow  %drop 10_flow  %drop 1_flow up_dn_imix_same  %drop 10_flow up_dn_imix_same
-----  ------------  -------------  ----------------------------  -----------------------------
20     0.000         0.018          0.013                         0.069
40     0.000         0.008          0.003                         0.009
60     0.000         0.000          0.002                         0.066
80     0.003         0.000          0.000                         0.016
100    0.000         0.001          0.000                         0.008

…and the journey continues (Sven)!

Page 41

Vyatta Packet Forwarder Performance

• By dropping real-time we encountered “Scheduling Fairness”
• The CFS scheduler penalizes CPU-bound SCHED_OTHER processes
• This would lead to the packet drops Becca observed

Page 42

Control / Data Plane Task Scheduling (Typical)

[Diagram: Vyatta High Performance User-Space Networking Architecture. Control Plane tasks (audit, bash, bgp, rib, ospf, sshd) and Data Plane Packet Forwarder Threads ("pkt fwd" on CPU 1, CPU 2 and CPU 3), above the Linux Kernel (kworker/0 to kworker/3) and Hardware / Virt (CPU0 to CPU3).]

Page 43

Dynamic Control Plane / Data Plane Resourcing

Control Plane: CPU 0+1
Data Plane Packet Forwarders: CPU 2+3

[Diagram: Vyatta High Performance User-Space Networking Architecture. "pkt fwd" now shown on CPU 2 and CPU 3 only, above the Linux Kernel (kworker/0 to kworker/3) and Hardware / Virtualization (CPU0 to CPU3). Task labels: audit, bash, rib, bgp, ospf, sshd.]

Page 44

What’s New in OSS and Vyatta R&D?

Networking: 100 G and beyond
• No lack of small packets (Twitter, SMS, IOT messaging)
• More network queues and associated spectrum provisioned to drive network traffic

Processor Silicon
• CPU clock speed is on the long tail of the asymptote
• Semi process shrinks approaching single-atom wires
• Architecture tweak gains in 5 – 10% range for revisions
• Moore’s law per core count maps to software performance via Amdahl’s law

Linux Kernel
• Context switch / system call overhead essentially static
• Unix socket / Memory copy model not scaling at 10G. Sidelined at 40 and 100.
• Offload libraries replicating driver and netstack code, long-game solution needed
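The Amdahl's law point can be made concrete: adding cores only speeds up the part of the work that runs in parallel. A minimal sketch; the 90% parallel fraction is a hypothetical workload chosen for illustration, not a measured vRouter figure.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work scales with cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Hypothetical workload in which 90% of the work parallelizes.
for cores in (1, 2, 4, 8, 16, 64):
    print(f"{cores:>2} cores: {amdahl_speedup(0.9, cores):.2f}x")
```

Even with an unlimited number of cores, this workload never gets past 10x, because the 10% serial part remains.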

Page 45

All Done!

Thank You

Sven-Thorsten Dietrich

Robyn Gutierrez

Becca Nitzan

Page 46

Thank you