VSPERF Benchmarking the network data plane of NFV vdevices and vlinks
Maryam Tahhan
Maciek Konstantynowicz
Outline
• Overview opnfv.org/vsperf
• VSPERF Level Test Design spec – 3x4 Matrix
• VSPERF applicability to virtual networking deployments
– virtual networking topologies and use cases
– network workloads on computer – what matters
– finding boundaries of deterministic performance of virtual switch
11/12/2015 OPNFV Summit
VSPERF Overview
Define, implement and execute a test suite to characterize the performance of a virtual switch in the NFVI
Drive standardization
Promote a defined platform and reuse
Establish best practice
30+ committers and contributors
VSPERF Deliverables
• Test Specification, IETF Draft
• VSPERF Modular Test Framework
[Figure: test setup. Labels: Traffic Gen, DUT (vSwitch, VNF(s)), Traffic Gen, Client, VSPERF]
VSPERF 3x4 Matrix LTD Coverage
Columns: SPEED, ACCURACY, RELIABILITY, SCALE
Activation
• SPEED: RFC2889.AddressLearningRate, RFC2889.AddressCachingCapacity, InitialPacketProcessingLatency, LatencyAndLatencyVariation
• ACCURACY: CPDP.Coupling.Flow.Addition
• RELIABILITY: RFC2544.SystemRecoveryTime, RFC2544.ResetTime
• SCALE: RFC2889.AddressCachingCapacity
Operation
• SPEED: RFC2544.PacketLossRatio, RFC2544.PacketLossRateFrmMod, RFC2544.BackToBackFrames, RFC2889.MaxForwardingRate, RFC2889.ForwardPressure, RFC2889.BroadcastFrameForwarding, RFC2889 Broadcast Frame Latency test, CPU.RFC2544.0PacketLoss, RFC2544.WorstN-BestN, InterPacketDelayVariation.RFC5481
• ACCURACY: RFC2889.ErrorFramesFiltering, RFC2544.Profile
• RELIABILITY: RFC2889.Soak, RFC2889.SoakFrameModification, PacketDelayVariation.RFC3393.Soak
• SCALE: Scalability.RFC2544.0PacketLoss, MemoryBandwidth.RFC2544.0PacketLoss.Scalability
De-Activation
RFC2544: Benchmarking Methodology for Network Interconnect Devices
RFC2889: Benchmarking Methodology for LAN Switching Devices
VSPERF Topologies - Overview
[Figure: one diagram per topology. Each shows a Pkt-Gen attached through NICs (Nx10GE, Nx40GE) to an x86 host with Hardware, Linux Kernel and User Space layers, plus Guest OS layers where VMs are used]
Baseline Performance Benchmarking
• Categories: virtual Infrastructure; vNetwork: vSwitch only; vNetwork: VNF only
• Topologies: Phy-VS, Phy-VSvm, Phy-refVNF, Phy-refVNFvm, Phy-VNFvm, Phy-VS-refVNFvm, refVNFvm-VS-refVNFvm
Use Case Performance Benchmarking
• vNetwork: vSwitch + VNFs: Phy-VS-VNFvm, Phy-VSvm-VNFvm, Phy-VS-nVNFvm, Phy-VSvm-nVNFvm
• vNetwork: vSwitch + VNF-Chains: Phy-VS-VNFvmCh, Phy-VSvm-VNFvmCh, Phy-VS-nVNFvmCh, Phy-VSvm-nVNFvmCh
Lots of combinations! Prioritize based on applicability and performance ...
Network workloads vs. compute workloads
• They are just a little BIT different
• They are all about processing packets
• At 10GE, 64B frames can arrive at 14.88 Mfps – that’s 67 nsec per frame.
• With a 2 GHz CPU core each clock cycle is 0.5 nsec – that’s 134 clock cycles per frame.
• BUT it takes ~70 nsec to access memory – not much time to do work if waiting for memory access.
• Efficiency of dealing with packets within the computer is essential
• Moving packets:
– Packets arrive on physical interfaces (NICs) and virtual interfaces (VNFs) - need CPU optimized drivers for both.
– Drivers and buffer management software must not rely on memory access – see time budget above.
• Processing packets:
– Header manipulation, encaps/decaps, lookups, classifiers, counters.
– Need packet processing optimized for CPU platforms.
• CONCLUSION - Need to pay attention to Computer efficiency for Network workloads
• Computer efficiency for x86 = close to optimal use of x86 microarchitectures: core and uncore resources.
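The per-frame time budget quoted on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes the standard 20 bytes of per-frame Ethernet wire overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap). The function name `frame_budget` and its parameters are illustrative only and are not part of VSPERF.

```python
def frame_budget(link_bps: float = 10e9, frame_bytes: int = 64, cpu_hz: float = 2e9):
    """Return (frames per second, nanoseconds per frame, CPU cycles per frame)."""
    # Assumption: 20 bytes of wire overhead per frame (preamble + SFD + inter-frame gap).
    wire_bits = (frame_bytes + 20) * 8
    fps = link_bps / wire_bits
    ns_per_frame = 1e9 / fps
    cycles_per_frame = ns_per_frame * cpu_hz / 1e9
    return fps, ns_per_frame, cycles_per_frame


fps, ns_per_frame, cycles = frame_budget()
print(f"{fps / 1e6:.2f} Mfps, {ns_per_frame:.1f} ns/frame, {cycles:.0f} cycles/frame")
# How many ~70 ns memory accesses (figure taken from the slide) fit into one frame time.
print(f"memory accesses per frame time: {ns_per_frame / 70:.2f}")
```

With the defaults this gives about 14.88 Mfps, 67.2 ns and 134 cycles per frame, so less than one uncached memory access fits into the budget for a single frame.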
Network workloads – what matters
x86 HW resources – high-level metrics
[Figure: topology of the x86 2-socket XEON E5 SandyBridge system (‘lstopo --of’)]
• Static resource footprint
• CPU cores.
• Memory.
• Disk not that much.
• Dynamic resource footprint
• CPU cores utilization (CPU heatmaps).
• CPU cache efficiency: hits, misses, evictions.
• PCI bus, QPI bus, Memory lanes utilization.
• Computer provides lots of telemetry data
• But it takes skill to make sense of it and use it!
Applying VSPERF Methodology and Tools to Evaluate Solutions
Virtual Network Infrastructure
• Virtual Switch baseline benchmarking
– Moving packets between NIC and vSwitch
– Moving packets between VNFs* and vSwitch
– vSwitch packet processing
• Counting the clock cycles per packet ...
[Figure: baseline topologies: PHY-to-PHY (Phy-VS), VM-to-VM (refVNFvm-VS-refVNFvm), PHY-to-VM-to-PHY (Phy-VS-refVNFvm)]
* Using reference VNF VM e.g. L2FWD app.
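Counting clock cycles per packet can be approximated from a measured forwarding rate. The sketch below assumes the forwarding cores are fully dedicated to packet processing (as with poll-mode drivers), so every cycle on those cores is attributed to forwarding. The function name and the sample figures are hypothetical, not VSPERF results.

```python
def cycles_per_packet(packets_per_second: float, cores: int, cpu_hz: float) -> float:
    """Average CPU cycles spent per forwarded packet across the given cores."""
    if packets_per_second <= 0:
        raise ValueError("packets_per_second must be positive")
    # Assumption: all cycles of the listed cores are spent on packet forwarding.
    return cores * cpu_hz / packets_per_second


# Hypothetical measurement: 5 Mpps forwarded by one 2 GHz core.
print(cycles_per_packet(5e6, cores=1, cpu_hz=2e9))
```

Comparing this figure across the PHY-to-PHY, VM-to-VM and PHY-to-VM-to-PHY topologies gives a rough view of what each extra hop costs.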
Applying VSPERF Methodology and Tools to Evaluate Solutions
Deployment Use Cases
• Virtual Switch VNF use case benchmarking
– VNFs* connected in parallel
– VNFs* in service chains
– Box-full load tests
• VNF solution benchmarking
– VNFs* connected in parallel
– VNFs* in service chains
– Box-full load tests
• Counting clock cycles per packet ...
[Figure: use case topologies: N of VNFs (Phy-VS-nVNFvm), N of VNF service chains (Phy-VS-nVNFvmCh)]
* Using reference VNF VM e.g. L2FWD app.
Cisco vNet-SLA – Testing Methodologies into [VSPERF]
① FrameLossRate-Sweep
• Provides an overview of system packet throughput and loss across the entire range of offered load.
• [VSPERF] LTD.Throughput.RFC2544.Profile (existing)
② ThroughputBinarySearch-SingleRun
• Measures Throughput per packet size based on a single measurement per packet rate.
• [VSPERF] LTD.Throughput.RFC2544.PacketLossRatio (modified by vNet-SLA)
③ ThroughputBinarySearch-BestNWorstN
• Measures Throughput per packet size based on multiple measurements per packet rate.
• [VSPERF] LTD.Throughput.RFC2544.WorstN-BestN (new by vNet-SLA)
④ LatencyAndLatencyVariation
• Measures per-packet latency and latency variation, reports values at specified percentiles of packets in the latency stream.
• [VSPERF] LTD.InterPacketDelayVariation.RFC5481 (new by vNet-SLA – under review)
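Methods ① to ③ can be sketched against a simulated device under test. Everything here is illustrative: `make_simulated_dut` is a toy model standing in for a real traffic generator run, and the pass criteria for multiple measurements (all trials must pass for a worst-case figure, any trial may pass for a best-case figure) are an assumption, since the slides do not define them.

```python
import random
from typing import Callable, List, Tuple

# A trial takes an offered load (% of line rate) and returns the frame loss ratio.
Trial = Callable[[float], float]


def make_simulated_dut(capacity_pct: float, jitter_pct: float = 0.0, seed: int = 0) -> Trial:
    """Toy DUT (hypothetical): lossless up to a capacity that varies per trial."""
    rng = random.Random(seed)

    def trial(load_pct: float) -> float:
        capacity = capacity_pct + rng.uniform(-jitter_pct, jitter_pct)
        return max(0.0, (load_pct - capacity) / load_pct)

    return trial


def frame_loss_sweep(trial: Trial, step_pct: float = 10.0) -> List[Tuple[float, float]]:
    """Method 1: one trial per offered load, from 100% downwards in fixed steps."""
    steps = int(100.0 / step_pct)
    loads = (100.0 - i * step_pct for i in range(steps))
    return [(load, trial(load)) for load in loads]


def throughput_search(trial: Trial, trials_per_rate: int = 1, require_all: bool = True,
                      max_loss: float = 0.0, resolution_pct: float = 0.5) -> float:
    """Methods 2 and 3: binary search for the highest load meeting the loss criterion."""

    def passes(load_pct: float) -> bool:
        results = [trial(load_pct) <= max_loss for _ in range(trials_per_rate)]
        # Assumed criteria: worst-case needs every trial to pass, best-case needs one.
        return all(results) if require_all else any(results)

    if passes(100.0):
        return 100.0
    low, high = 0.0, 100.0  # low always passes (or is 0), high always failed
    while high - low > resolution_pct:
        mid = (low + high) / 2
        if passes(mid):
            low = mid
        else:
            high = mid
    return low


dut = make_simulated_dut(capacity_pct=60.0)
print(frame_loss_sweep(dut))
print(throughput_search(dut))  # single measurement per rate

noisy = make_simulated_dut(capacity_pct=60.0, jitter_pct=5.0, seed=1)
print(throughput_search(noisy, trials_per_rate=5, require_all=True))
print(throughput_search(noisy, trials_per_rate=5, require_all=False))
```

With a noisy DUT, the gap between the two multi-trial results indicates how repeatable the throughput figure is at a given frame size.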
All tests are based on modified RFC2544 and RFC1242, adapted to the virtualized networking environment.
All tests are realized using automated tooling, ready for use in a continuous integration DevOps model.
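The percentile reporting in method ④ can be illustrated as follows. This is a minimal sketch assuming per-packet latencies are already available as a list. It uses the nearest-rank percentile definition and the inter-packet delay variation of RFC 5481 (delay of a packet minus the delay of the previous packet). The function names and sample values are hypothetical.

```python
import math
from typing import List, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def inter_packet_delay_variation(latencies: Sequence[float]) -> List[float]:
    """IPDV per RFC 5481: latency of packet i minus latency of packet i-1."""
    return [cur - prev for prev, cur in zip(latencies, latencies[1:])]


# Hypothetical per-packet latencies in microseconds, in arrival order.
latencies_us = [10.0, 12.0, 11.0, 15.0, 13.0]
ipdv_us = inter_packet_delay_variation(latencies_us)

for pct in (50, 90, 99):
    print(f"p{pct} latency: {percentile(latencies_us, pct)} us")
print(f"p99 |IPDV|: {percentile([abs(v) for v in ipdv_us], 99)} us")
```

Reporting several percentiles rather than a single average exposes the latency tail, which is what bounds deterministic performance.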
VSPERF Test Framework
• A Python-based test framework for characterizing the performance of virtual switches.
• As of today, capable of conducting RFC2544 tests on stock OVS and OVS with DPDK.
• Supported deployments: Phy2Phy, PVP and PVVP.
Future Work
• Integrating multiple traffic generators: IXIA, Spirent, MoonGen and Xena.
• Methodology extensions: iterations for the short trial tests.
• Prove out and refine methodology and tests through the framework.
• Add more tests to the LTD and the framework.
• Continuous Integration support.
Summary
• Continue to automate virtual switch benchmarking across defined dimensions
– Repeatable and reusable results.
– Best practice reference for benchmarking the virtual networking platform.
• Find boundaries of deterministic performance
– Of virtual switch and reference virtual network topologies.
– Single instance and at full density on compute machines.
• Identify and quantify resources that matter
– Publish benchmarks with telemetry data.
• Drive community to optimize virtual networking infrastructures
– Lower the cost of moving the packet within the computer.
• Make it applicable to real deployments
– Feedback loop with industry communities.