metron nfv service chains at the true speed of the ...georgios p. katsikas 2017 metron nfv service...
TRANSCRIPT
-
Georgios P. Katsikas 2017
MetronNFV Service Chains at the True
Speed of the Underlying Hardware
Georgios P. Katsikas1,3, Tom Barbette2, Dejan Kostic3,
Rebecca Steinert1, and Gerald Q. Maguire Jr.3
1RISE SICS 2University of Liege 3KTH Royal Institute of Technology
-
Georgios P. Katsikas 2017 2
High throughput and low latency
Web
Video
...
Analytics
Introduction
-
Georgios P. Katsikas 2017 3
IPSec LB
...
Web
Analytics
Video
Firewall DPIRouter Router
NAT IDS Monitor
Service Chains of Network Functions
-
Georgios P. Katsikas 2017 4
IPSec LB
...
Web
Analytics
Video
Firewall DPIRouter Router
NAT IDS Monitor
Service Chains of Network Functions
Minimize the performance impact of service chains
-
Georgios P. Katsikas 2017 5
Source: D. Firestone et al. Hardware-Accelerated Networks
at Scale in the Cloud, Microsoft Azure, SIGCOMM 2017Motivation
-
Georgios P. Katsikas 2017 6
Motivation Source: D. Firestone et al. Hardware-Accelerated Networks at Scale in the Cloud, Microsoft Azure, SIGCOMM 2017
-
Georgios P. Katsikas 2017 7
Motivation Source: D. Firestone et al. Hardware-Accelerated Networks at Scale in the Cloud, Microsoft Azure, SIGCOMM 2017
Service chains today greatly
rely on CPU performance!
-
Georgios P. Katsikas 2017
Can existing NFV systems operate at these emerging link speeds
(i.e., at 100 Gbps)?
8
-
Georgios P. Katsikas 2017 9
Generator Processor
1x100GbE
Mellanox Connect-X4
Total Throughput
100 Gbps
Router NAPT LB
Deployment at 100 Gbps
-
Georgios P. Katsikas 2017 10
Hardware Limit
Hardware Limit at 100 Gbps
76
-
Georgios P. Katsikas 2017 11
Performance degradation
Emulated E2Hardware Limit
Existing NFV Solutions at 100 Gbps
76
-
Georgios P. Katsikas 2017 12
Emulated E2Hardware Limit OpenBox
Existing NFV Solutions at 100 Gbps
76
-
Georgios P. Katsikas 2017
Emulated E2Hardware Limit OpenBox
13
Hardware
offloading
Accurate CPU
core dispatching
0 inter-core transfers
Metron
Metron – Hardware-level Performance
76
-
Georgios P. Katsikas 2017
Emulated E2Hardware Limit OpenBox
14
Metron
Hardware
performance
6.5x better efficiency
Metron – Hardware-level Performance
76
-
Georgios P. Katsikas 2017
Why do existing NFV systems lose performance?
15
-
Georgios P. Katsikas 2017 16
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4Software switch
dispatching
-
Georgios P. Katsikas 2017 17
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4Software switch
dispatching
-
Georgios P. Katsikas 2017 18
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
1Software switch
dispatching
-
Georgios P. Katsikas 2017 19
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
2Software switch
dispatching
-
Georgios P. Katsikas 2017 20
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
3Software switch
dispatching
-
Georgios P. Katsikas 2017 21
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
4Software switch
dispatching
-
Georgios P. Katsikas 2017 22
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
4 Inter-Core Transfers
Software switch
dispatching
-
Georgios P. Katsikas 2017 23
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
Rx Rx NF1+NF2 Tx
OutInRSS
Pipeline dispatching
with or without RSS
NFP, Flurries
Software switch
dispatching
4 Inter-Core Transfers
-
Georgios P. Katsikas 2017 24
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
Rx Rx NF1+NF2 Tx
OutInRSS
Pipeline dispatching
with or without RSS
NFP, Flurries
Software switch
dispatching
4 Inter-Core Transfers
-
Georgios P. Katsikas 2017 25
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
Rx Rx NF1+NF2 Tx
OutInRSS
Pipeline dispatching
with or without RSS
NFP, Flurries
Software switch
dispatching
4 Inter-Core Transfers
1
-
Georgios P. Katsikas 2017 26
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
Rx Rx NF1+NF2 Tx
OutInRSS
Pipeline dispatching
with or without RSS
NFP, Flurries
Software switch
dispatching
4 Inter-Core Transfers
2
-
Georgios P. Katsikas 2017 27
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
Rx Rx NF1+NF2 Tx
OutInRSS
Pipeline dispatching
with or without RSS
NFP, Flurries
Software switch
dispatching
4 Inter-Core Transfers
-
Georgios P. Katsikas 2017 28
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
Rx Rx NF1+NF2 Tx
OutInRSS
Pipeline dispatching
with or without RSS
NFP, Flurries
Software switch
dispatching
4 Inter-Core Transfers
2 Inter-Core Transfers
-
Georgios P. Katsikas 2017 29
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
OutRSS In
Flow Director
Or
Rule or hash-based
hardware dispatchingRx+NF1 Idle NF2+Tx Idle
Rx Rx NF1+NF2 Tx
OutInRSS
Pipeline dispatching
with or without RSS
NFP, Flurries
OpenBox,
SNF, FastClick
Software switch
dispatching
4 Inter-Core Transfers
2 Inter-Core Transfers
-
Georgios P. Katsikas 2017 30
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
OutRSS In
Flow Director
Or
Rule or hash-based
hardware dispatchingRx+NF1 Idle NF2+Tx Idle
Rx Rx NF1+NF2 Tx
OutInRSS
Pipeline dispatching
with or without RSS
NFP, Flurries
OpenBox,
SNF, FastClick
Software switch
dispatching
4 Inter-Core Transfers
2 Inter-Core Transfers
-
Georgios P. Katsikas 2017 31
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
OutRSS In
Flow Director
Or
Rule or hash-based
hardware dispatchingRx+NF1 Idle NF2+Tx Idle
Rx Rx NF1+NF2 Tx
OutInRSS
Pipeline dispatching
with or without RSS
NFP, Flurries
OpenBox,
SNF, FastClick
Software switch
dispatching
4 Inter-Core Transfers
2 Inter-Core Transfers
1
-
Georgios P. Katsikas 2017 32
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
OutRSS In
Flow Director
Or
Rule or hash-based
hardware dispatchingRx+NF1 Idle NF2+Tx Idle
Rx Rx NF1+NF2 Tx
OutInRSS
Pipeline dispatching
with or without RSS
NFP, Flurries
OpenBox,
SNF, FastClick
Software switch
dispatching
4 Inter-Core Transfers
2 Inter-Core Transfers
-
Georgios P. Katsikas 2017 33
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
OutRSS In
Flow Director
Or
Rule or hash-based
hardware dispatchingRx+NF1 Idle NF2+Tx Idle
Rx Rx NF1+NF2 Tx
OutInRSS
Pipeline dispatching
with or without RSS
NFP, Flurries
OpenBox,
SNF, FastClick
Software switch
dispatching
4 Inter-Core Transfers
2 Inter-Core Transfers
1 Inter-Core Transfer
-
Georgios P. Katsikas 2017 34
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
OutRSS In
Flow Director
Or
Rule or hash-based
hardware dispatchingRx+NF1 Idle NF2+Tx Idle
Rx Rx NF1+NF2 Tx
OutInRSS
Pipeline dispatching
with or without RSS
NFP, Flurries
OpenBox,
SNF, FastClick
Software switch
dispatching
4 Inter-Core Transfers
2 Inter-Core Transfers
1 Inter-Core Transfer
Fastest system fewest inter-core transfers
-
Georgios P. Katsikas 2017 35
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
OutRSS In
Flow Director
Or
Rule or hash-based
hardware dispatchingRx+NF1 Idle NF2+Tx Idle
Rx Rx NF1+NF2 Tx
OutInRSS
Pipeline dispatching
with or without RSS
NFP, Flurries
OpenBox,
SNF, FastClick
Software switch
dispatching
4 Inter-Core Transfers
2 Inter-Core Transfers
1 Inter-Core Transfer
The available time to process 64-byte packets at
100 Gbps is ~5ns/packet
-
Georgios P. Katsikas 2017 36
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
OutRSS In
Flow Director
Or
Rule or hash-based
hardware dispatchingRx+NF1 Idle NF2+Tx Idle
Rx Rx NF1+NF2 Tx
OutInRSS
Pipeline dispatching
with or without RSS
NFP, Flurries
OpenBox,
SNF, FastClick
Software switch
dispatching
4 Inter-Core Transfers
2 Inter-Core Transfers
1 Inter-Core Transfer
The available time to process 64-byte packets at
100 Gbps is ~5ns/packet
But, all existing NFV systems transfer each packet
(one or more times) via DRAM or LLC
-
Georgios P. Katsikas 2017 37
Switch NF1 NF2 Idle
OutIn
E2, ClickOS,
NetVM
Core 1 Core 2 Core 3 Core 4
OutRSS In
Flow Director
Or
Rule or hash-based
hardware dispatchingRx+NF1 Idle NF2+Tx Idle
Rx Rx NF1+NF2 Tx
OutInRSS
Pipeline dispatching
with or without RSS
NFP, Flurries
OpenBox,
SNF, FastClick
Software switch
dispatching
4 Inter-Core Transfers
2 Inter-Core Transfers
1 Inter-Core Transfer
The available time to process 64-byte packets at
100 Gbps is ~5ns/packet
But, all existing NFV systems transfer each packet
(one or more times) via DRAM or LLC
Such a transfer takes several nanoseconds itself!
-
Georgios P. Katsikas 2017 38
Problems
Greatly underutilized hardware
Solutions
Challenges
Hardware heterogeneity
Unified hardware
management abstractions
Zero Inter-core Transfers with Metron
-
Georgios P. Katsikas 2017 39
Problems
Greatly underutilized hardware
Too many inter-core transfers
Solutions
Challenges
Hardware heterogeneity
Dynamic adaptation
Unified hardware
management abstractions
Dispatch traffic classes to
CPU cores using tags. Map
each tag to the correct core
Zero Inter-core Transfers with Metron
-
Georgios P. Katsikas 2017 40
Problems
Greatly underutilized hardware
Too many inter-core transfers
Solutions Results
Challenges
Hardware heterogeneity
Dynamic adaptation
Unified hardware
management abstractions
Dispatch traffic classes to
CPU cores using tags. Map
each tag to the correct core
Up to 4.7x lower latency
Up to 7.8x higher throughput
2.75-6.5x better efficiency
Zero Inter-core Transfers with Metron
-
Georgios P. Katsikas 2017
Metron
Controller
41
Firewall DPI
ServerCore 1
Core 2
Input service
chain to the
controller
Source
-
Georgios P. Katsikas 2017
Metron
Controller
Synthesized NF
42
Katsikas et al.
SNF: synthesizing high performance NFV service chains
PeerJ Computer Science 2016
Firewall DPI
ServerCore 1
Core 2
Input service
chain to the
controller
Source
-
Georgios P. Katsikas 2017
Metron
Controller
Launcher
43
Read Ops Write Ops
Splitter
Stateless Stateful
ServerCore 1
Core 2Source
Synthesized NFFirewall DPI
Input service
chain to the
controller
-
Georgios P. Katsikas 2017
Metron
Controller
Launcher
Tagging
44
Read Ops Write Ops
Splitter
Stateless Stateful
ServerCore 1
Core 2Source
Synthesized NFFirewall DPI
Input service
chain to the
controller
-
Georgios P. Katsikas 2017
Metron
Controller
Launcher
Tagging
45
Read Ops Write Ops
Splitter
Stateless Stateful
HW Rules SW Ops
ServerCore 1
Core 2Source
Synthesized NFFirewall DPI
Input service
chain to the
controller
-
Georgios P. Katsikas 2017
Metron
Controller
Launcher
Tagging
46
Read Ops Write Ops
Splitter
Stateless Stateful
HW Rules SW Ops
ServerCore 1
Core 2Source
Synthesized NFFirewall DPI
Input service
chain to the
controller
-
Georgios P. Katsikas 2017
Core 1Server
Metron
Controller
Launcher
Tagging
47
Read Ops Write Ops
Splitter
Stateless Stateful
Monitoring and
Load Balancing
HW Rules SW Ops
Collect
run-time load
statistics from
switches and
servers
Tag 1
Core 2Source
Synthesized NFDPI
Input service
chain to the
controller
Firewall
-
Georgios P. Katsikas 2017
Core 1Server
Metron
Controller
Launcher
Tagging
48
Read Ops Write Ops
Splitter
Stateless Stateful
Monitoring and
Load Balancing
HW Rules SW Ops
Collect
run-time load
statistics from
switches and
servers
Tag 1 Overloaded
Detect Imbalance
Core 2Source
Synthesized NFDPI
Input service
chain to the
controller
Firewall
-
Georgios P. Katsikas 2017
Core 1Server
Metron
Controller
Launcher
Tagging
49
Read Ops Write Ops
Splitter
Stateless Stateful
Monitoring and
Load Balancing
HW Rules SW Ops
Detect Imbalance
Split
Traffic Classes
Tag 1 Overloaded
Core 2Source
Collect
run-time load
statistics from
switches and
servers
Synthesized NFDPI
Input service
chain to the
controller
Firewall
-
Georgios P. Katsikas 2017
Core 1Server
Metron
Controller
Launcher
Tagging
50
Read Ops Write Ops
Splitter
Stateless Stateful
Monitoring and
Load Balancing
Update
HW Rules SW Ops
Detect Imbalance
Split
Traffic Classes
Tag 1 Overloaded
Core 2Source
Collect
run-time load
statistics from
switches and
servers
Synthesized NFDPI
Input service
chain to the
controller
Firewall
-
Georgios P. Katsikas 2017
Core 1Server
Metron
Controller
Launcher
Tagging
51
Read Ops Write Ops
Splitter
Stateless Stateful
Monitoring and
Load Balancing
Update
HW Rules SW Ops
Detect Imbalance
Split
Traffic Classes
Tag 1 Overloaded
Core 2Source
Collect
run-time load
statistics from
switches and
servers
Synthesized NFDPI
Input service
chain to the
controller
Firewall
-
Georgios P. Katsikas 2017 Core 2
Core 1Server
Metron
Controller
Launcher
Tagging
52
Read Ops Write Ops
Splitter
Stateless Stateful
Monitoring and
Load Balancing
Update
HW Rules SW Ops
Detect Imbalance
Split
Traffic Classes
Tags 1, 2Source
Collect
run-time load
statistics from
switches and
servers
Synthesized NFDPI
Input service
chain to the
controller
Firewall
Balanced
-
Georgios P. Katsikas 2017
PerformanceEvaluation
53
-
Georgios P. Katsikas 2017 54
NoviSwitch (OpenFlow 1.4)
4x10GbE
Intel 82599
Generator Processor
Total Throughput
40 Gbps
4x10GbE
Intel 82599
Firewall DPI
Deployment at 40 Gbps
-
Georgios P. Katsikas 2017 55
Hardware Limit
Throughput at 40 Gbps
38
-
Georgios P. Katsikas 2017 56
Emulated E2Hardware Limit
Throughput at 40 Gbps
38
~1.70 Gbps/core, 16 cores, R2=0.94
-
Georgios P. Katsikas 2017 57
Emulated E2Hardware Limit OpenBox
Throughput at 40 Gbps
38
~2.08 Gbps/core, 11 cores, R2=0.92
-
Georgios P. Katsikas 2017 58
Emulated E2Hardware Limit OpenBox Metron
Throughput at 40 Gbps
38
~4.0 Gbps/core, 10 cores, R2=0.86
-
Georgios P. Katsikas 2017 59
~3x better efficiency
Emulated E2Hardware Limit OpenBox Metron
Throughput at 40 Gbps
38
-
Georgios P. Katsikas 2017 60
DPI cost
Emulated E2Hardware Limit OpenBox Metron
Throughput at 40 Gbps
38
-
Georgios P. Katsikas 2017 61
FW + DPI at the speed
of a 40 Gbps testbed
Emulated E2Hardware Limit OpenBox Metron
Throughput at 40 Gbps
38
-
Georgios P. Katsikas 2017 62
Metron achieves 35-97% lower latency
Emulated E2Hardware Limit OpenBox Metron
Latency at 40 Gbps
-
Georgios P. Katsikas 2017 63
Emulated E2Hardware Limit OpenBox Metron
LatencyFW+DPI ~= LatencyMIN
Latency at 40 Gbps
-
Georgios P. Katsikas 2017 64
Generator Processor
1x100GbE
Mellanox Connect-X4
Total Throughput
100 Gbps
Router NAPT LB
Deployment at 100 Gbps
-
Georgios P. Katsikas 2017 65
Metron
Metron Decomposition at 100 Gbps
-
Georgios P. Katsikas 2017 66
Metron Metron w/o HW Offl. (O)
Metron Decomposition at 100 Gbps
Hardware offloading disabled
-
Georgios P. Katsikas 2017 67
Metron Metron w/o HW Offl. (O) Metron w/o HW Disp. (D)
Metron Decomposition at 100 Gbps
Inaccurate CPU core dispatching
Inter-core communication cost
-
Georgios P. Katsikas 2017 68
Metron Metron w/o HW Offl. (O) Metron w/o HW Disp. (D) Metron w/o O & w/o D
Metron Decomposition at 100 Gbps
Hardware offloading +
accurate dispatching disabled
-
Georgios P. Katsikas 2017 69
Firewall NAPT GW
NoviSwitch (OpenFlow 1.4)
1x10GbE
Intel 82599
Generator Processor
Total Throughput
10 Gbps
1x10GbE
Intel 82599
Deployment at 10 Gbps
-
Georgios P. Katsikas 2017 70
Input Load
Dynamic Load Balancing
-
Georgios P. Katsikas 2017 71
Input Load Metron’s Throughput
Dynamic Load Balancing
-
Georgios P. Katsikas 2017 72
Input Load Metron’s Throughput
Dynamic Load Balancing
Dynamic CPU core
allocation in the paper
-
Georgios P. Katsikas 2017 73
Metron’s driver for managing commodity servers is now in ONOS
https://github.com/opennetworkinglab/onos/tree/master/drivers/server
https://github.com/gkatsikas/onos/tree/metron-driver
Metron’s high performance data plane extends FastClick
https://github.com/tbarbette/fastclick/tree/metron
Open Source Activities
https://github.com/opennetworkinglab/onos/tree/master/drivers/serverhttps://github.com/gkatsikas/onos/tree/metron-driverhttps://github.com/tbarbette/fastclick/tree/metron
-
Georgios P. Katsikas 2017 74
NFV service chains at the speed of the hardware
Metron
Contributions
Orchestration of heterogeneous programmable hardware
Accurate CPU core dispatching
Low-cost placement
Hardware offloading
Quick and stable adaptation
Conclusion