metron nfv service chains at the true speed of the ...georgios p. katsikas 2017 metron nfv service...

74
Georgios P. Katsikas 2017 Metron NFV Service Chains at the True Speed of the Underlying Hardware Georgios P. Katsikas 1,3 , Tom Barbette 2 , Dejan Kostic 3 , Rebecca Steinert 1 , and Gerald Q. Maguire Jr. 3 1 RISE SICS 2 University of Liege 3 KTH Royal Institute of Technology

Upload: others

Post on 30-Jan-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

  • Georgios P. Katsikas 2017

    MetronNFV Service Chains at the True

    Speed of the Underlying Hardware

    Georgios P. Katsikas1,3, Tom Barbette2, Dejan Kostic3,

    Rebecca Steinert1, and Gerald Q. Maguire Jr.3

    1RISE SICS 2University of Liege 3KTH Royal Institute of Technology

  • Georgios P. Katsikas 2017 2

    High throughput and low latency

    Web

    Video

    ...

    Analytics

    Introduction

  • Georgios P. Katsikas 2017 3

    IPSec LB

    ...

    Web

    Analytics

    Video

    Firewall DPIRouter Router

    NAT IDS Monitor

    Service Chains of Network Functions

  • Georgios P. Katsikas 2017 4

    IPSec LB

    ...

    Web

    Analytics

    Video

    Firewall DPIRouter Router

    NAT IDS Monitor

    Service Chains of Network Functions

    Minimize the performance impact of service chains

  • Georgios P. Katsikas 2017 5

    Source: D. Firestone et al. Hardware-Accelerated Networks

    at Scale in the Cloud, Microsoft Azure, SIGCOMM 2017Motivation

  • Georgios P. Katsikas 2017 6

    Motivation Source: D. Firestone et al. Hardware-Accelerated Networks at Scale in the Cloud, Microsoft Azure, SIGCOMM 2017

  • Georgios P. Katsikas 2017 7

    Motivation Source: D. Firestone et al. Hardware-Accelerated Networks at Scale in the Cloud, Microsoft Azure, SIGCOMM 2017

    Service chains today greatly

    rely on CPU performance!

  • Georgios P. Katsikas 2017

    Can existing NFV systems operate at these emerging link speeds

    (i.e., at 100 Gbps)?

    8

  • Georgios P. Katsikas 2017 9

    Generator Processor

    1x100GbE

    Mellanox Connect-X4

    Total Throughput

    100 Gbps

    Router NAPT LB

    Deployment at 100 Gbps

  • Georgios P. Katsikas 2017 10

    Hardware Limit

    Hardware Limit at 100 Gbps

    76

  • Georgios P. Katsikas 2017 11

    Performance degradation

    Emulated E2Hardware Limit

    Existing NFV Solutions at 100 Gbps

    76

  • Georgios P. Katsikas 2017 12

    Emulated E2Hardware Limit OpenBox

    Existing NFV Solutions at 100 Gbps

    76

  • Georgios P. Katsikas 2017

    Emulated E2Hardware Limit OpenBox

    13

    Hardware

    offloading

    Accurate CPU

    core dispatching

    0 inter-core transfers

    Metron

    Metron – Hardware-level Performance

    76

  • Georgios P. Katsikas 2017

    Emulated E2Hardware Limit OpenBox

    14

    Metron

    Hardware

    performance

    6.5x better efficiency

    Metron – Hardware-level Performance

    76

  • Georgios P. Katsikas 2017

    Why do existing NFV systems lose performance?

    15

  • Georgios P. Katsikas 2017 16

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4Software switch

    dispatching

  • Georgios P. Katsikas 2017 17

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4Software switch

    dispatching

  • Georgios P. Katsikas 2017 18

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    1Software switch

    dispatching

  • Georgios P. Katsikas 2017 19

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    2Software switch

    dispatching

  • Georgios P. Katsikas 2017 20

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    3Software switch

    dispatching

  • Georgios P. Katsikas 2017 21

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    4Software switch

    dispatching

  • Georgios P. Katsikas 2017 22

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    4 Inter-Core Transfers

    Software switch

    dispatching

  • Georgios P. Katsikas 2017 23

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    Rx Rx NF1+NF2 Tx

    OutInRSS

    Pipeline dispatching

    with or without RSS

    NFP, Flurries

    Software switch

    dispatching

    4 Inter-Core Transfers

  • Georgios P. Katsikas 2017 24

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    Rx Rx NF1+NF2 Tx

    OutInRSS

    Pipeline dispatching

    with or without RSS

    NFP, Flurries

    Software switch

    dispatching

    4 Inter-Core Transfers

  • Georgios P. Katsikas 2017 25

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    Rx Rx NF1+NF2 Tx

    OutInRSS

    Pipeline dispatching

    with or without RSS

    NFP, Flurries

    Software switch

    dispatching

    4 Inter-Core Transfers

    1

  • Georgios P. Katsikas 2017 26

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    Rx Rx NF1+NF2 Tx

    OutInRSS

    Pipeline dispatching

    with or without RSS

    NFP, Flurries

    Software switch

    dispatching

    4 Inter-Core Transfers

    2

  • Georgios P. Katsikas 2017 27

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    Rx Rx NF1+NF2 Tx

    OutInRSS

    Pipeline dispatching

    with or without RSS

    NFP, Flurries

    Software switch

    dispatching

    4 Inter-Core Transfers

  • Georgios P. Katsikas 2017 28

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    Rx Rx NF1+NF2 Tx

    OutInRSS

    Pipeline dispatching

    with or without RSS

    NFP, Flurries

    Software switch

    dispatching

    4 Inter-Core Transfers

    2 Inter-Core Transfers

  • Georgios P. Katsikas 2017 29

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    OutRSS In

    Flow Director

    Or

    Rule or hash-based

    hardware dispatchingRx+NF1 Idle NF2+Tx Idle

    Rx Rx NF1+NF2 Tx

    OutInRSS

    Pipeline dispatching

    with or without RSS

    NFP, Flurries

    OpenBox,

    SNF, FastClick

    Software switch

    dispatching

    4 Inter-Core Transfers

    2 Inter-Core Transfers

  • Georgios P. Katsikas 2017 30

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    OutRSS In

    Flow Director

    Or

    Rule or hash-based

    hardware dispatchingRx+NF1 Idle NF2+Tx Idle

    Rx Rx NF1+NF2 Tx

    OutInRSS

    Pipeline dispatching

    with or without RSS

    NFP, Flurries

    OpenBox,

    SNF, FastClick

    Software switch

    dispatching

    4 Inter-Core Transfers

    2 Inter-Core Transfers

  • Georgios P. Katsikas 2017 31

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    OutRSS In

    Flow Director

    Or

    Rule or hash-based

    hardware dispatchingRx+NF1 Idle NF2+Tx Idle

    Rx Rx NF1+NF2 Tx

    OutInRSS

    Pipeline dispatching

    with or without RSS

    NFP, Flurries

    OpenBox,

    SNF, FastClick

    Software switch

    dispatching

    4 Inter-Core Transfers

    2 Inter-Core Transfers

    1

  • Georgios P. Katsikas 2017 32

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    OutRSS In

    Flow Director

    Or

    Rule or hash-based

    hardware dispatchingRx+NF1 Idle NF2+Tx Idle

    Rx Rx NF1+NF2 Tx

    OutInRSS

    Pipeline dispatching

    with or without RSS

    NFP, Flurries

    OpenBox,

    SNF, FastClick

    Software switch

    dispatching

    4 Inter-Core Transfers

    2 Inter-Core Transfers

  • Georgios P. Katsikas 2017 33

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    OutRSS In

    Flow Director

    Or

    Rule or hash-based

    hardware dispatchingRx+NF1 Idle NF2+Tx Idle

    Rx Rx NF1+NF2 Tx

    OutInRSS

    Pipeline dispatching

    with or without RSS

    NFP, Flurries

    OpenBox,

    SNF, FastClick

    Software switch

    dispatching

    4 Inter-Core Transfers

    2 Inter-Core Transfers

    1 Inter-Core Transfer

  • Georgios P. Katsikas 2017 34

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    OutRSS In

    Flow Director

    Or

    Rule or hash-based

    hardware dispatchingRx+NF1 Idle NF2+Tx Idle

    Rx Rx NF1+NF2 Tx

    OutInRSS

    Pipeline dispatching

    with or without RSS

    NFP, Flurries

    OpenBox,

    SNF, FastClick

    Software switch

    dispatching

    4 Inter-Core Transfers

    2 Inter-Core Transfers

    1 Inter-Core Transfer

    Fastest system fewest inter-core transfers

  • Georgios P. Katsikas 2017 35

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    OutRSS In

    Flow Director

    Or

    Rule or hash-based

    hardware dispatchingRx+NF1 Idle NF2+Tx Idle

    Rx Rx NF1+NF2 Tx

    OutInRSS

    Pipeline dispatching

    with or without RSS

    NFP, Flurries

    OpenBox,

    SNF, FastClick

    Software switch

    dispatching

    4 Inter-Core Transfers

    2 Inter-Core Transfers

    1 Inter-Core Transfer

    The available time to process 64-byte packets at

    100 Gbps is ~5ns/packet

  • Georgios P. Katsikas 2017 36

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    OutRSS In

    Flow Director

    Or

    Rule or hash-based

    hardware dispatchingRx+NF1 Idle NF2+Tx Idle

    Rx Rx NF1+NF2 Tx

    OutInRSS

    Pipeline dispatching

    with or without RSS

    NFP, Flurries

    OpenBox,

    SNF, FastClick

    Software switch

    dispatching

    4 Inter-Core Transfers

    2 Inter-Core Transfers

    1 Inter-Core Transfer

    The available time to process 64-byte packets at

    100 Gbps is ~5ns/packet

    But, all existing NFV systems transfer each packet

    (one or more times) via DRAM or LLC

  • Georgios P. Katsikas 2017 37

    Switch NF1 NF2 Idle

    OutIn

    E2, ClickOS,

    NetVM

    Core 1 Core 2 Core 3 Core 4

    OutRSS In

    Flow Director

    Or

    Rule or hash-based

    hardware dispatchingRx+NF1 Idle NF2+Tx Idle

    Rx Rx NF1+NF2 Tx

    OutInRSS

    Pipeline dispatching

    with or without RSS

    NFP, Flurries

    OpenBox,

    SNF, FastClick

    Software switch

    dispatching

    4 Inter-Core Transfers

    2 Inter-Core Transfers

    1 Inter-Core Transfer

    The available time to process 64-byte packets at

    100 Gbps is ~5ns/packet

    But, all existing NFV systems transfer each packet

    (one or more times) via DRAM or LLC

    Such a transfer takes several nanoseconds itself!

  • Georgios P. Katsikas 2017 38

    Problems

    Greatly underutilized hardware

    Solutions

    Challenges

    Hardware heterogeneity

    Unified hardware

    management abstractions

    Zero Inter-core Transfers with Metron

  • Georgios P. Katsikas 2017 39

    Problems

    Greatly underutilized hardware

    Too many inter-core transfers

    Solutions

    Challenges

    Hardware heterogeneity

    Dynamic adaptation

    Unified hardware

    management abstractions

    Dispatch traffic classes to

    CPU cores using tags. Map

    each tag to the correct core

    Zero Inter-core Transfers with Metron

  • Georgios P. Katsikas 2017 40

    Problems

    Greatly underutilized hardware

    Too many inter-core transfers

    Solutions Results

    Challenges

    Hardware heterogeneity

    Dynamic adaptation

    Unified hardware

    management abstractions

    Dispatch traffic classes to

    CPU cores using tags. Map

    each tag to the correct core

    Up to 4.7x lower latency

    Up to 7.8x higher throughput

    2.75-6.5x better efficiency

    Zero Inter-core Transfers with Metron

  • Georgios P. Katsikas 2017

    Metron

    Controller

    41

    Firewall DPI

    ServerCore 1

    Core 2

    Input service

    chain to the

    controller

    Source

  • Georgios P. Katsikas 2017

    Metron

    Controller

    Synthesized NF

    42

    Katsikas et al.

    SNF: synthesizing high performance NFV service chains

    PeerJ Computer Science 2016

    Firewall DPI

    ServerCore 1

    Core 2

    Input service

    chain to the

    controller

    Source

  • Georgios P. Katsikas 2017

    Metron

    Controller

    Launcher

    43

    Read Ops Write Ops

    Splitter

    Stateless Stateful

    ServerCore 1

    Core 2Source

    Synthesized NFFirewall DPI

    Input service

    chain to the

    controller

  • Georgios P. Katsikas 2017

    Metron

    Controller

    Launcher

    Tagging

    44

    Read Ops Write Ops

    Splitter

    Stateless Stateful

    ServerCore 1

    Core 2Source

    Synthesized NFFirewall DPI

    Input service

    chain to the

    controller

  • Georgios P. Katsikas 2017

    Metron

    Controller

    Launcher

    Tagging

    45

    Read Ops Write Ops

    Splitter

    Stateless Stateful

    HW Rules SW Ops

    ServerCore 1

    Core 2Source

    Synthesized NFFirewall DPI

    Input service

    chain to the

    controller

  • Georgios P. Katsikas 2017

    Metron

    Controller

    Launcher

    Tagging

    46

    Read Ops Write Ops

    Splitter

    Stateless Stateful

    HW Rules SW Ops

    ServerCore 1

    Core 2Source

    Synthesized NFFirewall DPI

    Input service

    chain to the

    controller

  • Georgios P. Katsikas 2017

    Core 1Server

    Metron

    Controller

    Launcher

    Tagging

    47

    Read Ops Write Ops

    Splitter

    Stateless Stateful

    Monitoring and

    Load Balancing

    HW Rules SW Ops

    Collect

    run-time load

    statistics from

    switches and

    servers

    Tag 1

    Core 2Source

    Synthesized NFDPI

    Input service

    chain to the

    controller

    Firewall

  • Georgios P. Katsikas 2017

    Core 1Server

    Metron

    Controller

    Launcher

    Tagging

    48

    Read Ops Write Ops

    Splitter

    Stateless Stateful

    Monitoring and

    Load Balancing

    HW Rules SW Ops

    Collect

    run-time load

    statistics from

    switches and

    servers

    Tag 1 Overloaded

    Detect Imbalance

    Core 2Source

    Synthesized NFDPI

    Input service

    chain to the

    controller

    Firewall

  • Georgios P. Katsikas 2017

    Core 1Server

    Metron

    Controller

    Launcher

    Tagging

    49

    Read Ops Write Ops

    Splitter

    Stateless Stateful

    Monitoring and

    Load Balancing

    HW Rules SW Ops

    Detect Imbalance

    Split

    Traffic Classes

    Tag 1 Overloaded

    Core 2Source

    Collect

    run-time load

    statistics from

    switches and

    servers

    Synthesized NFDPI

    Input service

    chain to the

    controller

    Firewall

  • Georgios P. Katsikas 2017

    Core 1Server

    Metron

    Controller

    Launcher

    Tagging

    50

    Read Ops Write Ops

    Splitter

    Stateless Stateful

    Monitoring and

    Load Balancing

    Update

    HW Rules SW Ops

    Detect Imbalance

    Split

    Traffic Classes

    Tag 1 Overloaded

    Core 2Source

    Collect

    run-time load

    statistics from

    switches and

    servers

    Synthesized NFDPI

    Input service

    chain to the

    controller

    Firewall

  • Georgios P. Katsikas 2017

    Core 1Server

    Metron

    Controller

    Launcher

    Tagging

    51

    Read Ops Write Ops

    Splitter

    Stateless Stateful

    Monitoring and

    Load Balancing

    Update

    HW Rules SW Ops

    Detect Imbalance

    Split

    Traffic Classes

    Tag 1 Overloaded

    Core 2Source

    Collect

    run-time load

    statistics from

    switches and

    servers

    Synthesized NFDPI

    Input service

    chain to the

    controller

    Firewall

  • Georgios P. Katsikas 2017 Core 2

    Core 1Server

    Metron

    Controller

    Launcher

    Tagging

    52

    Read Ops Write Ops

    Splitter

    Stateless Stateful

    Monitoring and

    Load Balancing

    Update

    HW Rules SW Ops

    Detect Imbalance

    Split

    Traffic Classes

    Tags 1, 2Source

    Collect

    run-time load

    statistics from

    switches and

    servers

    Synthesized NFDPI

    Input service

    chain to the

    controller

    Firewall

    Balanced

  • Georgios P. Katsikas 2017

    PerformanceEvaluation

    53

  • Georgios P. Katsikas 2017 54

    NoviSwitch (OpenFlow 1.4)

    4x10GbE

    Intel 82599

    Generator Processor

    Total Throughput

    40 Gbps

    4x10GbE

    Intel 82599

    Firewall DPI

    Deployment at 40 Gbps

  • Georgios P. Katsikas 2017 55

    Hardware Limit

    Throughput at 40 Gbps

    38

  • Georgios P. Katsikas 2017 56

    Emulated E2Hardware Limit

    Throughput at 40 Gbps

    38

    ~1.70 Gbps/core, 16 cores, R2=0.94

  • Georgios P. Katsikas 2017 57

    Emulated E2Hardware Limit OpenBox

    Throughput at 40 Gbps

    38

    ~2.08 Gbps/core, 11 cores, R2=0.92

  • Georgios P. Katsikas 2017 58

    Emulated E2Hardware Limit OpenBox Metron

    Throughput at 40 Gbps

    38

    ~4.0 Gbps/core, 10 cores, R2=0.86

  • Georgios P. Katsikas 2017 59

    ~3x better efficiency

    Emulated E2Hardware Limit OpenBox Metron

    Throughput at 40 Gbps

    38

  • Georgios P. Katsikas 2017 60

    DPI cost

    Emulated E2Hardware Limit OpenBox Metron

    Throughput at 40 Gbps

    38

  • Georgios P. Katsikas 2017 61

    FW + DPI at the speed

    of a 40 Gbps testbed

    Emulated E2Hardware Limit OpenBox Metron

    Throughput at 40 Gbps

    38

  • Georgios P. Katsikas 2017 62

    Metron achieves 35-97% lower latency

    Emulated E2Hardware Limit OpenBox Metron

    Latency at 40 Gbps

  • Georgios P. Katsikas 2017 63

    Emulated E2Hardware Limit OpenBox Metron

    LatencyFW+DPI ~= LatencyMIN

    Latency at 40 Gbps

  • Georgios P. Katsikas 2017 64

    Generator Processor

    1x100GbE

    Mellanox Connect-X4

    Total Throughput

    100 Gbps

    Router NAPT LB

    Deployment at 100 Gbps

  • Georgios P. Katsikas 2017 65

    Metron

    Metron Decomposition at 100 Gbps

  • Georgios P. Katsikas 2017 66

    Metron Metron w/o HW Offl. (O)

    Metron Decomposition at 100 Gbps

    Hardware offloading disabled

  • Georgios P. Katsikas 2017 67

    Metron Metron w/o HW Offl. (O) Metron w/o HW Disp. (D)

    Metron Decomposition at 100 Gbps

    Inaccurate CPU core dispatching

    Inter-core communication cost

  • Georgios P. Katsikas 2017 68

    Metron Metron w/o HW Offl. (O) Metron w/o HW Disp. (D) Metron w/o O & w/o D

    Metron Decomposition at 100 Gbps

    Hardware offloading +

    accurate dispatching disabled

  • Georgios P. Katsikas 2017 69

    Firewall NAPT GW

    NoviSwitch (OpenFlow 1.4)

    1x10GbE

    Intel 82599

    Generator Processor

    Total Throughput

    10 Gbps

    1x10GbE

    Intel 82599

    Deployment at 10 Gbps

  • Georgios P. Katsikas 2017 70

    Input Load

    Dynamic Load Balancing

  • Georgios P. Katsikas 2017 71

    Input Load Metron’s Throughput

    Dynamic Load Balancing

  • Georgios P. Katsikas 2017 72

    Input Load Metron’s Throughput

    Dynamic Load Balancing

    Dynamic CPU core

    allocation in the paper

  • Georgios P. Katsikas 2017 73

    Metron’s driver for managing commodity servers is now in ONOS

    https://github.com/opennetworkinglab/onos/tree/master/drivers/server

    https://github.com/gkatsikas/onos/tree/metron-driver

    Metron’s high performance data plane extends FastClick

    https://github.com/tbarbette/fastclick/tree/metron

    Open Source Activities

    https://github.com/opennetworkinglab/onos/tree/master/drivers/serverhttps://github.com/gkatsikas/onos/tree/metron-driverhttps://github.com/tbarbette/fastclick/tree/metron

  • Georgios P. Katsikas 2017 74

    NFV service chains at the speed of the hardware

    Metron

    Contributions

    Orchestration of heterogeneous programmable hardware

    Accurate CPU core dispatching

    Low-cost placement

    Hardware offloading

    Quick and stable adaptation

    Conclusion