Mellanox Approach to NFV & SDN
TRANSCRIPT
Eran Bello, Director of Business Development
March 2014 | NFV & SDN Summit | Paris, France
Mellanox Approach to NFV & SDN
© 2014 Mellanox Technologies 2
Leading Supplier of End-to-End Interconnect Solutions
Virtual Protocol Interconnect
Storage (Front / Back-End) | Server / Compute | Switch / Gateway | Metro / WAN
56G IB & FCoIB | 56G InfiniBand
10/40/56GbE & FCoE | 10/40/56GbE
Comprehensive End-to-End InfiniBand and Ethernet Portfolio: ICs, Adapter Cards, Switches/Gateways, Cables/Modules, Host/Fabric Software
© 2014 Mellanox Technologies 3
Virtual Protocol Interconnect (VPI) Technology
VPI Switch: 64 ports 10GbE | 36 ports 40/56GbE | 48 ports 10GbE + 12 ports 40/56GbE | 36 ports InfiniBand up to 56Gb/s | 8 VPI subnets | Switch OS Layer | Unified Fabric Manager
VPI Adapter: Ethernet 10/40/56 Gb/s | InfiniBand 10/20/40/56 Gb/s | LOM, Adapter Card, Mezzanine Card | PCI Express 3.0 | Acceleration Engines
Applications: Networking, Storage, Clustering, Management
From data center to campus and metro connectivity
© 2014 Mellanox Technologies 4
Ethernet Switch Portfolio
• SX1036 – The ideal 40GbE ToR/aggregation switch
• SX1024 – Non-blocking 10GbE to 40GbE ToR
• SX1016 – Highest-density 10GbE ToR
• SX1012 – Ideal storage/database 10/40GbE switch
Highest Capacity in 1RU
• From 12 QSFP to 36 QSFP 40/56Gb ports, 4.03Tb/s
• 64 x 10GbE
• 48 x 10GbE plus 12 x 40/56Gbps
Unique Value Proposition
• VPI 10/40/56Gbps
• End-to-end solution
Latency
• 220ns L2 latency
• 330ns L3 latency
Power
• SX1036 – 83W
• SX1024 – 70W
• SX1016 – 64W
• 1W per 10Gb port, 2.3W per 40Gb port
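As a quick consistency check, the per-port power figures line up with the chassis numbers above for the SX1036 and SX1016 (assuming 36 x 40/56GbE ports and 64 x 10GbE ports respectively):

\[ 36 \times 2.3\,\text{W} \approx 83\,\text{W} \ (\text{SX1036}), \qquad 64 \times 1\,\text{W} = 64\,\text{W} \ (\text{SX1016}) \]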
© 2014 Mellanox Technologies 5
Telecom and Security Network Functions Virtualization
Classical Network Appliance Approach: dedicated appliances for BRAS, Firewall, DPI, CDN, Tester/QoE Monitor, WAN Acceleration, Message Router, Radio Network Controller, Carrier Grade NAT, Session Border Controller, PE Router, SGSN/GGSN
Network Functions Virtualisation Approach: virtual appliances from independent software vendors running on generic high-volume servers, generic high-volume Ethernet switches and generic high-volume storage, with orchestrated, automatic remote install
Ideal platform for ETSI NFV: Network Functions Virtualization
• Consolidate network equipment onto standard servers, switches and storage
• Leverage Software Defined Networking
• Driven within ETSI, the European Telecommunications Standards Institute
The migration to x86-based platforms is the enabler
• 3G/4G Network Core
• Load Balancing / Traffic and Policy Enforcement
• Internet Security Gateways
• Network Monitoring / VPN
• CDN / Video Processing and Optimization / IPTV
From ATCA platforms to compute and storage platforms
© 2014 Mellanox Technologies 6
HP c7000 with Mellanox 40GbE Interconnect
Mezz Adapter (HP PN 644161-B22, 2-port blade) | NFF Cables | Switch Blade (SX1018HP)
Mezz Adapter:
• VPI Ready: same HW for both ETH and IB
• Highest Capacity: 2-port 40GbE, PCIe 3.0 x8 lanes
• Lowest Latency: RoCE (app to app) 1.3us
• Lowest Power (40GbE): typical 2-port 40GbE 5.1W
• 56Gbps FDR IB / 40GbE QSFP
• QSA: QSFP to SFP+ adapter
SX1018HP Switch Blade:
• VPI Ready: same HW for both ETH and IB
• Highest Capacity: 2.72 Tb/s bandwidth
• 16 internal 40/10GbE ports, 18 external 40/10GbE ports (QSFP+)
• Lowest Latency: 220 nsec at 40GbE, 270 nsec at 10GbE
• Lowest Power: 82W (typical power with passive cables)
• C-Class double-wide form factor, up to two SX1018HP switches per enclosure
© 2014 Mellanox Technologies 7
Mellanox ConnectX-3 dual-port 40GbE NIC and switch released in Q3 2013
• 14 Compute Blades, each with a single EN6132 dual-port 40GbE NIC
• 2 Switch Blades, each an EN6131 SwitchX-2 with 32 ports of 40GbE; Compute I/O 1.12 Tbps @ 40GbE, Uplink I/O up to 1.44 Tbps @ 40GbE
[Diagram: Dual Star and Dual-Dual Star architectures built from EN4093R 10GbE and EN6131 40GbE switch blades (ITE switch bays), with 22 x 10GbE and 18 x 40GbE link groups at 10GbE/40GbE]
IBM PureFlex System
• 14 Compute Blades, each with dual EN6132 dual-port 40GbE NICs
• 4 Switch Blades, each an EN6131 SwitchX-2 with 32 ports of 40GbE; Compute I/O 2.24 Tbps @ 40GbE, Uplink I/O up to 2.24 Tbps @ 40GbE
IBM PureFlex with Mellanox 40GbE Interconnect
Single wide chassis: 14x ITEs / Blade Servers support 2 adapters per server
Double-wide chassis: 7x ITEs / Blade Servers support 4 adapters per server
© 2014 Mellanox Technologies 8
CloudNFV: The 1st ETSI ISG NFV Approved PoC
[Diagram: Active Data Center with Demo Console and Demo Virtual Function]
Partner roles:
• Overall Architecture
• NFV Orchestration and Metro Ethernet Switches
• Optimization, Active Virtualization and Demo Console
• Traffic Telemetry and DPI as a Service
• Data Path Acceleration
• Servers, Data Center Switching, Lab Facilities, Systems Integration
• Demo Virtual Function
• High Performance Server and Storage Interconnect
© 2014 Mellanox Technologies 9
Mellanox and 6WIND ConnectX-3 NIC Driver for Intel® DPDK
6WIND or Intel® DPDK
• Data Plane libraries
• Optimized NIC drivers
Client’s Application Software
High-performance packet processing solutions for
• Gateways
• Security appliances
• UTMs
• Virtual appliances
• etc.
Multicore Processor
librte_pmd_mlx4
• The librte_pmd_mlx4 poll mode driver is provided as an add-on to the DPDK (no need to patch the DPDK)
• Based on the generic librte_eal and librte_ether APIs of the DPDK
• Clean design: it works alongside the ibverbs framework
Other add-ons: librte_crypto_nitrox, 6WIND add-ons, VMware, …
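For orientation, here is a minimal sketch of how an application brings up a port through the generic ethdev API once the mlx4 PMD is present. The port number, queue and mbuf-pool sizes are illustrative assumptions, and the mbuf-pool helper used here (rte_pktmbuf_pool_create) comes from later DPDK releases than the one current at the time of this talk.

```c
#include <stdint.h>
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_debug.h>
#include <rte_lcore.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define RX_RING_SIZE 512
#define TX_RING_SIZE 512
#define NUM_MBUFS    8191
#define BURST_SIZE   32

int main(int argc, char **argv)
{
    /* Initialize the EAL; the mlx4 PMD registers itself and exposes
     * ConnectX-3 ports as standard DPDK ethdev ports. */
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL init failed\n");

    /* One mbuf pool backing the RX queue. */
    struct rte_mempool *pool = rte_pktmbuf_pool_create("mbuf_pool",
            NUM_MBUFS, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
    if (pool == NULL)
        rte_exit(EXIT_FAILURE, "mbuf pool creation failed\n");

    /* Configure port 0 with one RX and one TX queue. */
    uint16_t port = 0;
    struct rte_eth_conf port_conf = {0};
    rte_eth_dev_configure(port, 1, 1, &port_conf);
    rte_eth_rx_queue_setup(port, 0, RX_RING_SIZE, rte_socket_id(), NULL, pool);
    rte_eth_tx_queue_setup(port, 0, TX_RING_SIZE, rte_socket_id(), NULL);
    rte_eth_dev_start(port);

    /* Poll-mode receive loop: packets arrive with no interrupts and no
     * kernel involvement (OS bypass via the ibverbs-based PMD). */
    for (;;) {
        struct rte_mbuf *bufs[BURST_SIZE];
        uint16_t n = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);
        for (uint16_t i = 0; i < n; i++)
            rte_pktmbuf_free(bufs[i]);   /* application processing goes here */
    }
    return 0;
}
```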
© 2014 Mellanox Technologies 10
Mellanox NIC SR-IOV with PMD for Intel® DPDK in the Guest VM
OpenStack integration: Neutron Plug-in
• High performance 10/40/56Gbps
• SR-IOV enabled
• OpenFlow enabled eSwitch
• OpenStack Neutron Plug-in
• PMD for DPDK: VM OS bypass
• Multi-core and RSS support
• Delivering bare-metal performance
[Diagram: VMs on a hypervisor – legacy software vSwitch path vs. SR-IOV eSwitch with hardware offload, OpenFlow enabled]
© 2014 Mellanox Technologies 11
Mellanox NIC Based I/O Virtualization Advantages
Legacy NICs with software-based vSwitches:
• Slow application performance: 1/10GbE, 50us latency for VM-to-VM connectivity, slow VM migration, slow storage I/O
• Expensive and inefficient: high CPU overhead for I/O processing, multiple adapters needed
• Limited isolation: minimal QoS and security in software
Mellanox NICs with hardware offload + vSwitches:
• Fastest application performance: 10/40GbE with RDMA, 56Gb InfiniBand, only 2us for VM-to-VM connectivity, >3.5x faster VM migration, >6x faster storage access
• Superb efficiency: offload the hypervisor CPU, run more VMs, I/O consolidation
• Best isolation: hardware-enforced security and QoS
[Diagram: VMs on a hypervisor with legacy software vSwitches vs. hardware offload + vSwitches]
© 2014 Mellanox Technologies 12
I/O Virtualization Future – NIC Based Switching
[Diagram: VMs with virtual NICs (vNICs) connected through eSwitches (embedded switches) in the NIC/HCA to physical ports (pPorts)]
• HW-based VM switching and HW-based teaming (hardware "LAG")
• vPorts with multi-level QoS and hardware-based congestion control
• vPort security filters, ACLs, and tunneling (EoIB/VXLAN/NVGRE)
• pPort QoS and DCB; vPort priority tagging
• Controlled via SDN/OpenFlow
eSwitch supported match fields:
• Destination MAC address
• VLAN ID
• Ether Type
• Source/Destination IP address
• Source/Destination UDP/TCP port
eSwitch supported actions:
• Drop
• Allow
• Count
• Trap/Mirror
• Set priority (VLAN priority, egress queue & policer)
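As a purely hypothetical illustration (not Mellanox's driver or OpenFlow agent API), the fragment below models a flow entry limited to exactly the match fields and actions listed above, and populates one example rule.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of an eSwitch flow entry; none of these types
 * come from a real Mellanox or OpenFlow header. */
enum eswitch_action { ACT_DROP, ACT_ALLOW, ACT_COUNT, ACT_TRAP_MIRROR, ACT_SET_PRIORITY };

struct eswitch_flow {
    uint8_t  dst_mac[6];          /* destination MAC address          */
    uint16_t vlan_id;             /* VLAN ID                          */
    uint16_t ether_type;          /* Ether type                       */
    uint32_t src_ip, dst_ip;      /* source/destination IP address    */
    uint16_t src_port, dst_port;  /* source/destination UDP/TCP port  */
    enum eswitch_action action;   /* drop/allow/count/trap/set prio   */
    uint8_t  vlan_priority;       /* used with ACT_SET_PRIORITY       */
};

int main(void)
{
    /* Example rule: drop IPv4 traffic to 10.0.0.5:80 on VLAN 10. */
    struct eswitch_flow rule = {
        .dst_mac    = { 0x00, 0x02, 0xc9, 0x11, 0x22, 0x33 },
        .vlan_id    = 10,
        .ether_type = 0x0800,            /* IPv4 */
        .dst_ip     = (10u << 24) | 5u,  /* 10.0.0.5 */
        .dst_port   = 80,
        .action     = ACT_DROP,
    };
    printf("rule: vlan %u ethertype 0x%04x action %d\n",
           rule.vlan_id, rule.ether_type, rule.action);
    return 0;
}
```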
© 2014 Mellanox Technologies 13
Mellanox and Radware DefensePro: SDN demo
The Traditional Way: Bump in the wire Appliances
A Better Way: SDN and OpenFlow with Flow Based Routing
© 2014 Mellanox Technologies 14
ConnectX-3 Pro NVGRE and VXLAN Performance
[Chart: NVGRE throughput, ConnectX-3 Pro 10GbE – bandwidth (Gb/s) vs. VM pairs (2, 4, 8, 16)]
CPU usage per Gbit/sec with VXLAN (CPU% per Gbit/s):
• VXLAN in software: 3.5 (1 VM), 3.3 (2 VMs), 4.3 (3 VMs)
• VXLAN HW offload: 0.9 (1 VM), 0.9 (2 VMs), 1.2 (3 VMs)
Total VM bandwidth when using VXLAN (Gb/s):
• VXLAN in software: 2 (1 VM), 3 (2 VMs), 3.5 (3 VMs)
• VXLAN HW offload: 10 (1 VM), 19 (2 VMs), 21 (3 VMs)
The Foundation of Cloud 2.0
The World's First NVGRE / VXLAN Offloaded NIC
© 2014 Mellanox Technologies 15
6WIND demonstration of 195 Gbps Accelerated Virtual Switch
[Diagram: 6WINDGate architecture – fast path modules (IP, IPsec, OVS acceleration, TCP, VLAN, GRE, MPLS, ACL, LAG, custom GTPu, NAT) running on Intel® DPDK over the NIC(s) and multicore processor platform, with shared memory for statistics and protocol tables, synchronized with the Linux kernel networking stack and control-plane daemons (Quagga, iproute2, iptables) via the 6WINDGate sync and fast path statistics modules]
6WINDGate includes the Mellanox poll mode driver (PMD), providing direct access to the networking hardware (Linux OS bypass). The demo includes 5 Mellanox ConnectX®-3 Pro cards with dual 40G ports.
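Taking the stated hardware at face value, the aggregate raw port capacity behind the demo is

\[ 5 \text{ cards} \times 2 \text{ ports/card} \times 40\,\text{Gb/s} = 400\,\text{Gb/s}, \]

over which the 195 Gbps of accelerated virtual switching was demonstrated.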
© 2014 Mellanox Technologies 16
Managing the VM Networking Via OpenFlow / SDN
[Diagram: OpenStack Manager (cloud management) and an SDN controller running SDN applications; the Neutron Plug-In and per-server Neutron Agent create/delete and configure policy per VM vNIC; an OpenFlow Agent provides OpenFlow control over the switch and the NIC's embedded switch; servers host VMs attached either para-virtually (tap devices) or via SR-IOV to the VM, over 10/40GbE or InfiniBand ports]
• OpenFlow control over switch and NIC
• Adapter hardware acceleration for OpenFlow and overlay functions
• Native integration to OpenStack and SDN controllers
The Benefits of VM Provisioning & Fabric Policy in Hardware: Isolation, Performance & Offload, Simpler SDN
© 2014 Mellanox Technologies 17
CYAN Blue Planet: Carrier Grade SDN Orchestration Platform
• Allows service orchestration over the telecom WAN network
• Leverages OpenStack for the telecom datacenter
• Leverages the Mellanox Neutron plug-in to allow SR-IOV
• Near bare-metal performance for the VMs
© 2014 Mellanox Technologies 18
Using CloudBand, service providers can create cloud services that offer virtually limitless growth and that capitalize on their broad range of distributed data center and network resources. By building their own carrier clouds, service providers can meet stringent service level agreements (SLAs) and deliver the performance, access and security that enterprises and consumers demand.
“Network Function Virtualization can provide service providers with significant gains in automation and reductions in costs. Working in conjunction with the Alcatel-Lucent CloudBand Ecosystem, Mellanox’s industry-leading, end-to-end InfiniBand and Ethernet interconnect products with support for NFV provides cloud and telecommunications networks with best-in-class virtualization features, performance and efficiency.”
Alcatel-Lucent CloudBand: Mellanox Solution Partner
© 2014 Mellanox Technologies 19
Calsoft Labs: Virtual B-RAS solution
High Performance Virtual B-RAS solution
Addresses Broadband service requirements
Intel® DPDK optimized solution
Powered by highly optimized data plane processing software from 6WIND
Performance & capabilities accelerated by Mellanox ConnectX-3 NIC in DELL servers
Delivers 256K PPPoE tunnels on a 2U rack DELL server with Intel Sandy Bridge
Can be integrated with Calsoft Labs Cloud NOC™ orchestration framework or third party NFV management systems.
PPPoX termination with VRF support for Multi-tenants
DHCP support: DHCP Relay; DHCP Server for IPv4/IPv6
Tunneling: L2TP and GRE with VRF support; IPsec/PPP interworking per VRF
AAA (Authentication, Authorization, Accounting) – RADIUS
Security: IP address tracking; centralized firewall
QoS: QoS per service; QoS per subscriber, hierarchical QoS; dynamic bandwidth management
Key Features
© 2014 Mellanox Technologies 20
RDMA/RoCE I/O Offload
RDMA over InfiniBand or Ethernet
[Diagram: two servers in Rack 1 and Rack 2 – with TCP/IP, data is copied between application buffers, OS buffers and NIC buffers through user space, kernel and hardware; with RDMA, the HCAs transfer application buffers directly between the two machines, bypassing the kernel]
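The slide's point is that RDMA lets the adapter move application buffers directly, without the kernel TCP/IP path. Below is a minimal, heavily abbreviated libibverbs sketch of the sender side of an RDMA Write; queue-pair connection setup and the out-of-band exchange of the peer's buffer address and rkey are omitted (marked in comments), and all sizes and names are illustrative.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

int main(void)
{
    /* Open the first RDMA-capable device (e.g., a ConnectX HCA). */
    struct ibv_device **devs = ibv_get_device_list(NULL);
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* Register an application buffer so the HCA can DMA it directly. */
    char *buf = malloc(4096);
    strcpy(buf, "payload moved without kernel involvement");
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);

    /* Completion queue and a reliable-connected queue pair. */
    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);
    struct ibv_qp_init_attr qpa = {
        .send_cq = cq, .recv_cq = cq, .qp_type = IBV_QPT_RC,
        .cap = { .max_send_wr = 16, .max_recv_wr = 16,
                 .max_send_sge = 1, .max_recv_sge = 1 },
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &qpa);

    /* ... exchange QP numbers, LIDs/GIDs and the remote buffer's
     * address + rkey out of band, then transition the QP to RTS ... */
    uint64_t remote_addr = 0;  /* peer's buffer address (from exchange) */
    uint32_t remote_rkey = 0;  /* peer's MR rkey (from exchange)        */

    /* Post an RDMA Write: the HCA places the data straight into the
     * remote application's registered buffer. */
    struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = 4096,
                           .lkey = mr->lkey };
    struct ibv_send_wr wr = {
        .opcode = IBV_WR_RDMA_WRITE,
        .sg_list = &sge, .num_sge = 1,
        .send_flags = IBV_SEND_SIGNALED,
        .wr.rdma = { .remote_addr = remote_addr, .rkey = remote_rkey },
    };
    struct ibv_send_wr *bad = NULL;
    if (ibv_post_send(qp, &wr, &bad))
        fprintf(stderr, "post_send failed\n");

    /* Poll the completion queue for the work completion. */
    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;
    printf("RDMA write completed with status %d\n", wc.status);
    return 0;
}
```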
© 2014 Mellanox Technologies 21
Accelerating Cloud Performance
Storage – 6X faster: SCSI write example, Linux KVM, 64 KB I/O size (bandwidth, MB/s)
• iSER, 16 VMs write, 40GbE: 6200
• iSER, 16 VMs write, 10GbE: 1200
• Fibre Channel 8Gb: 800
Migration – 3.5X faster: migration of an active VM (time, seconds)
• 10GbE: 38
• 40GbE: 10
Virtualization – 20X faster: VM-to-VM latency, 256-byte messages (latency, us)
• TCP para-virtualization: 40
• RDMA direct access: 2
© 2014 Mellanox Technologies 22
RDMA at 40GbE Enables Massive Cloud Savings For Microsoft Azure
Microsoft keynote at the Open Networking Summit 2014 on RDMA (Albert Greenberg, Microsoft – SDN Azure Infrastructure):
• "How do we make Azure Storage scale?"
• "To make storage cheaper we use lots more network!"
• "RoCE (RDMA over Ethernet) enabled at 40GbE for Windows Azure Storage, achieving massive COGS savings"
© 2014 Mellanox Technologies 23
Faster Cloud Storage Access
Using RDMA to accelerate iSCSI storage
• Uses OpenStack built-in components and management (Open-iSCSI, tgt target, Cinder); no additional software is required – RDMA is already inbox and used by our OpenStack customers
• Mellanox enables faster performance with much lower CPU%
• Next step: bypass the hypervisor layers, and add NAS & object storage
[Diagram: compute servers (KVM hypervisor, VMs, Open-iSCSI with iSER over the adapter) connected through the switching fabric to storage servers (iSCSI/iSER target (tgt), RDMA cache, local disks), managed by OpenStack Cinder]
[Chart: bandwidth (MB/s, up to the PCIe limit) vs. I/O size (KB) for iSER with 4/8/16 VMs writing vs. iSCSI with 8/16 VMs writing – iSER delivers up to 6X higher throughput]
© 2014 Mellanox Technologies 24
Mellanox CloudX OpenCloud
Any Software
Open NIC, Open Switch, Open Server, Open Rack
© 2014 Mellanox Technologies 25
Server Attached and/or Network Attached HWA
[Diagram: fat-tree SDN switch network (40GbE / 56Gbps IB FDR) connecting Platform 1, Platform 2, … Platform X over 40Gbps fabrics; VNF instances (DPI, BRAS, SGSN/GGSN, PE Router, Firewall, CG-NAT, SBC, STB) tied to the specific platforms that host their HWAs]
Server-attached and/or network-attached HWAs are non-scalable and lead back to the custom appliance-based model
© 2014 Mellanox Technologies 26
Remote HWA as a Service in NFV Cloud Model
[Diagram: fat-tree SDN switch network (40GbE / 56Gbps IB FDR); Platform 1, Platform 2, … Platform X each pair an SX1024 Ethernet switch with HWA / signal processing resources over a 40Gbps fabric, exposed at Nx40Gbps to the VNFs (DPI, BRAS, SGSN/GGSN, PE Router, Firewall, CG-NAT, SBC, STB) over RDMA / RoCE]
© 2014 Mellanox Technologies 27
iSCSI SAN/NAS Storage Architecture in an NFV Cloud Model
[Diagram: fat-tree SDN switch network with ToR and aggregation layers at 10/40/100Gbps; Rack 1, Rack 2, … Rack n each hold compute and storage nodes behind an Ethernet switch with SAN/NAS storage, connected by 12x10/40/100Gbps uplinks and accessed over RDMA / RoCE]
iSCSI SAN/NAS storage over a standard Ethernet network: a shared resource
© 2014 Mellanox Technologies 28
The GPU as a Service Implementation
GPUs as a network-resident service
• Little to no overhead when using FDR InfiniBand
Virtualize and decouple GPU services from CPU services
• A new paradigm in cluster flexibility
• Lower cost, lower power and ease of use with shared GPU resources
• Remove difficult physical requirements of the GPU for standard compute servers
[Diagram: "GPUs in every server" – a GPU paired with each CPU – vs. "GPUs as a Service" – CPUs with virtual GPUs (vGPU) backed by a shared pool of GPUs]
© 2014 Mellanox Technologies 29
Local and Remote GPU HWA Solutions
Application/GPU servers – GPU as a Service with Mellanox GPUDirect™ 1.0
[Diagram: application server side (CUDA application, rCUDA library, network interface) and remote GPU side (rCUDA daemon, CUDA driver + runtime, network interface)]
Mellanox GPUDirect™ 1.0 enables remote access from every node to any GPU in the system with a single copy: the data path is copied through CPU memory to or from the network interface and the GPU HWA device
GPU as a Service with Mellanox PeerDirect™
[Diagram: application server side (application, rCUDA library, network interface) and remote GPU side (rCUDA daemon, CUDA driver + runtime, network interface), with a P2P plugin between the HCA driver and the peer (GPU) driver; the peer driver exports peer device memory functions, and the ib_umem_* functions are "tunneled" through the p2p plugin module]
Mellanox PeerDirect™ enables remote access from every node to any GPU in the system with zero copy: the data path goes directly from the network interface to the GPU HWA device
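As a rough illustration of what the PeerDirect / GPUDirect RDMA model enables, the sketch below registers a GPU buffer directly with the HCA so RDMA transfers can target device memory without staging through host RAM. It assumes a CUDA-capable GPU, a verbs-capable NIC and a loaded peer-memory kernel module (e.g., nv_peer_mem); connection setup is omitted as in the earlier RDMA example, and the code is a sketch rather than Mellanox's reference implementation.

```c
#include <stdio.h>
#include <cuda_runtime.h>
#include <infiniband/verbs.h>

int main(void)
{
    /* Allocate a buffer in GPU memory. */
    void *gpu_buf = NULL;
    size_t len = 1 << 20;                  /* 1 MiB */
    if (cudaMalloc(&gpu_buf, len) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    /* Open the HCA and a protection domain. */
    struct ibv_device **devs = ibv_get_device_list(NULL);
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* With a peer-memory module loaded, ibv_reg_mr() can register the
     * device pointer itself; the HCA then DMAs to/from GPU memory
     * directly, with no copy through host RAM. */
    struct ibv_mr *mr = ibv_reg_mr(pd, gpu_buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        fprintf(stderr, "ibv_reg_mr on GPU memory failed "
                        "(is the peer-memory module loaded?)\n");
        return 1;
    }
    printf("GPU buffer registered: lkey=0x%x rkey=0x%x\n",
           mr->lkey, mr->rkey);

    /* ... create CQ/QP, connect to the remote side and post RDMA
     * reads/writes against this MR exactly as with a host buffer ... */
    ibv_dereg_mr(mr);
    cudaFree(gpu_buf);
    return 0;
}
```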
© 2014 Mellanox Technologies 30
Mellanox Interconnect Solutions
Ideal for cloud datacenters, data processing platforms and Network Functions Virtualization
• Leading SerDes technology: high bandwidth, advanced process
• 10/40/56Gb VPI with PCIe 3.0 interface
• 10/40/56Gb high-bandwidth switch: 36 ports of 10/40/56Gb or 64 ports of 10Gb
• RDMA/RoCE technology: ultra-low-latency data transfer
• Software Defined Networking: SDN switch and control, end-to-end solution
• Cloud management: OpenStack integration
Paving the way to 100Gb/s interconnect
• End-to-end network interconnect for compute/processing and switching
• Software Defined Networking
High bandwidth, low latency and lower TCO: $/Port/Gb
Mellanox Interconnect is your competitive advantage!
Thank You