Multicore I/O Processors in Virtualized Data Centers
TRANSCRIPT
Application of Multicore I/O Processors in Virtualized Data Centers
Nabil Damouny
Rolf Neugebauer
5th Annual ESC – Multicore Expo, San Jose, CA
April 27, 2010
Outline
• Networking Market Dynamics
• Cloud Computing & the Virtualized Data Center
• The Need for an Intelligent I/O Coprocessor
• I/O Processing in Virtualized Data Centers
  1. SW-based (Bridge & vSwitch)
  2. I/O Gateway
  3. Virtual Ethernet Port Aggregation (VEPA)
  4. Server-based
• I/O Coprocessor Requirements
• Meeting the I/O Coprocessor Challenge in Virtualized Data Centers
  • Heterogeneous Multicore Architecture
  • Netronome's Network Flow Processors and Acceleration Cards
• Summary and Conclusion
Data center virtualization is not complete until the I/O subsystem is also virtualized.
About Netronome
• Fabless semiconductor company, developing Network Flow Processing solutions for high-performance, programmable, L2-L7 applications
  • Network coprocessors for x86 designs
  • More complex processing per packet than any other architecture
  • Best-in-class performance per watt
  • Unmatched integration with x86 CPUs
• Family of products including processors, acceleration cards, development tools, software libraries, and professional services
• Founded in 2003
  • Solid background in networking, communications, security, voice and video applications, and high-performance computing
  • Comprised of networking and silicon veterans
• Global presence
  • Sales and marketing: Boston, Massachusetts; Santa Clara, California; Pittsburgh, Pennsylvania; Cambridge, United Kingdom; Shenzhen, China; Penang, Malaysia
• Intel agreements summary
  • IXP28XX technology license
  • SDK software license
  • HDK hardware license
  • QPI technology license
Networking Market Dynamics
Eventually, every packet from every flow of communications services will be intelligently processed.

Market Drivers:
• Application awareness: Email, Web, Multimedia
• Content inspection: Voice, Video, Data, Executables
• Integrated security: VPN, SSL, Spam, Anti-Virus, IDS/IPS, Firewall
• Increasing bandwidth: millions of packets and flows at 10GigE and beyond
• Virtualization: multicore, multi-OS, multi-app, multi-I/O
• Intelligent networking devices: switching, routing, WiMax, 3GPP LTE, security blades and appliances, data center servers
(Source: Morgan Stanley)

Increasing bandwidth, greater security requirements, and the need for application- and content-aware networking are driving the evolution from today's simpler (L2-L3 only) networks to intelligent networking (L2-L7).
Unified Computing in Virtualized Data Centers … Requires Intelligent Networking
• Unified Computing: the convergence of computing, networking, and storage in a virtualized environment
  • Applies to the enterprise (private or internal) and to service providers
• Environment: uncorrelated high I/O data rates
  • Web servers, especially virtualized servers
  • Unified Computing: a combination of servers and networking
• Requirements for high-performance intelligent networking
  • I/O coprocessing for multicore IA/x86 to scale applications
  • Intelligent flow-based switching for inter-VM communications
  • Management of complex, high-performance networking interfaces

The advent of many VMs and the need for IOV create a new set of requirements that mandate a more intelligent approach to managing I/O.
Cloud Computing … Definition & Services
• Cloud computing defined: IT-related capabilities are provided "as a service" using Internet technologies to multiple external customers
  • Public clouds
  • Private clouds
• Types of services available in cloud computing:
  • Software-as-a-Service: software applications delivered over the Web
  • Infrastructure-as-a-Service: remotely accessible server and storage capacity
  • Platform-as-a-Service: a compute-and-software platform that lets developers build and deploy Web applications on a hosted infrastructure

Cloud computing technologies play a crucial role in allowing companies to scale their data center infrastructure to meet performance and TCO requirements.
The Need for an I/O Coprocessor … in the Virtualized Data Center
• Efficient delivery of data to VMs at high rates (20+ Gb/s) requires an intelligent IOV solution
• L2+ processing alone is not enough
  • VLANs, ACLs, etc. only cover the basics
  • Stateful load balancing requires flow awareness
• Clouds are hostile environments: stateful firewalls, IPS/IDS, and deep packet inspection capabilities are required
• Multicore x86 CPUs show poor packet processing performance
  • They are unsuitable for handling millions of stateful flows
  • They have high power consumption

Introduce an intelligent I/O coprocessor to assist multicore x86 CPUs.
IDC … on I/O Virtualization
"If I/O is not sufficient, then it could limit all the gains brought about by the virtualization process."
• The I/O subsystem needs to deliver peak throughput and low latency to the VMs and to the applications they host.
• As VM density increases, most customers are scaling I/O capacity by installing more adapters.
• IOV is simply the abstraction of the logical details of I/O from the physical, essentially separating the upper-layer protocols from the physical connection or transport.
I/O Coprocessor in a Virtualized Heterogeneous Multicore Architecture

[Diagram: two multicore x86 CPUs, each hosting VMs (VM1…VMn, each with its own OS and VNIC). The chipset carries the control plane; the data plane runs over PCIe Gen2 directly to an I/O coprocessor providing IOV and two 10GE ports. A high-speed serial interface (Interlaken, future) between coprocessors is also shown.]
I/O Coprocessor Requirements in a Heterogeneous Multicore Architecture
Addressing the Inter-VM Switching and I/O Challenge

Sitting between the multicore x86 CPU and the multicore flow processor, the I/O coprocessor must provide:
• Inter-chip access: demultiplexing and classification
• TCP offload
• Host offload for burdensome I/O, security, and DPI functions
• IOV: zero-copy, big-block transfers to multiple cores, VMs, or endpoints
• Full I/O virtualization with Intel VT-d
• Programmable egress traffic management

Heterogeneous multicore processing solutions deliver >4x the performance of multicore x86 plus a standard NIC.
Challenges in Virtualized Data Centers
• 2004: a rack of single-core servers and switches
• 2009: many virtual machines and cores in one server
What was a rack of servers five years ago is now a single server, including the networking (switch, IPS, firewall, …).

Many cores result in tens of VMs and a network I/O challenge.
IEEE 802.1: Addressing Ethernet Virtualization in the Data Center
• Current IEEE 802.1Q bridges:
  • Do not allow a packet to be sent back out the same port within the same VLAN
  • Do not have visibility into the identity of individual VMs within physical stations
  • Extensions to bridge and end-station behaviors are needed to support virtualization
• IEEE 802.1Qbg EVB (Edge Virtual Bridging), with VEB/VEPA (Virtual Ethernet Bridge / Virtual Ethernet Port Aggregation), and IEEE 802.1Qbh Bridge Port Extension (PE):
  • Address management issues created by the explosion of VMs in data centers sharing access to the network through an embedded bridge
  • Discuss methods to offload policy, security, and management processing from virtual switches on NICs and blade servers to physical Ethernet switches

Managing network I/O and inter-VM switching will require various implementation alternatives.
OpenFlow Switching / vSwitch
• OpenFlow switching includes:
  • Flow tables used to implement packet processing
  • The OpenFlow protocol, used to manipulate the flow entries
• Enables acceleration of stateful security functions:
  • An application VM with an associated security VM (e.g. firewall, IPS, anti-virus)
  • Network traffic is classified and transits the security VM before being allowed to reach the application VM
  • Once a new flow has been "blessed", its packets pass straight to the application VM (see the sketch below)
  • Flow-based policies for white/black lists (not just L2)
• Software-based virtual switches will have difficulty coping with:
  • Large numbers of flows per second
  • Many packets per second, i.e. high throughput at small packet sizes
  • Assuring low latency

The Network Flow Processor architecture fits well with OpenFlow.
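As an illustration of the "blessed flow" fast path described above, here is a minimal Python sketch of an OpenFlow-style exact-match flow table. It is not Netronome's or Open vSwitch's code; the port names, packet representation, and bless() hook are hypothetical.

    # Minimal sketch: an OpenFlow-style flow table that diverts packets of
    # unknown flows through a security VM and, once a flow is "blessed",
    # forwards subsequent packets straight to the application VM.
    from collections import namedtuple

    FlowKey = namedtuple("FlowKey", "src_ip dst_ip proto src_port dst_port")

    SECURITY_VM_PORT = "vport-security"   # assumed vswitch port names
    APP_VM_PORT = "vport-app"

    flow_table = {}  # FlowKey -> output port (exact-match table)

    def classify(pkt):
        """Extract the 5-tuple flow key from a parsed packet dict."""
        return FlowKey(pkt["src_ip"], pkt["dst_ip"], pkt["proto"],
                       pkt["src_port"], pkt["dst_port"])

    def forward(pkt):
        """Table hit: forward directly. Table miss: detour via the security VM."""
        return flow_table.get(classify(pkt), SECURITY_VM_PORT)

    def bless(pkt):
        """Called by the security VM once a flow passes inspection:
        install a flow entry so later packets bypass the security VM."""
        flow_table[classify(pkt)] = APP_VM_PORT

    # The first packet of a flow detours through the security VM; after
    # bless(), the rest of the flow is switched straight to the app VM.
    pkt = dict(src_ip="10.0.0.1", dst_ip="10.0.0.2", proto=6,
               src_port=12345, dst_port=80)
    assert forward(pkt) == SECURITY_VM_PORT
    bless(pkt)
    assert forward(pkt) == APP_VM_PORT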
1A. Software-Based Switching (Bridge) in a Virtual Server
• Software virtual switch: the VMware, Xen, and Linux bridges (which initially had no ACL or VLAN support)
• VMware and Xen put switches as software modules in their VMMs, but these lacked key features and were slow! (A minimal bridge sketch follows.)
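To make concrete how much per-packet work even the simplest software bridge performs on the host CPU, here is a minimal learning-bridge sketch in Python. The port and MAC names are hypothetical; this stands in for the idea, not the VMware, Xen, or Linux bridge code itself.

    # Minimal sketch of the per-packet work a software bridge does on the
    # host CPU: MAC learning plus a forward-or-flood decision. Every frame
    # costs host cycles, which is why pure software bridges struggle at
    # 10GbE packet rates.
    mac_table = {}  # learned MAC address -> port

    def bridge_rx(src_mac, dst_mac, in_port, ports):
        """Process one frame: learn the source, then forward or flood."""
        mac_table[src_mac] = in_port               # learn
        out = mac_table.get(dst_mac)
        if out is not None and out != in_port:
            return [out]                           # known destination: unicast
        return [p for p in ports if p != in_port]  # unknown: flood

    # Example with three VM ports:
    ports = ["vif1", "vif2", "vif3"]
    print(bridge_rx("mac-A", "mac-B", "vif1", ports))  # flood: B unknown
    print(bridge_rx("mac-B", "mac-A", "vif2", ports))  # ['vif1']: A learned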
1B. Enhanced Software-Based Switching (vSwitch) in a Virtual Server
• Cisco Nexus 1000V (ACLs, VLANs, IOS) for VMware; Open vSwitch (flow-based) for XenServer
• But with the added functionality, performance drops hugely. What happens if a firewall and IPS are added?
• Example: Cisco Nexus 1000V

A good solution for low-performance systems, but with high latency.
2. I/O Gateway
Delivers three key functions:
• An in-rack server communications switch
  • Replaces the top-of-rack Ethernet switch
  • 10/20Gbps PCIe fabric
• A centralized enclosure for I/O adapters used by servers in the rack
  • Shared (network, storage)
  • Assigned (specialty accelerators)
• Virtualized I/O configuration
(Source: Aprius. Note: Xsigo, NextIO, and Virtensys use similar concepts.)

A new approach using PCIe or InfiniBand interconnects, with security functions within the gateway.
3. Virtual Ethernet Port Aggregation (VEPA)
• Offloads policy, security, and management processing from virtual switches on NICs and blade servers into physical Ethernet switches (e.g. the ToR switch)
• IEEE VEPA is an extension to physical and virtual switching
• VEPA allows VMs to use external switches to access features like ACLs, policies, and VLAN assignments
• All inter-VM traffic has to traverse the physical network infrastructure (see the hairpin sketch below); additional security features, load balancers, etc. are implemented in external appliances
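The forwarding rule VEPA depends on can be captured in a few lines. The sketch below (illustrative Python with hypothetical port names, not an 802.1Qbg implementation) shows why a standard 802.1Q bridge drops reflected inter-VM traffic while a hairpin-enabled ("reflective relay") switch forwards it.

    # Minimal sketch of the "hairpin" rule: a classic 802.1Q bridge never
    # sends a frame back out the port it arrived on, so two VMs behind the
    # same VEPA uplink can only reach each other if the adjacent physical
    # switch is allowed to reflect frames back.
    def may_forward(in_port, out_port, hairpin_enabled):
        """Return True if the switch may emit the frame on out_port."""
        if out_port != in_port:
            return True              # normal forwarding, always allowed
        return hairpin_enabled       # same-port reflection needs hairpin mode

    # An inter-VM frame arrives on the VEPA uplink and must leave on it again:
    print(may_forward("uplink1", "uplink1", hairpin_enabled=False))  # False: dropped by a standard bridge
    print(may_forward("uplink1", "uplink1", hairpin_enabled=True))   # True: VEPA-capable switch reflects it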
4. Moving Switching Into the Server
• The switch is moved from the IA/x86 CPU into the Netronome NFP-32xx
• Moving the switching to the Netronome-based coprocessor releases cycles on the IA CPU and increases application performance. Adding an IPS or firewall is no problem!

A server-based NIC or LoM: use existing wiring, with security processing in the server.
Intelligent I/O Sharing Alternatives: Summary
Addressing Inter-VM Switching and the Network I/O Challenge

Performance:  Software-based switch: Poor | I/O Gateway: Very good | VEPA: Very good, except for inter-VM switching | Server-based switch: Very good
Power:        Software-based: Poor, wastes IA cycles | I/O Gateway: Good | VEPA: Good | Server-based: Good
Management:   Software-based: Network or server admin | I/O Gateway: Unclear; standard if the I/O gateway implements a switch | VEPA: Network admin owns | Server-based: Depends who owns the switch
Security:     Software-based: Software-based, adds latency | I/O Gateway: Centralized; adding security increases cost and latency | VEPA: Centralized; adding security increases cost and latency | Server-based: Centralized + distributed
Flexibility:  Software-based: High | I/O Gateway: Depends on architecture | VEPA: Medium (standard switch) | Server-based: High
Reliability:  Software-based: Low | I/O Gateway: Good | VEPA: Good | Server-based: Good, distributed
Cost:         Software-based: Less costly, but wastes IA cycles | I/O Gateway: Less than VEPA (card in server costs less than a CNA; the ToR switch is part of the gateway) | VEPA: Low, but higher for intelligent ToR switches | Server-based: Less than VEPA (card is the same as a CNA in a ToR design, though VEPA is much simpler and cheaper)
Performance of an SR-IOV NIC, a Linux Bridge, and a vSwitch

[Throughput chart] vSwitches require more packet processing and hence drop packets much earlier.
Performance of an SR-IOV NIC, an Old-Style Bridge, and a vSwitch

[Throughput chart] vSwitches provide more flexibility and functionality, but drop packets earlier and consume more CPU cycles.
Performance & CPU Load of an SR-IOV NIC, a Linux Bridge, and a vSwitch

[Throughput and CPU-load chart] Combining the flexibility of vSwitches with the performance of SR-IOV NICs requires an intelligent I/O coprocessor.
Requirements for an I/O Coprocessor
• Intelligent, stateful, flow-based switching
• Integrated IOV
• Load balancing
• Integrated security
• Glue-less interface to the CPU subsystem

Netronome's Network Flow Processor is an intelligent I/O coprocessor.
Netronome Silicon & PCIe Cards
NFP-3240 based PCIe cards:
• 20Gbps of line-rate packet and flow processing per NFE
• 6x1GigE, 2x10GigE (SFP+), netmod interfaces
• PCIe Gen2 (8 lanes)
• Virtualized Linux drivers via SR-IOV
• Flexible/configurable memory options
• Packet time stamping with nanosecond granularity
• Integrated cryptography
• Packet capture and inline applications
• Hardware-based stateful flow management
• TCAM-based traffic filtering
• Dynamic flow-based load balancing to x86 CPUs (see the hashing sketch below)

Highly programmable, intelligent, virtualized acceleration cards for network security appliances and virtualized servers.
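To illustrate the flow-based load-balancing item, here is a minimal Python sketch of a symmetric 5-tuple hash that steers both directions of a flow to the same x86 core. The hash function and core count are assumptions; this is the general idea, not NFP microcode.

    # Minimal sketch of flow-based load balancing to x86 cores. Sorting the
    # endpoints makes the hash direction-independent, so both directions of
    # a TCP/UDP flow land on the same core and per-flow state never has to
    # migrate between cores.
    import zlib

    NUM_CORES = 8  # assumed number of x86 cores receiving traffic

    def core_for_flow(src_ip, src_port, dst_ip, dst_port, proto):
        """Map a flow to a core; symmetric by construction."""
        a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
        key = f"{a}|{b}|{proto}".encode()
        return zlib.crc32(key) % NUM_CORES

    # Both directions of the same flow hash to the same core:
    fwd = core_for_flow("10.0.0.1", 12345, "10.0.0.2", 80, 6)
    rev = core_for_flow("10.0.0.2", 80, "10.0.0.1", 12345, 6)
    assert fwd == rev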
Summary and Conclusion
• Inter-VM switching and intelligent I/O device sharing are integral parts of data center virtualization
  • There are many implementation alternatives
• A heterogeneous architecture addresses this challenge
  • An I/O coprocessor complements multicore x86 by adding packet processing performance, handling millions of stateful flows, and lowering power consumption
• Netronome's NFP-32xx processor family integrates inter-VM switching and I/O virtualization capabilities
• Netronome's PCIe card family integrates intelligent, programmable, flow-based network card functionality with IOV, for the data center

A heterogeneous architecture (Network Flow Processing + multicore x86) addresses the need for inter-VM switching and intelligent I/O sharing.
Backup
Session Info & Abstract
https://www.cmpevents.com/ESCw10/a.asp?option=C&V=11&SessID=10701
Application of Multicore I/O Processors in Virtualized Data Centers
Speakers: Nabil Damouny (Senior Director, Marketing, Netronome Systems); Rolf Neugebauer (Staff Software Engineer, Netronome Systems)
Date/Time: April 27, 2010, 8:30am – 9:15am
Audience level: Intermediate
Track: Multicore Expo – Networking & Telecom

Presentation Abstract
This presentation discusses the application of integrated multicore processors, optimized for networking I/O, in virtualized data centers. Data centers are increasingly being built with multicore virtualized servers. As the number of cores in the server increases, the number of VMs goes up at an even faster pace. These servers need access to high-performance network I/O, resulting in the requirement to implement I/O sharing in a virtualized, intelligent way. In addition, a mechanism for high-performance inter-VM switching is also needed. Flow-based solutions, such as flow classification, routing, and load balancing, supporting in excess of 8M flows, are effective ways to address these challenges.
NFP-32xx Integrates Flow-Based L2 Functions for Inter-VM Switching
• Flow classification (see the parsing sketch below)
• Switching between physical networking ports
• Switching between virtual NICs, without host intervention
• Switching between any physical port and any virtual port
• Stateful flow-based switching
[Diagram: VM1…VMn on cores C1…Cn in the host CPU, each with a VNIC; the NFE connects to the host over an interconnection link and switches Tx/Rx Ethernet traffic toward an external Ethernet switch.]
The NFP-32xx supports more than 8 million flows.
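For illustration, the flow-classification step (reducing a packet to the 5-tuple key under which flow state is kept) could look like the following Python sketch, which parses a raw Ethernet/IPv4 frame. It handles only the simplest case and is not NFP microcode.

    # Minimal sketch of 5-tuple extraction from a raw Ethernet/IPv4/TCP-or-UDP
    # frame (no VLAN tags, fragments, or IPv6).
    import struct

    def classify(frame: bytes):
        """Return (src_ip, dst_ip, proto, src_port, dst_port) or None."""
        ethertype = struct.unpack_from("!H", frame, 12)[0]
        if ethertype != 0x0800:                 # IPv4 only in this sketch
            return None
        ihl = (frame[14] & 0x0F) * 4            # IPv4 header length in bytes
        proto = frame[23]                       # protocol field (IP offset 9)
        src_ip, dst_ip = frame[26:30], frame[30:34]
        if proto not in (6, 17):                # TCP or UDP
            return None
        sport, dport = struct.unpack_from("!HH", frame, 14 + ihl)
        return (src_ip, dst_ip, proto, sport, dport)

    # Tiny self-test: a minimal Ethernet+IPv4+TCP frame (most fields zeroed).
    eth = b"\x00" * 12 + b"\x08\x00"
    ip = bytes([0x45, 0, 0, 40, 0, 0, 0, 0, 64, 6]) + b"\x00\x00" \
         + bytes([10, 0, 0, 1]) + bytes([10, 0, 0, 2])
    tcp = struct.pack("!HH", 12345, 80) + b"\x00" * 16
    print(classify(eth + ip + tcp))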
I/O Virtualization (IOV) Requirements
• Support multiple virtual functions (VFs) over PCIe
  • Lower cost, lower power
• Dynamically assign VFs to different VMs
• Support multiple NIC functions: crypto, PCAP, etc.
• Capability to pin an I/O device to a specific CPU core/VM
  • Enables consolidation and isolation
• Flow-based load balancing to x86 multicore CPUs
  • Higher performance at lower power

Intelligent I/O virtualization is required in multicore CPU designs. PCI-SIG introduced the SR-IOV standard for this purpose (a VF-enable sketch follows).
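As a worked example of VF provisioning, the sketch below enables SR-IOV VFs through the generic Linux sysfs interface. Note that this interface postdates the 2010-era systems in this talk (kernel 3.8+), it is not Netronome-specific, and the interface name eth0 is hypothetical. Requires root.

    # Minimal sketch: enable SR-IOV VFs on a Linux host via sysfs.
    from pathlib import Path

    def enable_vfs(iface: str, num_vfs: int) -> None:
        dev = Path(f"/sys/class/net/{iface}/device")
        total = int((dev / "sriov_totalvfs").read_text())  # HW limit
        if num_vfs > total:
            raise ValueError(f"{iface} supports at most {total} VFs")
        # Reset to 0 first: the kernel rejects changing a non-zero VF
        # count directly.
        (dev / "sriov_numvfs").write_text("0")
        (dev / "sriov_numvfs").write_text(str(num_vfs))

    if __name__ == "__main__":
        enable_vfs("eth0", 4)  # each VF can then be assigned to a VM (e.g. via VFIO)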
The Need for Intelligent I/O Virtualization
• Use commodity multicore hardware
• Virtualization for:
  • Consolidation
  • Moving "legacy" applications and OSs to multicore
  • Isolation
• I/O devices need to be shared
  • Load balance/direct traffic to VMs
  • Pin VMs to cores (see the affinity sketch below)
  • Direct traffic to cores/VMs
  • Isolate device access from VMs

A good IOV solution provides all of the above!
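To illustrate the "pin VMs to cores" item, here is a minimal Linux sketch using Python's os.sched_setaffinity. Real hypervisors expose their own affinity controls (e.g. libvirt's vcpupin); this stands in for the general idea.

    # Minimal sketch: pin a process (e.g. a VM's vCPU thread) to one core,
    # so traffic steered to that core by the I/O device lands on the same
    # core that runs the consumer. Linux only.
    import os

    def pin_to_core(pid: int, core: int) -> None:
        """Restrict pid to a single CPU core."""
        os.sched_setaffinity(pid, {core})

    if __name__ == "__main__":
        pin_to_core(0, 2)                 # pid 0 = the calling process
        print(os.sched_getaffinity(0))    # -> {2}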
NFP Security Capabilities
• Internal instruction unit
  • DMA, bulk crypto/hash, and PKI control, sequenced through cryptography instructions with a multithreaded controller
• Hardware-accelerated bulk cryptography (20+ Gbps)
  • AES with 128-, 192-, and 256-bit keys; ECB, CBC, GCM, CTR, OFB, CFB, CM, and f8 support
  • 3DES and DES with ECB and CBC support
  • ARC-4
  • SHA-1, SHA-1 HMAC
  • SHA-2, SHA-2 HMAC family with 224/256/384/512-bit support
• PKI modular exponentiation
  • 20k+ ops
  • Up to 2048 bits
  • Supports CRT (see the sketch below)

Integrated high-performance modern crypto algorithms, with a PKI engine, in a multithreaded programmable environment.
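To show what CRT support buys in PKI modular exponentiation, here is a minimal Python sketch of an RSA-style c^d mod n computed via two half-size exponentiations. The toy parameters are illustrative; this is the underlying math only, not the NFP's hardware interface.

    # Minimal sketch: modular exponentiation via the Chinese Remainder
    # Theorem, splitting c^d mod (p*q) into two half-size exponentiations
    # (roughly 4x cheaper than the direct computation).
    def modexp_crt(c, d, p, q):
        dP, dQ = d % (p - 1), d % (q - 1)   # reduced exponents
        qInv = pow(q, -1, p)                # q^-1 mod p (Python 3.8+)
        m1 = pow(c, dP, p)                  # half-size exponentiation mod p
        m2 = pow(c, dQ, q)                  # half-size exponentiation mod q
        h = ((m1 - m2) * qInv) % p          # Garner recombination
        return m2 + h * q

    # Check against the direct computation with small textbook primes:
    p, q, d, c = 61, 53, 2753, 42
    assert modexp_crt(c, d, p, q) == pow(c, d, p * q)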