a.a. 2010-2011 - roma tre universityrimondin/courses/rcng1011/slides/...os-level virtualization...

A.A. 2010-2011

Massimo RIMONDINI

RCNG – 02/11/10

Technologies for the Virtualization of Computer Networks

Virtualization

HA! HA! Easy!

Virtualization

D’Oh!

Virtualization

“...the act of decoupling the (logical) service from its (physical) realization...”

“execution of software in an environment separated from the underlying hardware resources”

“sufficiently complete simulation of the underlying hardware to allow software, typically a guest operating system, to run unmodified”

“complete simulation of the underlying hardware”

Virtualization

Full virtualization (emulation)

Partial virtualization

Paravirtualization (OS-assisted virtualization)

Hardware-assisted virtualization

OS-Level virtualization

Full Virtualization (Emulation)

Emulation of a fully fledged hardware box (e.g., x86)

Binary translation

For non-virtualizable instructions (which have different semantics when executed outside Ring 0)

Direct execution

For performance

VirtualBox, Parallels, ~VMware, Microsoft Virtual PC, QEMU, Bochs

Picture from Understanding Full Virtualization, Paravirtualization, and Hardware Assist. White Paper. VMware.
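The point of binary translation can be shown with a toy model. Everything here is hypothetical (the two-instruction ISA, `Insn`, `VCpu`, `translate`); only the behavior of POPF comes from real x86: executed outside Ring 0, it silently ignores changes to the interrupt flag instead of trapping, so the VMM never notices. Running the code directly loses the guest's IF update; translating it first redirects the sensitive instruction to the VMM, while innocuous instructions are still executed directly for performance.

```cpp
#include <cassert>
#include <vector>

// Toy ISA (hypothetical). Add is innocuous; Popf loads the flags register,
// including the interrupt flag (IF), which only Ring 0 may change.
enum class Op { Add, Popf };

struct Insn {
    Op op;
    int arg;              // Add: value to add; Popf: new IF value (0/1)
    bool viaVmm = false;  // set by the translator for sensitive instructions
};

struct VCpu {
    int acc = 0;
    bool virtualIf = true;  // the interrupt flag as the guest believes it is
};

// Direct execution with the guest deprivileged: Popf's IF update is
// silently dropped and no trap reaches the VMM.
void executeDirect(VCpu& cpu, const Insn& i) {
    if (i.op == Op::Add) cpu.acc += i.arg;
}

// The VMM emulates the sensitive instruction on the virtual CPU state.
void emulateInVmm(VCpu& cpu, const Insn& i) {
    if (i.op == Op::Popf) cpu.virtualIf = (i.arg != 0);
}

// Translation pass: innocuous instructions are left as they are (and run
// directly); sensitive ones are redirected to the VMM.
std::vector<Insn> translate(const std::vector<Insn>& block, int& rewritten) {
    std::vector<Insn> out = block;
    rewritten = 0;
    for (Insn& i : out) {
        if (i.op == Op::Popf) {
            i.viaVmm = true;
            ++rewritten;
        }
    }
    return out;
}

void run(VCpu& cpu, const std::vector<Insn>& code) {
    for (const Insn& i : code) {
        if (i.viaVmm) emulateInVmm(cpu, i);
        else executeDirect(cpu, i);
    }
}
```

Real translators work on basic blocks of machine code and cache the translated result; this sketch only keeps the decision "translate the sensitive instruction, run the rest natively".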

Partial Virtualization

E.g., address space virtualization

Supports multiple instances of a specific hardware device

Does not support running a guest OS

FreeBSD network stack virtualization project, IBM M44/44X

Mostly of historical interest

Address space “virtualization” is a basic component in modern OSs

Paravirtualization (OS-assisted virtualization)

VMM and/or hypervisor

Guest OS communicates with hypervisor

Changes to the guest OS: non-virtualizable instructions are replaced with hypercalls, so they never reach the bare metal

Better performance

Support for hardware-assisted virtualization

Xen, VMware, Microsoft Hyper-V, Oracle VM Server for SPARC, VirtualBox

Picture from Understanding Full Virtualization, Paravirtualization, and Hardware Assist. White Paper. VMware.

Hypervisor (and VMM)

Hypervisor

Type 1 (native): runs on bare metal; loads prior to the OS (Microsoft Hyper-V, VMware vSphere)

Type 2 (hosted): runs within a conventional OS

Virtual Machine Monitor

Same as hypervisor (?)

Picture from Understanding Full Virtualization, Paravirtualization, and Hardware Assist. White Paper. VMware.

Transparent Paravirtualization

Photo credit goes to Flickr user Alexy.

Huh?

Transparent Paravirtualization

Virtual Machine Interface (VMI)

Single VMI-compliant guest kernel

VMI calls may have two implementations

inline native instructions (run on bare metal)

indirect calls to hypervisor

paravirt-ops

IBM+VMware+Red Hat+XenSource

Part of Linux kernel since 2.6.20

Picture from Understanding Full Virtualization, Paravirtualization, and Hardware Assist. White Paper. VMware.
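The paravirt-ops idea, one kernel binary with two implementations behind a single call interface, can be sketched as a table of function pointers chosen at boot. This is a minimal illustration under assumptions: `PvOps`, the back-end functions and the hypercall names are hypothetical and far simpler than the real Linux structures, and real implementations may also patch hot call sites in place, which is not modeled here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records what each back end did, so the two can be compared.
std::vector<std::string> trace;

// Back end 1: native instructions, for running on bare metal.
void nativeIrqDisable() { trace.push_back("cli"); }
void nativeFlushTlb() { trace.push_back("reload cr3"); }

// Back end 2: indirect calls into the hypervisor (hypothetical names).
void hvIrqDisable() { trace.push_back("hypercall:irq_disable"); }
void hvFlushTlb() { trace.push_back("hypercall:flush_tlb"); }

// The single interface the kernel is compiled against.
struct PvOps {
    void (*irqDisable)();
    void (*flushTlb)();
};

const PvOps kNativeOps{nativeIrqDisable, nativeFlushTlb};
const PvOps kHypervisorOps{hvIrqDisable, hvFlushTlb};
PvOps pvOps = kNativeOps;  // default: bare metal

// Chosen once at boot; the same kernel binary works in both settings.
void selectBackend(bool underHypervisor) {
    pvOps = underHypervisor ? kHypervisorOps : kNativeOps;
}

// Kernel code never issues the sensitive operations directly.
void kernelCriticalSection() {
    pvOps.irqDisable();
    pvOps.flushTlb();
}
```

The cost of the indirection is one extra pointer dereference per call when running natively, which is why inlining the native path matters.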

Hardware-assisted Virtualization

Hypervisor runs below Ring 0

Sensitive calls are automatically trapped to the hypervisor

Effective guest isolation

AMD-V (Pacifica), Intel VT-x (Vanderpool)

VirtualBox, KVM, Microsoft Virtual PC, Xen, Parallels, ...

Picture from Understanding Full Virtualization, Paravirtualization, and Hardware Assist. White Paper. VMware.
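Unlike binary translation, nothing is rewritten here: the guest runs natively and the CPU itself hands control to the hypervisor (a "VM exit") when a sensitive instruction is reached. A toy model of that loop follows. The string-based instructions and the `classify` rule are hypothetical; the only real-world detail assumed is that under Intel VT-x CPUID always causes a VM exit, while whether I/O or HLT exit is configurable by the hypervisor.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class ExitReason { None, Cpuid, IoOut, Hlt };

// The "hardware" check performed in guest mode (hypothetical classification).
ExitReason classify(const std::string& insn) {
    if (insn == "cpuid") return ExitReason::Cpuid;
    if (insn == "out") return ExitReason::IoOut;
    if (insn == "hlt") return ExitReason::Hlt;
    return ExitReason::None;  // runs natively, no hypervisor involvement
}

struct RunStats {
    int native = 0;  // instructions executed directly by the guest
    int exits = 0;   // VM exits handled by the hypervisor
    bool halted = false;
};

RunStats runGuest(const std::vector<std::string>& code) {
    RunStats s;
    for (const std::string& insn : code) {
        const ExitReason r = classify(insn);
        if (r == ExitReason::None) {
            ++s.native;
            continue;
        }
        ++s.exits;  // trap: control transfers to the hypervisor
        if (r == ExitReason::Hlt) {
            s.halted = true;  // hypervisor deschedules the virtual CPU
            break;
        }
        // Cpuid / IoOut: the hypervisor emulates the effect, then resumes the guest.
    }
    return s;
}
```

Each exit costs a full world switch, so overhead depends on how often the guest triggers them.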

OS-Level Virtualization

Single OS/kernel

Actually isolation (of contexts), not virtualization

No emulation overhead

Requires a host kernel patch

Guests share the host kernel's system call interface

This limits the set of runnable guests (they must work with the host kernel)

Processes in a virtual server are regular processes on the host

Resources (e.g., memory) can be requested at runtime

Linux VServer, Parallels Virtuozzo Containers, OpenVZ, Solaris Containers, FreeBSD Jails; to a certain extent, UMview, UML
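Since "processes in a virtual server are regular processes on the host", isolation boils down to the shared kernel filtering what each context may see. A minimal sketch of that idea, assuming a hypothetical model in which every process carries a context ID and context 0 is the host (the names `Process` and `visiblePids` are illustrative, not taken from any of the tools listed):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: every process on the host is tagged with a context ID.
// Context 0 is the host itself; any other ID is a virtual server.
struct Process {
    int pid;
    int ctx;
    std::string name;
};

// What a process-listing tool would show when run inside context `viewer`.
// There is one kernel and one process table: isolation is just filtering,
// with no emulation layer in between.
std::vector<int> visiblePids(const std::vector<Process>& all, int viewer) {
    std::vector<int> out;
    for (const Process& p : all)
        if (viewer == 0 || p.ctx == viewer) out.push_back(p.pid);
    return out;
}
```

Real implementations apply the same kind of check to many more kernel objects (network addresses, file system roots, IPC), but the cost stays that of a comparison, which is why the overhead is low.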

Technique | Able to run guest OS | Unmodified guest | Unmodified host | Overhead | Flexibility
Full virtualization (emulation) | ✓ | ✓ | Depends | High | Limited
Partial virtualization | ✗ | ✓ | ✓ | Low | Limited
Paravirtualization (OS-assisted virtualization) | ✓ | ✗ | ✗ | Low | High
Hardware-assisted virtualization | ✓ | ✓ | ✓ | Mostly offloaded to hardware | Average
OS-level virtualization | Almost | ✗ | ✗ | Low | High

Which virtualization for networking?

Requirements

Depend heavily on the context

Operational network

Experimentation

Anyway...

Performance and scalability

Flexibility (configuration)

Programmability (for development)

Support for mobility

Strong isolation

Ahem... usability

Tools for Managing Virtual Network Scenarios

Netkit

Roma Tre University

VM engine: UML

VM interconnection: uml_switch

Core: shell scripts

Routing engine: Quagga, XORP

Lab description: (mostly) uses native router language

Lightweight

Easy-to-share labs

Several networking technologies, including MPLS forwarding

Netkit

VNUML

Universidad Politécnica de Madrid, Telefónica I+D

VM engine: UML

VM interconnection: uml_switch

Core: Python + Perl scripts

Routing engine: Quagga

Lab description: XML

Build, then play

Support for distributed emulation (segmentation)

Round robin

Weighted round robin (weights based on CPU load measured before deployment)
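The two segmentation policies can be sketched as placement functions that map each virtual machine to a host. The plain round robin is straightforward; for the weighted variant, both the weighting rule (weight = 1 - CPU load, clamped to [0, 1]) and the interleaving technique ("smooth" weighted round robin) are assumptions chosen for illustration, not VNUML's documented implementation. Requires C++17 for `std::clamp`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Plain round robin: the i-th virtual machine goes to host i mod H.
std::vector<int> roundRobin(int numVms, int numHosts) {
    assert(numHosts > 0);
    std::vector<int> placement;
    for (int i = 0; i < numVms; ++i) placement.push_back(i % numHosts);
    return placement;
}

// Weighted round robin. Assumptions: weight = 1 - CPU load (clamped to
// [0, 1]); hosts are interleaved with "smooth" weighted round robin, so a
// lightly loaded host does not receive all of its VMs in a row.
std::vector<int> weightedRoundRobin(int numVms, const std::vector<double>& cpuLoad) {
    assert(!cpuLoad.empty());
    const int hosts = static_cast<int>(cpuLoad.size());
    std::vector<double> weight(hosts), current(hosts, 0.0);
    double total = 0.0;
    for (int h = 0; h < hosts; ++h) {
        weight[h] = std::clamp(1.0 - cpuLoad[h], 0.0, 1.0);
        total += weight[h];
    }
    if (total <= 0.0) return roundRobin(numVms, hosts);  // every host saturated
    std::vector<int> placement;
    for (int i = 0; i < numVms; ++i) {
        int best = 0;
        for (int h = 0; h < hosts; ++h) {
            current[h] += weight[h];  // every host earns credit...
            if (current[h] > current[best]) best = h;
        }
        current[best] -= total;       // ...and the chosen one pays for the VM
        placement.push_back(best);
    }
    return placement;
}
```

For example, with an idle host and a host at 50% load, six VMs are placed as 0, 1, 0, 0, 1, 0: a 2:1 split, interleaved rather than clustered.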

VNUML

H: ssh, scp, rsync

W: SNMP, telnet, TFTP

VNUML

Backplane: one or more 802.1Q-compliant switches

Host-switch and switch-switch connections are trunks

Marionnet

Université Paris

VM engine: UML

VM interconnection: uml_switch, vde

Core: OCaml

Routing engine: Quagga

Lab description: GUI (dot-based layout)

Ability to export project file for reuse

Network impairments (delay, loss → unidirectional links, bandwidth, flipped bits)

Switch status LEDs

Marionnet

Introducing indirection to support...

...stable endpoints

...port “defects”

GINI

McGill University

VM engine: UML

VM interconnection: customized uml_switch, implementation of a wireless channel

Core: C + python

Routing engine: custom implementation (compliant with the Linux TCP/IP stack)

Lab description: GUI, XML

Integrated task manager to start/stop nodes

Real-time performance plots

Vincent Perrier

VM engine: UML, (QEMU+)KVM, OpenWrt

VM interconnection: improved uml_switch

Core: C

Routing engine: N/A

Lab description: custom markup

Several customizations and hacks

Ability to plot the value of any kernel variable

Switch supports TCP sockets and real-time configuration with XML messages

CD-ROM image built on the fly for machine differentiation

IMUNES

University of Zagreb

VM engine: N/A (stack virtualization)

VM interconnection: N/A

Core: N/A

Routing engine: N/A

GUI

Based on FreeBSD VirtNET

IMUNES

VirtNET: network state replication

IMUNES

VirtNET: network state replication

In a vimage it is possible to

configure network interfaces

open sockets

run processes

(To some extent) similar to UMview

Virtual Routers

DynaMIPS

Christophe Fillot (University of Technology of Compiegne)

Supported platforms (as of v0.2.7)

Cisco 7200 (NPE-100 to NPE-400)

Cisco 3600 (3620, 3640 and 3660)

Cisco 2691

Cisco 3725

Cisco 3745

No acceleration

CPU idle times must be tuned

“Of course, this emulator cannot replace a real router: you should be able to get a performance of about 1 kpps [...], to be compared to the 100 kpps delivered by a NPE-100 [...]. So, it is simply a complementary tool to real labs for administrators of Cisco networks or people wanting to pass their CCNA/CCNP/CCIE exams.”

DynaMIPS

Fully virtualized hardware

ROM, RAM, NVRAM

Chassis

Console, AUX

PCMCIA ATA disks

ATM/Frame Relay/Ethernet virtual switch between emulator instances

Port adapters, network modules

Interface binding

UNIX socket

VDE

tap

host interface (optionally via libpcap)

UDP port

Some opcodes are missing (mostly FPU)

Can manage multiple instances (“hypervisor” mode)

Development stalled, but still a milestone

DynaMIPS

Dynagen: a Python frontend

Dynagui

DynaMIPS

Dynagen: a Python frontend

Dynagui

GNS3

So, Performance and Scalability Are Two Bugaboos...

Carrier-grade equipment: 40 Gbps to 92 Tbps

Per-packet processing capability must scale with O(line_rate)

Aggregate switching capability must scale with O(port_count * line_rate)

But software routers need not run on a single server: RouteBricks

Columbia + Intel + UCLA + Berkeley + ... (a 10-author paper!)

Click-based

Software routers: 1 Gbps to 3 Gbps
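To see what O(line_rate) means in practice, the per-port packet rate at full line rate can be computed directly. The only facts assumed are Ethernet's: besides the frame itself, each frame occupies 8 bytes of preamble/SFD and a 12-byte minimum inter-frame gap on the wire. Function names are illustrative.

```cpp
#include <cassert>

// Besides the frame itself, every Ethernet frame occupies 8 bytes of
// preamble/SFD and a 12-byte minimum inter-frame gap on the wire.
constexpr double kWireOverheadBytes = 8.0 + 12.0;

// Packets per second one port delivers at full line rate:
// per-packet work therefore grows as O(line_rate).
double packetsPerSecond(double lineRateBps, double frameBytes) {
    assert(frameBytes > 0.0);
    return lineRateBps / ((frameBytes + kWireOverheadBytes) * 8.0);
}

// Aggregate capacity the switching fabric must sustain: O(port_count * line_rate).
double aggregateBps(int ports, double lineRateBps) {
    return ports * lineRateBps;
}
```

With minimum-size 64-byte frames, a single 10 Gbps port already means about 14.88 million packets per second (10e9 / (84 × 8)), which puts the 1 to 3 Gbps figure for software routers in perspective.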

RouteBricks

Cluster router architecture

Parallelism across servers

Nodes can make independent decisions on a subset of the overall traffic

Parallelism within servers

CPU

I/O, memory

RouteBricks

Intra-cluster routing

Valiant Load Balancing (VLB): source → random intermediate node → destination

Randomizes input traffic

No centralized scheduling

Beware of reordering

Topology

Full mesh: not feasible (server fanout is limited!)

Commodity Ethernet switches: not viable!

Missing load-sensitive routing features

Cost

Tori and butterflies
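Two-phase VLB can be sketched in a few lines: each packet first goes to a uniformly random intermediate node, then on to the node that owns its output port. This is the basic textbook form under stated assumptions (nodes numbered 0..N-1, hops that would go back to the source or destination are collapsed); `vlbPath` is a hypothetical name, and RouteBricks' own refinements are not modeled.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Two-phase Valiant Load Balancing over numNodes cluster nodes:
// phase 1 sends the packet to a uniformly random intermediate node,
// phase 2 forwards it to the node owning the destination's output port.
std::vector<int> vlbPath(int src, int dst, int numNodes, std::mt19937& rng) {
    assert(numNodes > 0);
    std::uniform_int_distribution<int> pick(0, numNodes - 1);
    const int mid = pick(rng);
    std::vector<int> path{src};
    if (mid != src && mid != dst) path.push_back(mid);  // collapse degenerate hops
    if (dst != src) path.push_back(dst);
    return path;
}
```

Because the intermediate node is drawn per packet, consecutive packets of the same flow may take different paths and arrive out of order; this is exactly the "beware of reordering" caveat above.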

RouteBricks

Butterfly topology

2-ary 4-fly
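Using the standard definition of a k-ary n-fly (k^n terminals, n stages, k^(n-1) switches of degree k per stage, routed by the base-k digits of the destination address), the size of the 2-ary 4-fly and a route through it can be computed as follows. Function names are illustrative; how RouteBricks maps servers onto the butterfly is not modeled.

```cpp
#include <cassert>
#include <vector>

long long ipow(long long base, int exp) {
    long long r = 1;
    while (exp-- > 0) r *= base;
    return r;
}

struct FlySize {
    long long terminals;         // k^n inputs (and as many outputs)
    int stages;                  // n
    long long switchesPerStage;  // k^(n-1) switches, each with k inputs and k outputs
};

FlySize butterfly(int k, int n) {
    assert(k >= 2 && n >= 1);
    return {ipow(k, n), n, ipow(k, n - 1)};
}

// Destination-tag routing: the base-k digits of the destination, most
// significant first, give the output port to take at each stage.
std::vector<int> destinationTag(long long dst, int k, int n) {
    std::vector<int> ports(n);
    for (int i = n - 1; i >= 0; --i) {
        ports[i] = static_cast<int>(dst % k);
        dst /= k;
    }
    return ports;
}
```

For the 2-ary 4-fly this gives 16 terminals and 4 stages of 8 switches; reaching terminal 11 (binary 1011) means taking ports 1, 0, 1, 1. A butterfly offers exactly one path per source/destination pair, so load balancing has to come from elsewhere, e.g., VLB.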

RouteBricks

Experimental setup

4 Intel Xeon servers (Nehalem microarchitecture)

One 10 Gbps external line each

Full-mesh topology

Bottleneck

64-byte packets: < 19 Mpps sustained

Caused by the CPU: per-byte CPU load is higher for smaller packets

Programmability

NIC driver

2 Click elements

Click

UCLA

A modular software router

UNIX-pipe-like composition

Implemented as a Linux kernel extension

333,000 64-byte packets per second on a 700 MHz Pentium III

[Figure: an element and a connection]

Click

Element: a packet processing module performing a simple computation (e.g., decrease TTL, routing table lookup, packet queueing, etc.)

A C++ object with state

Has multiple input/output ports

May have configuration settings

May export an interface (e.g., method to report queue length)

Click

Connection: a packet handoff path

Push processing: for unsolicited packets (e.g., from a device)

Pull processing: for packet schedulers

Only 1 connection per port

Element ports can be push (black), pull (white), or agnostic (outline)

Push and pull are not mixable!

Example: simple router queue
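The simple router queue combines both disciplines: packets are pushed into the Queue as they arrive, and the transmitting side pulls them out when the link is ready. A minimal sketch as a hypothetical mini-model, not Click's actual C++ API; the tail-drop policy and the `transmit` budget are assumptions. Requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>

struct Packet {
    int id;
};

// A queue object with a push input and a pull output (hypothetical model).
class Queue {
public:
    explicit Queue(std::size_t capacity) : capacity_(capacity) {}

    // Push input: upstream (e.g., a device source) hands packets in unsolicited.
    void push(const Packet& p) {
        if (q_.size() < capacity_) q_.push_back(p);
        else ++drops_;  // tail drop when full
    }

    // Pull output: downstream asks for a packet when it can send one.
    std::optional<Packet> pull() {
        if (q_.empty()) return std::nullopt;
        Packet p = q_.front();
        q_.pop_front();
        return p;
    }

    std::size_t length() const { return q_.size(); }  // e.g., to report queue length
    std::size_t drops() const { return drops_; }

private:
    std::size_t capacity_;
    std::deque<Packet> q_;
    std::size_t drops_ = 0;
};

// Pull-driven sink: the link decides how many packets it can send right now.
int transmit(Queue& q, int linkBudget) {
    int sent = 0;
    while (sent < linkBudget && q.pull()) ++sent;
    return sent;
}
```

The queue is the boundary between the two halves: everything upstream runs when packets arrive, everything downstream runs when the output is ready, which is why push and pull ports cannot be connected directly.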

Click

Note: queues do not live in ports: they are objects

Scheduling

Single thread (multithreaded implementation available)

Scheduling unit: element

Scheduling order: according to the packet’s path along the graph

Elements may be implicitly scheduled when their push/pull methods are called

Elements may have timers

Click

Flow-based contexts

Answer the questions: “If I were to emit a packet on my second output...

...where might it go?”

...which Queues might it encounter?”

...and it stopped at the first Queue it encountered, where might it stop?”
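Answering these questions amounts to a traversal of the configuration graph starting from one output port. A sketch under a simplifying assumption: a packet entering an element may leave on any of its outputs (real Click elements can declare finer-grained internal flow). `Graph`, `addElement`, `connect` and `reachable` are hypothetical names.

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <string>
#include <vector>

struct Graph {
    std::vector<std::string> cls;                    // class name of each element
    std::vector<std::vector<std::vector<int>>> out;  // out[e][port] = downstream elements
};

int addElement(Graph& g, const std::string& cls, int outputPorts) {
    g.cls.push_back(cls);
    g.out.emplace_back(outputPorts);
    return static_cast<int>(g.cls.size()) - 1;
}

void connect(Graph& g, int from, int port, int to) { g.out[from][port].push_back(to); }

// "Where might it go?" (stopAtFirstQueue = false) or "where might it stop?"
// (stopAtFirstQueue = true), for a packet emitted on (elem, port).
std::set<int> reachable(const Graph& g, int elem, int port, bool stopAtFirstQueue) {
    std::set<int> seen;
    std::queue<int> work;
    for (int next : g.out[elem][port]) work.push(next);
    while (!work.empty()) {
        const int e = work.front();
        work.pop();
        if (!seen.insert(e).second) continue;                   // already visited
        if (stopAtFirstQueue && g.cls[e] == "Queue") continue;  // packet would stop here
        for (const auto& targets : g.out[e])
            for (int next : targets) work.push(next);
    }
    return seen;
}

// "Which Queues might it encounter?"
std::set<int> queuesIn(const Graph& g, const std::set<int>& elems) {
    std::set<int> qs;
    for (int e : elems)
        if (g.cls[e] == "Queue") qs.insert(e);
    return qs;
}
```

For example, an element can use such a query at configuration time to locate the Queue it should monitor, without the configuration having to name it explicitly.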

Click

Declarative language:

    src :: FromDevice(eth0);
    ctr :: Counter;
    sink :: Discard;
    src -> ctr;
    ctr -> sink;

or

FromDevice(eth0) -> Counter -> Discard;

Click

The Element class has ~20 virtual functions

Most default implementations are fine

Just override push, pull, and run_scheduled

Sample transparent element:

    class NullElement: public Element {
    public:
        // One input port and one output port
        NullElement() { add_input(); add_output(); }
        const char *class_name() const { return "Null"; }
        NullElement *clone() const { return new NullElement; }
        // Works on both push and pull connections
        const char *processing() const { return AGNOSTIC; }
        // Push: forward the packet unchanged to output 0
        void push(int port, Packet *p) { output(0).push(p); }
        // Pull: fetch a packet from input 0 and return it unchanged
        Packet *pull(int port) { return input(0).pull(); }
    };

Click

Fine-grained elements are preferred

Not easy with BGP

Shared structures (e.g., routing tables)

Incorporated into the packet forwarding path

Example: IP Router

See page 12 of [Kohler]...

Virtual Switches

“Ordinary” Virtual Switch

A piece of software working at layer 2/3, inside the hypervisor or the hardware management layer

Can we do any better?

If we are in a virtual scenario, the network layer can...

...know about existing hosts (MAC/IP addresses) and their “movements”

...know interface operational mode (e.g., promiscuous)

...know about multicast memberships

...handle a flat topology (all the nodes are leaves, therefore we do not need STP)

...know the OS run by virtual machines (and, for example, run an OS-aware Deep Packet Inspection)

...support migration outside of a single subnet

VMware’s Approach

vNetwork Distributed switch

“VMware’s next generation virtual networking solution for spanning multiple hosts with a single virtual switch representation”

Private VLANs (restrict communication between virtual machines on the same VLAN)

Network VMotion: tracking of VM networking state

3rd Party Virtual Switch support (Cisco Nexus 1000V Series Virtual Switch)

Bi-directional traffic shaping

VMware’s Approach

VEPA

Virtual Ethernet Port Aggregator

An IEEE standard proposal

Idea: off-load switching activities from hypervisor-based virtual switches to physical switches

VEPA+OVF: the future?

OVF (Open Virtualization Format): metadata describing a virtual machine

June 2009: Linux kernel patch for VEPA support

Tagged (VEPA-aware switch) and tagless (VEPA-agnostic switch that bounces packets back) variants

VEPA

From Hudson, Congdon. Tag-less Virtual Ethernet Port Aggregator (VEPA) Proposal. 2009.

OpenvSwitch

Citrix Systems, Nicira, Intel, NEC, Google...

Virtualization layer no longer (just) Ethernet-based

OpenvSwitch

QoS, tunneling, filtering

Tunneling supports transparent inter-subnet migration without breaking transport sessions

Interface status migration (e.g., to bind policies to interfaces tightly)

Support for VLANs and GRE tunnels (intelligent forwarding)

All VMs on a single host: VPNs within a single OpenvSwitch instance

All VMs on the same LAN: implement VPNs by VLANs

VMs on different subnets: implement VPNs by GRE tunnels

Operates in the hypervisor (dom0 in Xen)

Offers connectivity between VMs and physical interfaces on the host

Forwarding state and runtime configuration can be altered by some programming interfaces

VEPA compatible (the control layer is able to manage a VEPA-enabled switch)
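The three VPN cases listed above form a simple decision rule on where the VMs of a VPN are placed. A minimal sketch; `VmPlacement`, `VpnMechanism` and `chooseVpnMechanism` are hypothetical names, and no actual Open vSwitch configuration is modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class VpnMechanism { SingleInstance, Vlan, GreTunnel };

struct VmPlacement {
    std::string host;  // physical host running the VM
    std::string lan;   // LAN/subnet the host is attached to
};

VpnMechanism chooseVpnMechanism(const std::vector<VmPlacement>& vms) {
    assert(!vms.empty());
    bool sameHost = true, sameLan = true;
    for (const VmPlacement& vm : vms) {
        sameHost = sameHost && vm.host == vms.front().host;
        sameLan = sameLan && vm.lan == vms.front().lan;
    }
    if (sameHost) return VpnMechanism::SingleInstance;  // isolate inside one switch instance
    if (sameLan) return VpnMechanism::Vlan;             // isolate with VLAN tags
    return VpnMechanism::GreTunnel;                     // cross subnets with GRE tunnels
}
```

The rule must be re-evaluated when a VM migrates: moving a VM to another subnet turns a VLAN-based VPN into one that needs tunnels.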

OpenvSwitch

Configuration options

Port mirroring (SPAN, RSPAN – a variant of SPAN that allows remote monitoring)

QoS policies

NetFlow

Bonding

Unified CLI for managing distributed switches

Exports interfaces compatible with VDE and Linux bridges

Forwarding

Ability to manipulate the forwarding table (to support status migration, where “status”=flow counters, ACLs, tunnels, etc.)

Packet processing based on layer 2/3/4 headers

Actions: forward (to one or more ports), drop, encapsulate/decapsulate

OpenvSwitch

Forwarding paths

Fast path: kernel space

forwarding engine

little code (for portability, performance, and ease of hardware implementation)

performance comparable to the Linux Ethernet bridge (pure kernel space, MAC-based forwarding only)

Slow path: user space

forwarding logic (MAC learning, load balancing), remote management (NetFlow, OpenFlow)
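The fast-path/slow-path split works like a flow cache: the kernel keeps a table of exact-match flow entries and only packets that miss it are sent up to user space, whose decision is then installed so that later packets of the same flow stay in the kernel. A sketch under assumptions: the key here is just (input port, destination MAC), whereas real Open vSwitch flow keys cover many more L2/L3/L4 fields, and `Datapath` and the action encoding are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <tuple>
#include <utility>

// Hypothetical flow key: (input port, destination MAC).
using FlowKey = std::tuple<int, std::uint64_t>;
using Action = int;  // output port; kDrop means discard
constexpr Action kDrop = -1;

class Datapath {
public:
    explicit Datapath(std::function<Action(const FlowKey&)> slowPath)
        : slowPath_(std::move(slowPath)) {}

    Action process(const FlowKey& key) {
        auto it = flows_.find(key);
        if (it != flows_.end()) {  // fast path: cached flow, no upcall
            ++hits_;
            return it->second;
        }
        ++misses_;                 // miss: ask the user-space logic
        const Action a = slowPath_(key);
        flows_[key] = a;           // install the decision for later packets
        return a;
    }

    int hits() const { return hits_; }
    int misses() const { return misses_; }

private:
    std::function<Action(const FlowKey&)> slowPath_;
    std::map<FlowKey, Action> flows_;
    int hits_ = 0, misses_ = 0;
};
```

Only the first packet of each flow pays for the upcall; this is what lets the kernel part stay small while MAC learning, load balancing and remote management live in user space.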

VDE

University of Bologna

Virtual Distributed Ethernet: the Swiss army knife of emulated networks

A general VPN

A mobility support technology

A tool for network testing

A reconfigurable overlay

A privacy-preserving layer

...

VDE

[Figure: two vde_switches connected by a vde_cable]

vde_switch

MAC address learning (with aging)

Ability to operate as a hub

VLANs

Fast STP

Can be connected to a tap interface

vde_cable

Interconnects vde_switches

Does not exist...

vde_plug
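MAC address learning with aging, plus the hub mode, can be sketched as a small table keyed by MAC address. This is not vde_switch's code: the aging period, the encoding of "flood" as -1 and the class name are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

constexpr int kFlood = -1;  // hypothetical encoding: send on every port except the input one
constexpr std::uint64_t kBroadcast = 0xFFFFFFFFFFFFULL;

class MacTable {
public:
    explicit MacTable(long agingSeconds, bool hubMode = false)
        : aging_(agingSeconds), hub_(hubMode) {}

    // Per-frame decision: learn where the source lives, then look up the destination.
    int forward(std::uint64_t srcMac, std::uint64_t dstMac, int inPort, long now) {
        if (hub_) return kFlood;                   // hub: no learning, always flood
        entries_[srcMac] = Entry{inPort, now};     // learn, or refresh the timestamp
        if (dstMac == kBroadcast) return kFlood;
        auto it = entries_.find(dstMac);
        if (it == entries_.end()) return kFlood;   // unknown destination
        if (now - it->second.lastSeen > aging_) {  // stale entry: age it out
            entries_.erase(it);
            return kFlood;
        }
        return it->second.port;
    }

private:
    struct Entry {
        int port;
        long lastSeen;
    };
    long aging_;
    bool hub_;
    std::unordered_map<std::uint64_t, Entry> entries_;
};
```

Aging is what lets the switch follow hosts that move: once a stale entry expires, frames are flooded again until the host's new port is learned.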

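VDE

vde_cable = vde_plug + dpipe

vde_plug

socket → stdout

stdin → socket

dpipe

stdout of the first command → stdin of the second

stdout of the second command → stdin of the first

[Figure: a vde_switch UNIX socket connected through vde_plug and dpipe]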
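The cross-connection that dpipe performs can be sketched with POSIX `pipe`, `fork`, `dup2` and `execvp`: two pipes, each carrying one command's stdout to the other's stdin. This is not VDE's source; `dpipeSketch` and `spawnWith` are hypothetical names and error handling is minimal.

```cpp
#include <cassert>
#include <string>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Start `args` with stdin/stdout attached to `in`/`out`. Every fd in
// `toClose` is closed in the child so that end-of-file propagates correctly.
static pid_t spawnWith(const std::vector<std::string>& args, int in, int out,
                       const std::vector<int>& toClose) {
    pid_t pid = fork();
    if (pid != 0) return pid;  // parent (or -1 if fork failed)
    dup2(in, STDIN_FILENO);
    dup2(out, STDOUT_FILENO);
    for (int fd : toClose) close(fd);
    std::vector<char*> argv;
    for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);
    execvp(argv[0], argv.data());
    _exit(127);  // exec failed
}

// dpipe-like sketch: left's stdout feeds right's stdin and vice versa.
// Returns 0 if both commands exit with status 0, 1 otherwise, -1 on setup errors.
int dpipeSketch(const std::vector<std::string>& left, const std::vector<std::string>& right) {
    assert(!left.empty() && !right.empty());
    int toRight[2], toLeft[2];
    if (pipe(toRight) != 0) return -1;
    if (pipe(toLeft) != 0) {
        close(toRight[0]);
        close(toRight[1]);
        return -1;
    }
    const std::vector<int> all{toRight[0], toRight[1], toLeft[0], toLeft[1]};
    const pid_t l = spawnWith(left, toLeft[0], toRight[1], all);
    const pid_t r = spawnWith(right, toRight[0], toLeft[1], all);
    for (int fd : all) close(fd);  // the parent keeps no pipe ends
    int sl = 0, sr = 0;
    if (l > 0) waitpid(l, &sl, 0);
    if (r > 0) waitpid(r, &sr, 0);
    if (l < 0 || r < 0) return -1;
    auto ok = [](int s) { return WIFEXITED(s) && WEXITSTATUS(s) == 0; };
    return ok(sl) && ok(sr) ? 0 : 1;
}
```

With `vde_plug` on both sides, each plug copies its switch's frames to stdout and reads the other switch's frames from stdin, which is all a "cable" needs to do.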

VDE

[Figure: two local vde_switches, each with a UNIX socket, connected through dpipe]

dpipe vde_plug /tmp/vde1.ctl = vde_plug /tmp/vde2.ctl

[Figure: a local and a remote vde_switch connected through dpipe+ssh]

dpipe vde_plug /tmp/vde.ctl = ssh [email protected] vde_plug /tmp/vde_remote.ctl

VDE

[Figure: two vde_switches, each with a UNIX socket, connected through vde_cryptcab]

vde_cryptcab

host1$ vde_cryptcab -s /tmp/vde.ctl -p 12000

host2$ vde_cryptcab -s /tmp/vde_local.ctl -c username@host1:12000

dpipe+ssh is not a very good solution...

Dropping ssh is not advisable (traffic would travel unencrypted)

With ssh, we may experience interference between congestion control algorithms (tunneled TCP traffic runs over the TCP connection of ssh)

vde_cryptcab (UDP-based)

VDE

With ssh, we may experience interference between congestion control algorithms...

VDE

vde_plug2tap

Direct connection from a vde_switch to a host tap interface

The same can be done when creating the vde_switch

vde_plug2tap --daemon -s /tmp/myvde.ctl tap0

[Figure: a vde_switch UNIX socket connected to tap0]

VDE

slirpvde

Internet connection

No root privileges required

Connections are regenerated (a sort of masquerading)

Provides DHCP

slirpvde -d -s /tmp/vde.ctl -dhcp

VDE

[Figure: two vde_switch UNIX sockets connected through dpipe, with wirefilter in between]

dpipe vde_plug /tmp/vde1.ctl = wirefilter -M /tmp/wiremgmt = vde_plug /tmp/vde2.ctl

wirefilter: emulates real cable features (runtime tunable)

maximum packet queue capacity, MTU, bit corruption, reordering

% lost packets, delay, duplication, bandwidth, interface speed
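A few of these impairments (loss, duplication, delay, bandwidth) can be modeled as a function from sent packets to delivery times. This is a toy model, not wirefilter's implementation or its management commands; `WireParams`, `Wire` and the simplification that a duplicate arrives together with the original are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

struct WireParams {
    double lossProb = 0.0;      // fraction of packets dropped
    double dupProb = 0.0;       // fraction of packets delivered twice
    double delaySec = 0.0;      // fixed propagation delay
    double bandwidthBps = 0.0;  // 0 = unlimited
};

struct Packet { int id; double sendTime; int bytes; };  // sorted by sendTime
struct Delivery { int id; double time; };

class Wire {
public:
    Wire(WireParams p, unsigned seed) : p_(p), rng_(seed) {}

    std::vector<Delivery> transmit(const std::vector<Packet>& pkts) {
        std::vector<Delivery> out;
        double linkFree = 0.0;  // when the link finishes serializing the previous packet
        for (const Packet& pk : pkts) {
            if (chance(p_.lossProb)) continue;  // lost on the wire
            const double start = std::max(pk.sendTime, linkFree);
            const double tx = p_.bandwidthBps > 0.0 ? pk.bytes * 8.0 / p_.bandwidthBps : 0.0;
            linkFree = start + tx;
            const double arrival = linkFree + p_.delaySec;
            out.push_back({pk.id, arrival});
            if (chance(p_.dupProb)) out.push_back({pk.id, arrival});  // simplification: same time, no extra bandwidth
        }
        return out;
    }

private:
    bool chance(double prob) {
        if (prob <= 0.0) return false;
        if (prob >= 1.0) return true;
        return std::uniform_real_distribution<double>(0.0, 1.0)(rng_) < prob;
    }
    WireParams p_;
    std::mt19937 rng_;
};
```

For example, with 8000 bit/s and 0.5 s of delay, two 1000-byte packets sent together arrive at 1.5 s and 2.5 s: the second one waits for the first to be serialized.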

VDE

Software that supports VDE as a userspace switch

VirtualBox

QEMU/KVM (with wrappers)

UML

OpenvSwitch

DynaMIPS

...

Testbeds

A collection of machines distributed over the globe

Hosted by research institutions

Only accessible by organizations in the Consortium

No fee for academic institutions

Each machine runs a software package (PLC – PlanetLab Central)

Includes a Linux-based OS

Node bootstrapping

Management, monitoring, and auditing tools

Supports distributed virtualization (slicing)

Uses VNET(+) for traffic isolation between slices

Stand-alone version for private use: MyPLC

What would the network administrator at your organization say about the experiment running on your local site?

Continuously running experiments:

Active network probing (within netiquette):

Service disruption and actions that trigger complaints from network administrators are a no-go!

University of Utah

A facility

Windows/Linux nodes

>1300 network ports that can be connected arbitrarily by remotely setting up VLANs on the switches

Virtual nodes (Xen-based)

Arbitrary topology

A software system

“a kind of "operating system" for controlling collections of networked devices of all types, for the purpose of controlled experimentation”
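Building an arbitrary topology "by remotely setting up VLANs" amounts to putting the two switch ports wired to the endpoints of each experiment link into a VLAN of their own. The greedy sketch below assumes one VLAN per link and one switch port per node interface; Emulab's real resource mapping is considerably more sophisticated, and the function name, starting VLAN ID and port names such as "sw1/1" are hypothetical.

```python
from __future__ import annotations

MAX_VLAN_ID = 4094  # 802.1Q VLAN IDs 0 and 4095 are reserved


def assign_vlans(links: list[tuple[str, str]],
                 free_ports: dict[str, list[str]],
                 first_vid: int = 100) -> dict[int, tuple[str, str]]:
    """Greedy sketch: one VLAN per experiment link, one free switch port per endpoint."""
    available = {node: list(ports) for node, ports in free_ports.items()}
    plan: dict[int, tuple[str, str]] = {}
    for vid, (a, b) in enumerate(links, start=first_vid):
        if vid > MAX_VLAN_ID:
            raise ValueError("ran out of 802.1Q VLAN IDs")
        if not available.get(a) or not available.get(b):
            raise ValueError(f"no free interface left on {a!r} or {b!r}")
        # The switch ports wired to these two node interfaces join VLAN `vid`.
        plan[vid] = (available[a].pop(0), available[b].pop(0))
    return plan


# A triangle topology over three nodes with two interfaces each.
plan = assign_vlans(
    [("n1", "n2"), ("n2", "n3"), ("n3", "n1")],
    {"n1": ["sw1/1", "sw1/2"], "n2": ["sw1/3", "sw1/4"], "n3": ["sw2/1", "sw2/2"]},
)
```

When the two ports of a VLAN sit on different physical switches (sw1 and sw2 above), that VLAN must also be carried on the trunk between those switches.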


Acceptable use: “in principle, almost any research or experimental use of the testbed by experimenters that have a need for it is appropriate”

Users map


>500 nodes

High-end (2.4 GHz quad-core Xeon “Nehalem”, 12 GB RAM at 1066 MHz, 2 × 250 GB SATA disks, 8 GbE interfaces)

Mid-end (3.0 GHz 64-bit Xeon, 2 GB RAM at 400 MHz, 2 × 146 GB 10k RPM SCSI disks, 6 GbE interfaces)

Low-end (600 MHz Intel Pentium III, 256 MB RAM, 13 GB IDE HDD, 5 FE interfaces, WiFi)

12 switches (Cisco and HP)

Servers

DB, DNS, users, file, serial line, etc.

Remote power controllers


VINI

“a virtual network infrastructure that allows network researchers to evaluate their protocols and services in the wide area”

Runs on top of PlanetLab

42 nodes @ 27 sites


VINI

Approach & technologies


VINI

A VINI router

XORP: routing engine

Click: forwarding engine


VINI

Encapsulation
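Assuming, as in the PL-VINI prototype described in the VINI paper, that each virtual link is carried as a UDP tunnel between two PlanetLab nodes, encapsulation means placing the virtual network's packet inside a UDP payload. The sketch builds the 8-byte UDP header by hand only to make the layering visible; in practice a UDP socket lets the kernel add the UDP and outer IP headers. Function names are hypothetical.

```python
from __future__ import annotations

import struct

# UDP header: source port, destination port, length, checksum (network byte order)
UDP_HEADER = struct.Struct("!HHHH")


def udp_encap(inner: bytes, src_port: int, dst_port: int) -> bytes:
    """Wrap a virtual-network packet in a UDP header (one virtual link = one tunnel)."""
    length = UDP_HEADER.size + len(inner)
    if length > 0xFFFF:
        raise ValueError("inner packet too large for a single UDP datagram")
    # Checksum 0 means "not computed", which UDP over IPv4 allows.
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + inner


def udp_decap(datagram: bytes) -> tuple[int, bytes]:
    """Return (destination port, inner packet); the port tells which tunnel it came in on."""
    _src, dst_port, length, _checksum = UDP_HEADER.unpack_from(datagram)
    return dst_port, datagram[UDP_HEADER.size:length]
```

Because the outer packet is ordinary UDP, it crosses the real Internet between PlanetLab sites unmodified, while the inner packets see only the virtual topology.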

GENI

NSF project

“a virtual laboratory for exploring future internets at scale”

Keywords

Programmability

Virtualization and resource sharing

Federation (among participating organizations)

Slice-based experimentation

The PlanetLab team is also involved


So, which virtualization for networking?


| Tool | (Node) virtualization type | Node virtualization technology | Link virtualization technology |
|---|---|---|---|
| Netkit | “Paravirtualization” | UML | uml_switch |
| VNUML | “Paravirtualization” | UML | uml_switch |
| Marionnet | “Paravirtualization” | UML | uml_switch + vde |
| GINI | “Paravirtualization” | UML | uml_switch + customizations |
| Cloonix | “Paravirtualization” | UML, QEMU, OpenWRT | uml_switch + customizations |
| IMUNES | OS-level | None | VirtNET |
| DynaMIPS | Emulation | Custom | Custom |
| RouteBricks | None | None | None |
| Click | “Partial virtualization” | API | N/A |
| VMware vNetwork | “OS-level” | N/A | Custom |
| VEPA | “OS-level” | N/A | Custom |
| OpenvSwitch | “OS-level” | N/A | Custom |
| VDE | OS-level | N/A | Custom |
| PlanetLab | None | None | Overlay |
| Emulab | Paravirtualization | Xen | N/A / Overlay / None |
| VINI | “Paravirtualization” | UML | Overlay |
| GENI | N/A | N/A | N/A |


A one-size-fits-all proposal: the Network Hypervisor

Tunnels, VLANs, VRFs...: a pool of forwarding capacity


Rationale: virtualization should happen at the forwarding layer (amidst tunnels and VMs)

Network hypervisor

a mapper between the logical and physical networks

gets a view of the logical network from the control plane

gets a view of the physical topology from a centralized management system


The Network Hypervisor

Control plane

Logical forwarding plane (logical forwarding tables, ports)

Network Hypervisor

Physical forwarding plane (hardware switches)


The Network Hypervisor

Forwarding in a “hypervised” switch

1. Mapping packet to logical context

2. Logical forwarding

3. Mapping decision to physical context

4. Physical forwarding

Steps 1-3: handled by the hypervisor

Step 4: handled by the IGP
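The four forwarding steps can be sketched as a chain of table lookups. This is a toy model, assuming the logical context is carried in a VLAN tag, logical forwarding is an exact-match table keyed on destination MAC, and each logical port maps to a tunnel towards a remote physical switch whose next hop the IGP provides. Class, field and table names are hypothetical and do not come from the paper's prototype.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    vlan: int     # tag carrying the logical context (VLAN here; MPLS would also do)
    dst_mac: str


class HypervisedSwitch:
    """Toy model of the 4-step pipeline; every table is a hypothetical input."""

    def __init__(self, contexts, logical_tables, port_to_tunnel, igp_next_hop):
        self.contexts = contexts              # VLAN tag -> logical network
        self.logical_tables = logical_tables  # (logical network, dst MAC) -> logical port
        self.port_to_tunnel = port_to_tunnel  # (logical network, logical port) -> remote switch
        self.igp_next_hop = igp_next_hop      # remote switch -> physical next hop

    def forward(self, pkt: Packet) -> tuple[str, str] | None:
        # 1. Map the packet to its logical context.
        net = self.contexts.get(pkt.vlan)
        if net is None:
            return None
        # 2. Logical forwarding in that network's own table.
        lport = self.logical_tables.get((net, pkt.dst_mac))
        if lport is None:
            return None
        # 3. Map the logical decision to the physical context (a tunnel endpoint).
        remote = self.port_to_tunnel.get((net, lport))
        if remote is None:
            return None
        # 4. Physical forwarding towards the endpoint along IGP-computed routes.
        next_hop = self.igp_next_hop.get(remote)
        return None if next_hop is None else (remote, next_hop)
```

Because every lookup after step 1 is keyed on the logical network, two tenants can reuse the same MAC address and still be forwarded to different physical destinations.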


The Network Hypervisor

Prototype implementation

Physical switch: L2 over GRE

L2 packets are relevant in the logical context

GRE tunnels are the physical transport

L2-to-tunnel mapping (i.e., the logical forwarding decision) is handled by the hypervisor

VLAN/MPLS tags indicate the logical context

Network hypervisor: distributed OpenFlow controller

Load balancing, but no enforceable bandwidth on ports and links

Logical forwarding only at the edge of the network

OpenFlow: an open standard by which high-level routing decisions can be run on a separate server

Supports experimentation without requiring vendors to expose internals
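The prototype's data path stacks a logical-context tag inside the L2 frame and a GRE header as physical transport. A sketch of building such a packet, assuming an 802.1Q VLAN tag on the inner frame and the basic RFC 2784 GRE header with protocol type 0x6558 (Transparent Ethernet Bridging, i.e. Ethernet over GRE). The outer IPv4 header (protocol 47) is left to the sending host and not built here; function names are hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # EtherType announcing an 802.1Q tag
GRE_PROTO_TEB = 0x6558  # GRE protocol type: Transparent Ethernet Bridging


def mac(addr: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(addr.replace(":", ""))


def tag_frame(dst: str, src: str, vid: int, ethertype: int, payload: bytes) -> bytes:
    """Build an Ethernet frame whose VLAN ID names the logical context."""
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    tci = vid  # priority (PCP) 0, DEI 0, 12-bit VLAN ID
    return mac(dst) + mac(src) + struct.pack("!HHH", TPID_8021Q, tci, ethertype) + payload


def gre_encap(frame: bytes) -> bytes:
    """Prepend a basic GRE header: no checksum, version 0, then the protocol type."""
    return struct.pack("!HH", 0, GRE_PROTO_TEB) + frame


frame = tag_frame("02:00:00:00:00:02", "02:00:00:00:00:01", 42, 0x0800, b"payload")
packet = gre_encap(frame)
```

Keeping the tag on the inner frame means the physical network only ever sees GRE between switches, while the edge switches use the VLAN ID to pick the logical forwarding table.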


References

Understanding Full Virtualization, Paravirtualization, and Hardware Assist. White paper. VMware. 2007.

What’s New in VMware vSphere™ 4: Virtual Networking. White paper. VMware. 2009.

VMware Virtual Networking Concepts. White paper. VMware. 2007.

Argyraki, Baset, Chun, Fall, Iannaccone, Knies, Kohler, Manesh, Nedevschi, Ratnasamy. Can Software Routers Scale? PRESTO 2008.

Dobrescu, Egi, Argyraki, Chun, Fall, Iannaccone, Knies, Manesh, Ratnasamy. RouteBricks: Exploiting Parallelism to Scale Software Routers. SOSP 2009.

Casado, Koponen, Ramanathan, Shenker. Virtualizing the Network Forwarding Plane. PRESTO 2010.

Galán, Fernández, Ferrer, Martín. Scenario-Based Distributed Virtualization Management Architecture for Multi-host Environments. SVM 2008.

Loddo, Saiu. How to Implement a Virtual Network Laboratory in Six Months and Be Happy. SIGPLAN Workshop on ML, 2007.

Kohler, Morris, Chen, Jannotti, Kaashoek. The Click Modular Router. ACM Transactions on Computer Systems 18(3), August 2000.

Maheswaran, Malozemoff, Ng, Liao, Gu, Maniymaran, Raymond, Shaikh, Gao. GINI: A User-Level Toolkit for Creating Micro Internets for Teaching & Learning Computer Networking. SIGCSE Bulletin. 2009.

Pfaff, Pettit, Koponen, Amidon, Casado, Shenker. Extending Networking into the Virtualization Layer. HotNets 2009.

Davoli. VDE: Virtual Distributed Ethernet. Technical Report. University of Bologna. 2004.

Bavier, Feamster, Huang, Peterson, Rexford. In VINI Veritas: Realistic and Controlled Network Experimentation. SIGCOMM 2006.