the new vvorld - watson.orgrobert/freebsd/2010ukuug/20100324-newvvorld.pdf · the new vvorld robert...
The new VVorld
Robert Watson and Bjoern Zeeb
(and thanks to Marko Zec)
The FreeBSD Project
UKUUG Spring Conference 2010
OS research and development performed by the FreeBSD Project, the University of Zagreb, the FreeBSD Foundation, NLNet, and other contributors over a decade
Still a work-in-progress, but exciting technology coming soon
Introduction
• About virtualization
• FreeBSD Jails
• Virtualizing a kernel
• A virtualized network stack
• A few application ideas
What is virtualization?
• Illusion of multiple virtual X on one real X
• Virtual memory address spaces
• VLANs, VPNs, and overlay networks
• Storage volume management
• Virtual machines, OS instances
• ... you can solve any problem with another level of indirection ...
“... a level of indirection...”
Why virtualize?
• Sharing with the illusion of exclusive use
• Consolidation, managed overcommit
• Flexibility in implementation
• Security and robustness
• Administrative delegation
Virtualization spectrum
• Tradeoffs: scheduler integration, efficient sharing, overcommit opportunities, functionality, security/isolation, resource management, administrative delegation, ...
• OS access control: UNIX users, SELinux, ...
• OS virtualization: FreeBSD Jail, Solaris Zones, ...
• Hypervisors: VMware, Xen, ...
• Physical separation: racks and racks and racks of machines
Example: with OS virtualization you get full scheduler integration, but migration is very hard. With hypervisors you get really bad scheduling, but migration is relatively easy.
OS virtualization
• Single OS kernel instance, many userspaces
• Safe root delegation, various constraints
• Efficient resource sharing with overcommit
• No hypervisor/virtual device overhead
• ISP virtual hosting, server consolidation, ...
History of FreeBSD Jail
• 1999: Jails merged to FreeBSD 4.x
• 2002: Virtualized network stack prototype
• 2006: NLNet/FreeBSD Foundation fund multi-year VIMAGE development project
• 2007: Jail-friendly ZFS merged from OpenSolaris
• 2008: Multi-IPv4/v6/no-IP patches; VNET integration starts
• 2009: Hierarchical jail support; FreeBSD 8.0 with highly experimental options VIMAGE shipped
Earliest open source OS virtualization we’re aware of
Virtualization work has long timeline
Why change Jail?
• Jail is fast, efficient, secure, useful
• But, Jail “subsets” rather than “virtualizes”
• For example: employs chroot() internally
• Some resources subset poorly
• E.g., System V IPC, loopback interface, ...
• Virtualization is a functional improvement
Virtualizing OS services
• New abstraction: the virtual instance
• Replicate global objects per-instance
• Multiplex or replicate threads, timers
• Tag subjects with virtual instances
• Consider administrative interfaces
• Examine privileges carefully
• Plan inter-instance plumbing
• How to start and stop instances?
Virtual kernel infrastructure
• Sounds complicated, but some new tools
• Virtualized global variables
• Virtualized startup/shutdown
• Virtualized sysctl MIB entries
• Virtualization-enhanced debugging
• Multiplex virtualization onto netisrs
Virtualized heap
[Figure: side-by-side memory layouts labelled "Regular kernel" and "Virtualized heap". Boxes: Kernel, Heap (Heap1 and Heap2 in the virtualized case), and per-thread stacks Stack1 to Stack3.]
Goal: make it easy for us to take one of something and make many
Notice that the same layout is used for virtual instances as for the original
Virtual global variables
• Tag selected globals as virtual in source
• Placed in different ELF section when linked
• Each VNET instance gets a copy of section
• Thread context carries VNET reference
• Globals mapped to VNET when accessed
• Can compile to regular globals if desired
Virtualization can be compiled out during development while the code stays in the tree. In fact, that is the default today.
Also valuable for embedded.
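The bullets above can be approximated in userspace. The following is a minimal sketch, not the FreeBSD implementation: a plain struct stands in for the ELF section of virtualized globals, a thread-local pointer stands in for the VNET reference carried in thread context, and the names `vnet_data`, `vnet_template`, `vnet_alloc`, `ripcb_count`, and `ip_forwarding` are hypothetical. It shows how the same source expression resolves to different storage depending on the current instance.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the ELF section that collects all virtualized globals. */
struct vnet_data {
	int	ripcb_count;	/* hypothetical virtualized global */
	int	ip_forwarding;	/* hypothetical virtualized global */
};

/* Initial values; every new instance starts life as a copy of this. */
static const struct vnet_data vnet_template = {
	.ripcb_count = 0,
	.ip_forwarding = 1,
};

struct vnet {
	struct vnet_data *data;	/* this instance's private copy */
};

/* Stand-in for the VNET reference carried in thread context. */
static _Thread_local struct vnet *curvnet;

/* Map a virtualized global onto the current instance when accessed. */
#define	VNET(name)		(curvnet->data->name)
#define	V_ripcb_count		VNET(ripcb_count)
#define	V_ip_forwarding		VNET(ip_forwarding)

static struct vnet *
vnet_alloc(void)
{
	struct vnet *vnet = malloc(sizeof(*vnet));

	if (vnet == NULL)
		return (NULL);
	vnet->data = malloc(sizeof(*vnet->data));
	if (vnet->data == NULL) {
		free(vnet);
		return (NULL);
	}
	/* Each instance gets its own copy of the "section". */
	memcpy(vnet->data, &vnet_template, sizeof(vnet_template));
	return (vnet);
}

static void
vnet_free(struct vnet *vnet)
{
	free(vnet->data);
	free(vnet);
}

/* Write through two instances, then read back through the first. */
static int
vnet_demo(void)
{
	struct vnet *a = vnet_alloc();
	struct vnet *b = vnet_alloc();
	int result;

	assert(a != NULL && b != NULL);
	curvnet = a;
	V_ripcb_count = 5;
	curvnet = b;
	V_ripcb_count = 7;	/* does not disturb instance a */
	V_ip_forwarding = 0;
	curvnet = a;
	result = V_ripcb_count * 10 + V_ip_forwarding;
	curvnet = NULL;
	vnet_free(a);
	vnet_free(b);
	return (result);
}
```

Calling `vnet_demo()` from a `main()` returns 51: instance a still sees its own count of 5 and the template's forwarding value of 1. Compiling to regular globals would amount to redefining `VNET(name)` as a plain variable reference.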
VNET_DEFINE(struct inpcbhead, ripcb);
VNET_DEFINE(struct inpcbinfo, ripcbinfo);

#define V_ripcb		VNET(ripcb)
#define V_ripcbinfo	VNET(ripcbinfo)

...

void
rip_init(void)
{

	INP_INFO_LOCK_INIT(&V_ripcbinfo, "rip");
	LIST_INIT(&V_ripcb);
Goal: make virtualized programming natural
Virtualized boot
[Figure: timeline of events (kernel boot, create jail with VNET, load kernel module with virtualized components) against the initialization they trigger (SYSINIT, VNET 0 SYSINIT, VNET 1 SYSINIT).]
Portions of previously serialized kernel and module startup are now per-VNET
Tag bits of boot process that now need to be per-VNET.
Module case is interesting, and tricky.
Virtual kernel startup
• Kernel, module startup uses SYSINIT()
• Functions tagged with special ELF section
• Sorted and executed “in order”
• Used for 99% of FreeBSD kernel init
• Some events now need to be virtualized
• Add a new event set, run once per VNET
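The idea of a sorted event set that runs once per VNET can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel's SYSINIT machinery: a static array stands in for the ELF-section-collected entries, `qsort` stands in for the kernel's ordering pass, and every name here (`sysinit_entry`, `vnet_run_sysinits`, the three `init_*` functions, and the numeric ordering keys) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A registered startup event: ordering keys plus the function to run. */
struct sysinit_entry {
	int	subsystem;	/* major ordering key */
	int	order;		/* minor ordering key */
	void	(*func)(char *trace);
};

/* Hypothetical init functions; each appends one letter to a trace. */
static void init_ifnet(char *trace)  { strcat(trace, "I"); }
static void init_domain(char *trace) { strcat(trace, "D"); }
static void init_proto(char *trace)  { strcat(trace, "P"); }

/* Deliberately unsorted, as link order might leave the entries. */
static struct sysinit_entry vnet_sysinits[] = {
	{ 3, 0, init_proto },
	{ 1, 0, init_ifnet },
	{ 2, 0, init_domain },
};
#define	N_VNET_SYSINITS	(sizeof(vnet_sysinits) / sizeof(vnet_sysinits[0]))

static int
sysinit_cmp(const void *a, const void *b)
{
	const struct sysinit_entry *x = a;
	const struct sysinit_entry *y = b;

	if (x->subsystem != y->subsystem)
		return (x->subsystem < y->subsystem ? -1 : 1);
	if (x->order != y->order)
		return (x->order < y->order ? -1 : 1);
	return (0);
}

/* Run the whole per-VNET event set, in order, for one new instance. */
static void
vnet_run_sysinits(char *trace)
{
	size_t i;

	assert(trace != NULL);
	qsort(vnet_sysinits, N_VNET_SYSINITS, sizeof(vnet_sysinits[0]),
	    sysinit_cmp);
	for (i = 0; i < N_VNET_SYSINITS; i++)
		vnet_sysinits[i].func(trace);
}

/* Two instances: one at "boot", one at "jail creation". */
static const char *
sysinit_demo(void)
{
	static char trace[32];

	trace[0] = '\0';
	vnet_run_sysinits(trace);	/* first instance */
	strcat(trace, "|");
	vnet_run_sysinits(trace);	/* second instance */
	return (trace);
}
```

`sysinit_demo()` returns the string "IDP|IDP": each instance sees the full event set, in the same sorted order, regardless of registration order.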
static void
vnet_igmp_init(const void *unused __unused)
{

	CTR1(KTR_IGMPV3, "%s: initializing", __func__);
	LIST_INIT(&V_igi_head);
}
VNET_SYSINIT(vnet_igmp_init, SI_SUB_PSEUDO, SI_ORDER_ANY, vnet_igmp_init,
    NULL);

static void
vnet_igmp_uninit(const void *unused __unused)
{

	CTR1(KTR_IGMPV3, "%s: tearing down", __func__);
	KASSERT(LIST_EMPTY(&V_igi_head),
	    ("igi list not empty; detached?"));
}
VNET_SYSUNINIT(vnet_igmp_uninit, SI_SUB_PSEUDO, SI_ORDER_ANY,
    vnet_igmp_uninit, NULL);
Again: goal to make it natural.
Five-character change to each to say “do it virtualized”.
What to virtualize?
• Start with a virtual network stack
• Immediate demand due to Jail limitations
• Zec 2002 prototype
• Validate performance of approach
• Can parallelize over many net modules
• In the future: VIPC, ...
Virtual network stack
• Jails can have their own network stacks
• TCP/IP socket bindings, routing table, firewall, IPsec, ...
• Real/virtual interfaces belong to one stack, but may be assigned to child stacks
• Packets float between stacks as needed
• Arbitrary virtual network topologies OK
Avoid constructs that require additional copying, context switching, etc.
[Figure: Jail 0 with VNET 0 and Jail 1 with VNET 1, each running applications over its own TCP/IP stack. Interfaces shown: igb0, bridge0, igb1, epair0a, epair0b.]
Applications pinned to virtual stacks. Simple case: assign an ifnet to a jail. Complex case: virtual interfaces, bridging, firewalls, ...
Really hard problem: shutting down cleanly
• We’ve been doing that for years, right?
• Actually, no -- we’ve been booting for years.
• But we’ve never shut down the network.
• We just power off and it goes away. :-)
• Now we need destructors.
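As a rough illustration of what "we need destructors" implies, the sketch below pairs every per-instance constructor step with a teardown step run in reverse order, and uses an allocation counter to check that stopping an instance leaves nothing behind. It is a userspace toy under stated assumptions: `net_instance`, its two tables, and the `tracked_*` helpers are hypothetical names and do not correspond to kernel structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-instance state; the second table is set up after the first. */
struct net_instance {
	int	*route_table;
	int	*pcb_table;
};

/* Counts outstanding allocations so a leak on shutdown is visible. */
static int live_allocations;

static void *
tracked_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p != NULL)
		live_allocations++;
	return (p);
}

static void
tracked_free(void *p)
{
	if (p == NULL)
		return;
	assert(live_allocations > 0);
	free(p);
	live_allocations--;
}

/* Destructor: undo the constructor's steps in reverse order. */
static void
net_instance_uninit(struct net_instance *ni)
{
	tracked_free(ni->pcb_table);
	ni->pcb_table = NULL;
	tracked_free(ni->route_table);
	ni->route_table = NULL;
}

/* Constructor: on partial failure, tear down whatever was built. */
static int
net_instance_init(struct net_instance *ni)
{
	ni->route_table = tracked_alloc(16 * sizeof(int));
	ni->pcb_table = tracked_alloc(16 * sizeof(int));
	if (ni->route_table == NULL || ni->pcb_table == NULL) {
		net_instance_uninit(ni);
		return (-1);
	}
	return (0);
}

/* Start two instances, stop them one at a time, and report the counts. */
static int
shutdown_demo(void)
{
	struct net_instance a, b;
	int after_b;

	if (net_instance_init(&a) != 0)
		return (-1);
	if (net_instance_init(&b) != 0) {
		net_instance_uninit(&a);
		return (-1);
	}
	net_instance_uninit(&b);
	after_b = live_allocations;	/* only a's two tables remain */
	net_instance_uninit(&a);
	return (after_b * 10 + live_allocations);
}
```

`shutdown_demo()` returns 20 when all allocations succeed: two allocations are still live after the first instance stops, and none after the second. A whole-machine power-off never had to reach that zero.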
Status
• FreeBSD 8.0 VIMAGE “highly experimental”
• Known memory leaks on stack shutdown
• Several known crash conditions
• Many subsystems not fully virtualized
• Foundation will shortly announce new funding for productionization work
• Goal: production-quality VIMAGE in 9.1/9.2
How to give it a spin
• Update to 8-STABLE or 9-CURRENT
• Compile kernel with “options VIMAGE”
• Simple example:
jail -ci vnet path=/jail command=/bin/csh
ifconfig vlan100 vnet <id>
• http://wiki.freebsd.org/Image
• WARNING: EXPERIMENTAL
A few applications
• Network routing research
• Parallel overlay networks
• Large-scale hosting
Routing simulation
[Figure: one Host box containing many Jail boxes, each with its own VNET.]
Trivially simulate thousands of nodes with arbitrary topologies and fully functional, independent network stacks
Virtualized overlay infrastructure
[Figure: Host 1 through Host 5, each running VNET 0, VNET 1, and VNET 2.]
Multiple network stacks allow router/bridge/VAP nodes to implement complex policies using minimal hardware
Large-scale hosting
[Figure: a Host with VNET 0 and interface igb0, plus Jail 1 to Jail 4 and so on. Jail N has VNET N on VLAN 10N with its own TCP/IP stack, firewall (FW), and IPsec.]
Jails each have their own fully delegated connection tables, routing tables, firewalls, IPsec, ...
Nested jails, with and without VNETs
Some other ideas
• Efficient server consolidation
• 500K memory overhead vs. 256M+ VM
• Virtualized appliances
• Multi-instance appliances, such as file stores, firewalls, filters, ...
• Neat: Debian/kFreeBSD on VNETs
2MB with ZFS or nullfs providing efficient storage, before applications
Conclusion
• Virtual kernel features, such as a virtual network stack, finally becoming a reality
• Prototype operates with increasing stability and little performance overhead
• Adds to virtualization menu; can be combined with other techniques like Xen
• Coming soon(ish)...
Cleverly, we are able to take advantage of many virtualization-centric hardware optimizations.
For example: MAC address filtering with RSS.