Distributed Systems: Lecture 4, Virtualization
Box Leangsuksun SWECO Endowed Professor, Computer Science Louisiana Tech University [email protected]
CTO, PB Tech International Inc. [email protected]
Introduction to Virtualization
• System virtualization has been studied since the 1970s (Goldberg, Popek)
• Fundamentals
  – Run multiple virtual machines (OSes) simultaneously
  – Isolation between virtual machines
  – Controlled resource sharing between VMs
  – Increased resource utilization
  – One of the hottest technologies since 2006
Virtualization: Key concepts
• Virtual Machine (VM), guest OS: complete operating system running in a virtual environment
• Host OS: operating system running on top of the hardware; the interface between the user and the VMM and VMs
• Virtual Machine Monitor (VMM), or hypervisor: manages VMs (scheduling, hardware access)
Virtualization: Usage
Ø Server consolidation (cloud)
Ø Software testing
Ø Security, Isolation (cloud)
Ø Lower cost of ownership of server. (cloud)
Ø Increase manageability (cloud)
Ø Enhance server reliability
Major Fields of Virtualization
• Storage Virtualization
• Network Virtualization
• Server Virtualization
Credit: CS5204 – Operating Systems from vtech u
Architecture & Interfaces
• Architecture: formal specification of a system’s interface and the logical behavior of its visible resources.
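[Figure: layered system interfaces, from bottom to top: hardware (system ISA and user ISA), operating system (system calls), libraries, applications; the ISA, ABI, and API boundaries are marked.]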
• API – application programming interface
• ABI – application binary interface
• ISA – instruction set architecture
Sample of API vs ABI
VMM Types
• System
  – Provides ABI interface
  – Efficient execution
  – Can add OS-independent services (e.g., migration, intrusion detection)
• Process
  – Provides API interface
  – Easier installation
  – Leverages OS services (e.g., device drivers)
  – Execution overhead (possibly mitigated by just-in-time compilation)
Credit: CS5204 – Operating Systems from vtech u
System-level Design Approaches
• Full virtualization (direct execution)
  – Exact hardware exposed to OS
  – Efficient execution
  – OS runs unchanged
  – Requires a “virtualizable” architecture
  – Example: VMWare
• Paravirtualization
  – OS modified to execute under VMM
  – Requires porting OS code
  – Execution overhead
  – Necessary for some (popular) architectures (e.g., x86)
  – Examples: Xen, Denali
Credit: CS5204 – Operating Systems from vtech u
Design Space (level vs. ISA)
• Variety of techniques and approaches available
• Critical technology space highlighted
[Figure: design space of virtualization level vs. ISA; regions labeled API interface and ABI interface.]
Credit: CS5204 – Operating Systems from vtech u
System VMMs
• Structure
  – Type 1: runs directly on host hardware
  – Type 2: runs on HostOS
• Primary goals
  – Type 1: High performance
  – Type 2: Ease of construction/installation/acceptability
• Examples
  – Type 1: VMWare ESX Server, Xen, OS/370
  – Type 2: User-mode Linux
[Figure: structure of Type 1 and Type 2 system VMMs.]
Credit: CS5204 – Operating Systems from vtech u
Hosted VMMs
• Structure
  – Hybrid between Type1 and Type2
  – Core VMM executes directly on hardware
  – I/O services provided by code running on HostOS
• Goals
  – Improve performance overall
  – Leverage I/O device support on the HostOS
• Disadvantages
  – Incurs overhead on I/O operations
  – Lacks performance isolation and performance guarantees
• Example: VMWare (Workstation)
Credit: CS5204 – Operating Systems from vtech u
Whole-system VMMs
• Challenge: GuestOS ISA differs from HostOS ISA
• Requires full emulation of GuestOS and its applications
• Example: VirtualPC
Credit: CS5204 – Operating Systems from vtech u
Strategies
• De-privileging
  – VMM emulates the effect on system/hardware resources of privileged instructions whose execution traps into the VMM
  – aka trap-and-emulate
  – Typically achieved by running GuestOS at a lower hardware priority level than the VMM
  – Problematic on some architectures where privileged instructions do not trap when executed at deprivileged priority
• Primary/shadow structures
  – VMM maintains “shadow” copies of critical structures whose “primary” versions are manipulated by the GuestOS
  – e.g., page tables
  – Primary copies needed to ensure correct environment visible to GuestOS
• Memory traces
  – Controlling access to memory so that the shadow and primary structures remain coherent
  – Common strategy: write-protect primary copies so that update operations cause page faults which can be caught, interpreted, and emulated.
[Figure: the GuestOS issues a privileged instruction that would change a resource; the instruction traps to the VMM, which emulates the change on the resource.]
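The primary/shadow and memory-trace strategies can be illustrated with a small simulation. This is only a sketch under stated assumptions: the type and function names (vm_state, guest_write_pte, vmm_handle_write_fault) and the guest-to-host mapping are hypothetical, and an ordinary function call stands in for the page fault that real hardware would raise on a write-protected page.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4  /* illustrative table size */

/* Hypothetical VM state: a primary page table the guest manipulates and
 * a shadow copy the VMM maintains for the real MMU. */
typedef struct {
    unsigned long primary[NPAGES];  /* guest-visible entries (guest frames) */
    unsigned long shadow[NPAGES];   /* VMM-owned entries (host frames) */
    bool write_protected;           /* memory trace armed on the primary copy */
    unsigned traced_writes;         /* faults caught and emulated by the VMM */
} vm_state;

/* Assumed mapping for illustration only: guest frame g lives at host frame g + 0x1000. */
static unsigned long guest_to_host(unsigned long guest_frame) {
    return guest_frame + 0x1000UL;
}

/* VMM fault handler: emulate the guest's write, then refresh the shadow entry
 * so that the two structures stay coherent. */
static void vmm_handle_write_fault(vm_state *vm, size_t idx, unsigned long guest_frame) {
    vm->primary[idx] = guest_frame;
    vm->shadow[idx] = guest_to_host(guest_frame);
    vm->traced_writes++;
}

/* Guest updates one page-table entry. With the trace armed, the write is
 * diverted to the VMM (this call stands in for a hardware page fault). */
void guest_write_pte(vm_state *vm, size_t idx, unsigned long guest_frame) {
    if (vm->write_protected) {
        vmm_handle_write_fault(vm, idx, guest_frame);
    } else {
        vm->primary[idx] = guest_frame;  /* untraced: the shadow entry goes stale */
    }
}
```

With the trace armed, every guest update reaches both tables; with it disarmed, the primary changes while the shadow does not, which is exactly the incoherence the write-protect strategy prevents.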
Credit: CS5204 – Operating Systems from vtech u
Different Virtualization Concepts
• Full-virtualization: full virtual machine, from the boot sequence to the virtualized hardware
• Para-virtualization: the guest OS has to be modified for performance optimization
• Emulation: the guest OS architecture is different from the architecture of the host OS (translation on the fly). Ex: PPC VM on top of an x86 host OS.
Classification
• Two kinds of system virtualization
  – Type-I: the virtual machine monitor and the virtual machine run directly on top of the hardware
  – Type-II: the virtual machine monitor and the virtual machine run on top of the host OS
[Figure: Type I virtualization (bare-metal): VMs and a host OS run on a VMM that runs on the hardware. Type II virtualization (hosted): VMs run on a VMM that runs on a host OS above the hardware.]
• Type I examples: VMware ESX, Microsoft Hyper-V, Xen
• Type II examples: VMware Workstation, Microsoft Virtual PC, Sun VirtualBox, QEMU, KVM
Bare-metal or hosted?
• Bare-metal
  – Has complete control over hardware
  – Doesn’t have to “fight” an OS
• Hosted
  – Avoids code duplication: no need to code a process scheduler or memory management system, since the OS already does that
  – Can run native processes alongside VMs
  – Familiar environment: how much CPU and memory does a VM take? Use top! How big is the virtual disk? ls -l
  – Easy management: stop a VM? Sure, just kill it!
• A combination
  – Mostly hosted, but some parts are inside the OS kernel for performance reasons
  – E.g., KVM
Available Solutions
• Examples of virtualization projects
  – Type I: Xen, L4, VMware ESX, Microsoft Hyper-V
  – Type II: VMware Workstation, Microsoft Virtual PC, Sun VirtualBox, QEMU, KVM
• Different benefits
  – Type I: performance
    • direct access to the hardware is simple to implement
    • para-virtualization possible
  – Type II: development
    • no limitation of para-virtualization
    • emulation possible
How to run a VM? Emulate!
• Do whatever the CPU does but in software
• Fetch the next instruction
• Decode – is it an ADD, an XOR, a MOV?
• Execute – using the emulated registers and memory
Example: addl %ebx, %eax is emulated as:
    enum {EAX=0, EBX=1, ECX=2, EDX=3, …};
    unsigned long regs[8];
    regs[EAX] += regs[EBX];
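The fetch, decode, and execute steps above can be sketched as a small interpreter loop. This is a toy under stated assumptions: the instruction encoding (insn, OP_ADD, OP_XOR, OP_MOV, OP_HALT) is invented for illustration and is not real x86 decoding; only the register-array idea comes from the slide.

```c
#include <stddef.h>
#include <stdint.h>

enum { EAX = 0, EBX = 1, ECX = 2, EDX = 3, NUM_REGS = 8 };

/* Toy opcodes, assumed for illustration (not an x86 encoding). */
enum opcode { OP_ADD, OP_XOR, OP_MOV, OP_HALT };

typedef struct { enum opcode op; int dst; int src; } insn;

typedef struct {
    uint32_t regs[NUM_REGS];  /* emulated register file */
    size_t pc;                /* index of the next instruction */
} cpu;

/* Interpret prog until OP_HALT or the end of the program.
 * Returns the number of instructions executed (OP_HALT not counted). */
size_t emulate(cpu *c, const insn *prog, size_t len) {
    size_t executed = 0;
    while (c->pc < len) {
        const insn *i = &prog[c->pc++];                          /* fetch */
        switch (i->op) {                                         /* decode */
        case OP_ADD: c->regs[i->dst] += c->regs[i->src]; break;  /* execute */
        case OP_XOR: c->regs[i->dst] ^= c->regs[i->src]; break;
        case OP_MOV: c->regs[i->dst]  = c->regs[i->src]; break;
        case OP_HALT: return executed;
        }
        executed++;
    }
    return executed;
}
```

Every guest instruction costs a fetch, a branch on the opcode, and several memory accesses to the emulated register file, which is why pure emulation is so much slower than direct execution.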
How to run a VM? Emulate!
• Pro: – Simple!
• Con: – Slooooooooow
• Example hypervisor: BOCHS
How to run a VM? Trap and emulate!
• Run the VM directly on the CPU – no emulation!
• Most of the code can execute just fine
  – E.g., addl %ebx, %eax
• Some code needs hypervisor intervention
  – int $0x80
  – movl something, %cr3
  – I/O
• Trap and emulate it!
  – E.g., if guest runs int $0x80, trap it and execute guest’s interrupt 0x80 handler
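The control flow of trap and emulate can be sketched as a simulation. The names here (vcpu, guest_insn, run_guest_insn, vmm_emulate) are hypothetical, and a plain function call stands in for the hardware trap; on a real system the CPU itself transfers control to the hypervisor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy guest operations, assumed for illustration. */
enum guest_op { G_ADD_EBX_EAX, G_INT80, G_MOV_TO_CR3 };

typedef struct { enum guest_op op; uint32_t operand; } guest_insn;

typedef struct {
    uint32_t eax, ebx;       /* registers the guest uses directly */
    uint32_t virt_cr3;       /* the guest's view of CR3, kept by the VMM */
    uint32_t int80_handler;  /* address of the guest's handler for vector 0x80 */
    uint32_t pc;             /* where the guest continues */
    unsigned traps;          /* number of times the VMM had to intervene */
} vcpu;

static bool is_sensitive(enum guest_op op) {
    return op != G_ADD_EBX_EAX;
}

/* VMM side: emulate the effect of the trapped instruction on virtual state. */
static void vmm_emulate(vcpu *v, const guest_insn *i) {
    v->traps++;
    switch (i->op) {
    case G_INT80:      v->pc = v->int80_handler; break;  /* enter guest's handler */
    case G_MOV_TO_CR3: v->virt_cr3 = i->operand; break;  /* record new page-table base */
    default: break;
    }
}

/* Execute one guest instruction: ordinary code runs directly,
 * sensitive code is diverted to the VMM. */
void run_guest_insn(vcpu *v, const guest_insn *i) {
    v->pc++;                      /* default: continue with the next instruction */
    if (is_sensitive(i->op)) {
        vmm_emulate(v, i);        /* stands in for the hardware trap */
        return;
    }
    v->eax += v->ebx;             /* addl %ebx, %eax: no VMM involvement */
}
```

The common case (the add) never touches the VMM, which is where the performance advantage over emulation comes from.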
How to run a VM? Trap and emulate!
• Pro:
  – Performance!
• Cons:
  – Harder to implement
  – Needs hardware support
    • Not all “sensitive” instructions cause a trap when executed in user mode
    • E.g., POPF, which may be used to clear IF
    • In user mode this instruction does not trap, but the value of IF does not change!
  – This hardware support is called VMX (Intel) or SVM (AMD)
  – It exists in modern CPUs
• Example hypervisor: KVM
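The POPF problem mentioned above can be modeled in a few lines. This sketch covers only the interrupt flag (IF, bit 9 of EFLAGS) in protected mode and ignores every other flag; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_IF (1u << 9)  /* interrupt-enable flag: bit 9 of EFLAGS */

typedef struct {
    uint32_t eflags;
    unsigned cpl;   /* current privilege level: 0 = kernel, 3 = user */
    unsigned iopl;  /* I/O privilege level field of EFLAGS */
} x86_flags_state;

/* Model of how POPF treats IF in protected mode.
 * Returns true if IF took the popped value, false if the update was
 * silently dropped. In neither case is a fault raised. */
bool popf_updates_if(x86_flags_state *s, uint32_t popped) {
    if (s->cpl > s->iopl) {
        return false;  /* deprivileged: IF keeps its old value, no trap */
    }
    s->eflags = (s->eflags & ~FLAG_IF) | (popped & FLAG_IF);
    return true;
}
```

A guest kernel running deprivileged (for example at CPL 3 with IOPL 0) that tries to clear IF this way gets no effect and the VMM gets no trap, so the VMM never learns that the guest wanted interrupts disabled.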
How to run a VM? Dynamic (binary) translation!
• Take a block of binary VM code that is about to be executed
• Translate it on the fly to “safe” code (like JIT – just in time compilation)
• Execute the new “safe” code directly on the CPU
• Translation rules?
  – Most code translates identically (e.g., movl %eax, %ebx translates to itself)
  – “Sensitive” operations are translated into hypercalls
    • Hypercall – call into the hypervisor to ask for service
    • Implemented as trapping instructions (unlike POPF)
    • Similar to syscall – call into the OS to request service
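The translation rules above can be sketched as a pass over a block of toy instructions. This is an illustration only: the opcodes and hypercall numbers are invented, and a real translator works on machine code bytes, caches translated blocks, and patches branch targets, none of which is shown here.

```c
#include <stddef.h>

/* Toy guest opcodes and translated opcodes, assumed for illustration. */
enum g_op { G_MOV, G_ADD, G_CLI, G_POPF };
enum t_op { T_MOV, T_ADD, T_HYPERCALL };

/* Hypothetical hypercall numbers. */
enum { HC_NONE = 0, HC_DISABLE_INTERRUPTS = 1, HC_LOAD_FLAGS = 2 };

typedef struct { enum g_op op; } g_insn;
typedef struct { enum t_op op; int hypercall_no; } t_insn;

/* Translate n guest instructions into "safe" code in out (room for n entries).
 * Returns the number of sensitive instructions rewritten into hypercalls. */
size_t translate_block(const g_insn *in, size_t n, t_insn *out) {
    size_t hypercalls = 0;
    for (size_t k = 0; k < n; k++) {
        switch (in[k].op) {
        case G_MOV:  /* harmless: translates to itself */
            out[k] = (t_insn){ T_MOV, HC_NONE };
            break;
        case G_ADD:  /* harmless: translates to itself */
            out[k] = (t_insn){ T_ADD, HC_NONE };
            break;
        case G_CLI:  /* sensitive: ask the hypervisor to disable interrupts */
            out[k] = (t_insn){ T_HYPERCALL, HC_DISABLE_INTERRUPTS };
            hypercalls++;
            break;
        case G_POPF: /* sensitive and non-trapping: must be rewritten */
            out[k] = (t_insn){ T_HYPERCALL, HC_LOAD_FLAGS };
            hypercalls++;
            break;
        }
    }
    return hypercalls;
}
```

Because the rewrite happens before the code runs, it does not matter that POPF would not trap on its own: the guest never executes the original instruction.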
How to run a VM? Dynamic (binary) translation!
• Pros:
  – No hardware support required
  – Performance – better than emulation
• Cons:
  – Performance – worse than trap and emulate
  – Hard to implement – hypervisor needs on-the-fly x86-to-x86 binary compiler
• Example hypervisors: VMware, QEMU
How to run a VM? Paravirtualization!
• Does not run unmodified guest OSes
• Requires guest OS to “know” it is running on top of a hypervisor
• E.g., instead of doing cli to turn off interrupts, guest OS should do hypercall(DISABLE_INTERRUPTS)
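The cli example can be sketched from both sides of the interface. Only the call hypercall(DISABLE_INTERRUPTS) comes from the slide; the rest (ENABLE_INTERRUPTS, hv_vcpu, the guest_irq_* wrappers) is assumed for illustration, and the hypercall is an ordinary function call here rather than a trapping instruction.

```c
#include <stdbool.h>

/* Hypercall numbers; ENABLE_INTERRUPTS is an assumed counterpart. */
enum hypercall_no { DISABLE_INTERRUPTS, ENABLE_INTERRUPTS };

/* Hypervisor's per-VM record of the virtual interrupt flag. */
typedef struct { bool virt_if; unsigned calls; } hv_vcpu;

static hv_vcpu current_vcpu = { true, 0 };

/* Hypervisor side: service one request.
 * Returns 0 on success, -1 for an unknown hypercall number. */
int hypercall(enum hypercall_no nr) {
    current_vcpu.calls++;
    switch (nr) {
    case DISABLE_INTERRUPTS: current_vcpu.virt_if = false; return 0;
    case ENABLE_INTERRUPTS:  current_vcpu.virt_if = true;  return 0;
    }
    return -1;
}

/* Guest side: the ported kernel calls these instead of executing cli/sti. */
void guest_irq_disable(void) { (void)hypercall(DISABLE_INTERRUPTS); }
void guest_irq_enable(void)  { (void)hypercall(ENABLE_INTERRUPTS); }
```

The porting cost mentioned on the slide is visible here: every place in the guest kernel that used cli or sti has to be changed to call the wrappers.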
How to run a VM? Paravirtualization!
• Pros:
  – No hardware support required
  – Performance – better than emulation
• Cons:
  – Requires specifically modified guest
  – Same guest OS cannot run both in the VM and on bare metal
• Example hypervisor: Xen
Industry trends
• Trap and emulate
• With hardware support
• VMX, SVM
Linux-related virtualization projects
Project         Type                                   License
Bochs           Emulation                              LGPL
QEMU            Emulation                              LGPL/GPL
VMware          Full virtualization                    Proprietary
z/VM            Full virtualization                    Proprietary
Xen             Paravirtualization                     GPL
UML             Paravirtualization                     GPL
Linux-VServer   Operating system-level virtualization  GPL
OpenVZ          Operating system-level virtualization  GPL
Hardware support for full virtualization and paravirtualization
• Recall that the IA-32 (x86) architecture creates some issues when it comes to virtualization. Certain privileged-mode instructions do not trap, and can return different results based upon the mode. For example, the x86 STR instruction retrieves the security state, but the value returned is based upon the particular requester's privilege level. This is problematic when attempting to virtualize different operating systems at different levels. For example, the x86 supports four rings of protection, where level 0 (the highest privilege) typically runs the operating system, levels 1 and 2 support operating system services, and level 3 (the lowest level) supports applications. Hardware vendors have recognized this shortcoming (and others), and have produced new designs that support and accelerate virtualization.
• Intel is producing new virtualization technology that will support hypervisors for both the x86 (VT-x) and Itanium® (VT-i) architectures.
• VT-x supports two new forms of operation: one for the VMM (root) and one for guest operating systems (non-root).
• The root form is fully privileged, while the non-root form is deprivileged (even for ring 0).
• The architecture also supports flexibility in defining the instructions that cause a VM (guest operating system) to exit to the VMM and store off processor state. Other capabilities have been added
Hardware support for full virtualization and paravirtualization
• AMD is also producing hardware-assisted virtualization technology, under the name Pacifica.
• Among other things, Pacifica maintains a control block for guest operating systems that is saved on execution of special instructions.
• The VMRUN instruction allows a virtual machine (and its associated guest operating system) to run until the VMM regains control (which is also configurable). The configurability allows the VMM to customize the privileges for each of the guests.
• Pacifica also amends address translation with host and guest memory management unit (MMU) tables.
Hardware support for full virtualization and paravirtualization
I/O Virtualization
• Typical methods to virtualize the CPU
• A computer is more than a CPU
• Also need I/O!
• Types of I/O:
  – Block (e.g., hard disk)
  – Network
  – Input (e.g., keyboard, mouse)
  – Sound
  – Video
• Most performance critical (for servers):
  – Network
  – Block
VMware Overview
Ø Commercial virtualization applications
Ø Full virtualization
Ø High portability
Ø Simulates BIOS, PXE boot
Ø Simulates virtual hardware for every VM
Ø Supports Bridge, NAT, and Host-Only networks
Ø Runs a wide range of unmodified guest OSes such as Windows, Linux, Solaris, BSD, NetWare, and DOS
VMware Overview
Source : http://www.vmware.com
VMware vs. Xen
Relative performance on native Linux (L), Xen/Linux (X), VMware Workstation 3.2 (V), and User Mode Linux (U).
Source: “Xen and the Art of Virtualization”, Ian Pratt, University of Cambridge, XenSource Inc. Http://www.cl.cam.ac.uk/netos/papers/2005-xen-may.ppt
VMware vs. Xen (TCP results)
Source: “Xen and the Art of Virtualization”, Ian Pratt, University of Cambridge, XenSource Inc. Http://www.cl.cam.ac.uk/netos/papers/2005-xen-may.ppt
[Figure: TCP bandwidth on Linux (L), Xen (X), VMWare Workstation (V), and UML (U). Panels: Tx, MTU 1500 (Mbps); Rx, MTU 1500 (Mbps); Tx, MTU 500 (Mbps); Rx, MTU 500 (Mbps).]
Qemu
• Emulation solution
• Direct access to the hardware possible if the host OS and the guest OS have the same architecture
[Figure: several Qemu instances run on a host OS (Linux, Mac OS X, Windows) above the hardware (processor, memory, disk, network, etc.). Each instance hosts a guest stack of user space, OS, and drivers: Linux and Windows on Qemu x86, Linux and Mac OS X on Qemu PPC, Solaris on Qemu Sparc. From http://fr.wikipedia.org/wiki/Qemu]
Xen Overview
• Para-virtualization possible
  – Full virtualization requires virtualization support at the hardware level (Intel VT, AMD-V/Pacifica)
  – XenoLinux: port of the Linux kernel to the Xen hypervisor
• Hypervisor based on a micro-kernel
• Efficient virtualization: HPC possible
Xen Overview
Ø Open source, Linux based
Ø High performance
Ø Supports Bridge and Routing networks
Ø Creates and manages VMs via command line
Ø Restricted hardware access through an API
Ø Host’s kernel needs to be patched.
Xen’s Ring Model
Standard x86 Architecture
  Ring 0: Operating System
  Ring 1 and 2 are not used
  Ring 3: User Applications
Xen on x86 Architecture
  Ring 0: Xen’s Hypervisor
  Ring 1: for VMs
  Ring 2 is not used
  Ring 3: User Applications
Instructor’s Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design Edn. 5
© Pearson Education 2012
The architecture of Xen
Use of rings of privilege
Virtualization of memory management
Split device drivers
I/O rings
The XenoServer Open Platform Architecture
Virtualization Examples
• Server consolidation - Virtual machines are used to consolidate many physical servers into fewer servers, which in turn host virtual machines. Each physical server is reflected as a virtual machine "guest" residing on a virtual machine host system. This is also known as Physical-to-Virtual or 'P2V' transformation.
• Disaster recovery - Virtual machines can be used as "hot standby" environments for physical production servers. This changes the classical "backup-and-restore" philosophy, by providing backup images that can "boot" into live virtual machines, capable of taking over workload for a production server experiencing an outage.
Virtualization Examples
• Testing and training - Hardware virtualization can give root access to a virtual machine. This is very useful, for example, in kernel development and operating system courses.
Virtualization Examples
• Portable applications - The Microsoft Windows platform has a well-known issue involving the creation of portable applications, needed (for example) when running an application from a removable drive, without installing it on the system's main disk drive. This is a particular issue with USB drives. Virtualization can be used to encapsulate the application with a redirection layer that stores temporary files, Windows Registry entries, and other state information in the application's installation directory – and not within the system's permanent file system. See portable applications for further details. It is unclear whether such implementations are currently available.
Virtualization Examples
• Portable workspaces - Recent technologies have used virtualization to create portable workspaces on devices like iPods and USB memory sticks. These products include:
  – Application level – Thinstall – a driver-less solution for running "Thinstalled" applications directly from removable storage without system changes or needing Admin rights
  – OS level – MojoPac, Ceedo, and U3 – which allow end users to install some applications onto a storage device for use on another PC
  – Machine level – moka5 and LivePC – which deliver an operating system with a full software suite, including isolation and security protections
Virtualization Examples
Virtualization Tips
• In the VMware space, VirtualCenter is the management tool of choice for ESX Server.
• Other products, like Hewlett-Packard's Virtual Machine Management or IBM's Director modules, are adding functionality to deal with virtual machine [VM] environments.
• The problem is that most of these tools that are snap-ins lack much of the simple functionality you get in VirtualCenter.
• Most companies will end up buying both VirtualCenter and the vendor's tool and use both depending on what they are doing.
Virtualization Tips
• Shy away from large amounts of processing when doing consolidation.
• If you are doing virtualization for other reasons, like workload management, then you can get nearly anything to run virtualized if you are willing to change some of the things you do.
• However, if you are looking for maximum consolidation ratios and high ROIs, stay away from the quad boxes that are already running at 50%.
VM on Amazon
Security Tips
• Apply at least some standard minimum security:
  – Disable remote root access
  – Use sudo when needed
  – Configure the AD PAM modules for Windows shops
• Some organizations use too much surrounding security and end up making their environment slower, more difficult and expensive to manage.
• When dealing with the VMs, all of the standard procedures should be followed.
• The host systems themselves should often be considered appliances, and organizations should limit the amount of customized agents and security hacks performed on these systems.
Security Tips
• One should not go overboard with ESX hosts, since they are basically appliances serving up computing resources and should be treated as such. Nevertheless, taking a common sense approach to security on the servers is the best bet.
• The most common mistakes made with virtual security are based on ignorance, lack of knowledge of the Linux console, failure to understand how virtual switch architecture works, and what the host does not directly see in the data in the VM disk files.
Security Tips
• The same practices that are performed to secure a physical environment can, and should, be used in a virtual environment as well.
• Everything from proper VLAN/firewall organization to host-based intrusion detection should be leveraged to keep the environment secure.
Security Tips
Scalability Tips
• Simplicity. The more complicated the design and infrastructure, the less scalable it will be.
  – For example, a common mistake in large organizations is to assume that they cannot create a simple solution because they are big. One can argue that they should make the solution or design for VMware as simple as possible to make it scalable for the size of their organization and largest client base.
• Don't design the entire solution around the one-offs.
• When designing a virtual infrastructure, one should never look at the environment and try to plan one large infrastructure for the entire virtualization project. It won’t work.
• Organize the overall environment into smaller groupings of servers and address each grouping individually.
• When approached this way, at the end of the project, a very scalable deployment methodology that uses the same principles with a manageable number of servers in various phases of the project will be in place.
Scalability Tips