part 1 – concepts & xencs5204/fall07-kafura/... · 2007. 8. 1. · 3 cs 5204 – fall, 2007...

30
1 Virtualization Part 1 – Concepts & XEN

Upload: others

Post on 01-Apr-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

1

Virtualization

Part 1 – Concepts & XEN

Page 2: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

2

CS 5204 – Fall, 2007

Virtualization

2

Concepts

James Smith, Ravi Nair, “The Architectures of Virtual Machines,” IEEE Computer, May 2005, pp. 32-38.Mendel Rosenblum, Tal Garfinkel, “Virtual Machine Monitors: Current Technology and Future Trends,” IEEE Computer, May 2005, pp. 39-47. L.H. Seawright, R.A. MacKinnon, “VM/370 – a study of multiplicity and usefulness,” IBM Systems Journal, vol. 18, no. 1, 1979, pp. 4-17.S.T. King, G.W. Dunlap, P.M. Chen, “Operating System Support for Virtual Machines,” Proceedings of the 2003 USENIX Technical Conference, June 9-14, 2003, San Antonio TX, pp. 71-84. A. Whitaker, R.S. Cox, M. Shaw, S.D. Gribble, “Rethinking the Design of Virtual Machine Monitors,” IEEE Computer, May 2005, pp. 57-62.G.J. Popek, and R.P. Goldberg, “Formal requirements for virtualizable third generation architectures,” CACM, vol. 17 no. 7, 1974, pp. 412-421.

References and Sources

Page 3: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

3

CS 5204 – Fall, 2007

Virtualization

3

DefinitionsVirtualization

A layer mapping its visible interface and resources onto the interface and resources of the underlying layer or system on which it is implementedPurposes

Abstraction – to simplify the use of the underlying resource (e.g., by removing details of the resource’s structure)Replication – to create multiple instances of the resource (e.g., to simplifymanagement or allocation)Isolation – to separate the uses which clients make of the underlying resources (e.g., to improve security)

Virtual Machine Monitor (VMM)A virtualization system that partitions a single physical “machine” into multiple virtual machines.Terminology

Host – the machine and/or software on which the VMM is implementedGuest – the OS which executes under the conrol of the VMM

Page 4: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

4

CS 5204 – Fall, 2007

Virtualization

4

Origins - Principles

EfficiencyInnocuous instructions should execute directly on the hardware

Resource controlExecuted programs may not affect the system resources

EquivalenceThe behavior of a program executing under the VMM should be the same as if the program were executed directly on the hardware (except possibly for timing and resource availability) Communications of the ACM, vol 17, no 7, 1974, pp.412-421

“an efficient, isolated duplicate of the real machine”

Page 5: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

5

CS 5204 – Fall, 2007

Virtualization

5

Origins - Principles

Instruction types

Privilegedan instruction traps in unprivileged (user) mode but not in privileged (supervisor) mode.

SensitiveControl sensitive –

attempts to change the memory allocation or privilege modeBehavior sensitive

Location sensitive – execution behavior depends on location in memoryMode sensitive – execution behavior depends on the privilege mode

Innocuous – an instruction that is not sensitive

TheoremFor any conventional third generation computer, a virtual machine monitor may be constructed if the set of sensitive instructions for that computer is a subset of the set of privileged instructions.

SignficanceThe IA-32/x86 architecture is not virtualizable.

Page 6: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

6

CS 5204 – Fall, 2007

Virtualization

6

Origins - Technology

Concurrent execution of multiple production operating systemsTesting and development of experimental systemsAdoption of new systems with continued use of legacy systems Ability to accommodate applications requiring special-purpose OSIntroduced notions of “handshake” and “virtual-equals-real mode” to allow sharing of resource control information with CPLeveraged ability to co-design hardware, VMM, and guestOS

IBM Systems Journal, vol. 18, no. 1, 1979, pp. 4-17.

Page 7: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

7

CS 5204 – Fall, 2007

Virtualization

7

VMMs Rediscovered

Server/workload consolidation (reduces “server sprawl”)Compatible with evolving multi-core architecturesSimplifies software distributions for complex environments“Whole system” (workload) migrationImproved data-center management and efficiencyAdditional services (workload isolation) added “underneath” the OS

security (intrusion detection, sandboxing,…)fault-tolerance (checkpointing, roll-back/recovery)

VMM

Virtual Virtual MachineMachine

Guest OS

Application

Virtual Virtual MachineMachine

Guest OS

Application

Virtual Virtual MachineMachine

Guest OS

Application

RealMachine

Page 8: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

8

CS 5204 – Fall, 2007

Virtualization

8

Architecture & InterfacesArchitecture: formal specification of a system’s interface and the logical behavior of its visible resources.

Hardware

System ISA User ISA

Operating System

System CallsLibraries

Applications

ISA

ABI

API

API – application binary interfaceABI – application binary interfaceISA – instruction set architecture

Page 9: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

9

CS 5204 – Fall, 2007

Virtualization

9

VMM Types

System

Process

Provides ABI interfaceEfficient executionCan add OS-independent services (e.g., migration, intrustion detection)

Provdes API interfaceEasier installationLeverage OS services (e.g., device drivers)Execution overhead (possibly mitigated by just-in-time compilation)

Page 10: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

10

CS 5204 – Fall, 2007

Virtualization

10

System-level Design Approaches

Full virtualization (direct execution)Exact hardware exposed to OSEfficient executionOS runs unchangedRequires a “virtualizable” architectureExample: VMWare

ParavirtualizationOS modified to execute under VMMRequires porting OS codeExecution overheadNecessary for some (popular) architectures (e.g., x86)Examples: Xen, Denali

Page 11: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

11

CS 5204 – Fall, 2007

Virtualization

11

Design Space (level vs. ISA)

Variety of techniques and approaches availableCritical technology points highlighted

API interface ABI interface

Page 12: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

12

CS 5204 – Fall, 2007

Virtualization

12

System VMMs

StructureType 1: runs directly on host hardwareType 2: runs on HostOS

Primary goalsType 1: High performance Type 2: Ease of construction/installation/acceptability

ExamplesType 1: VMWare ESX Server, Xen, OS/370Type 2: User-mode Linux

Type 1

Type 2

Page 13: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

13

CS 5204 – Fall, 2007

Virtualization

13

Hosted VMMs

StructureHybrid between Type1 and Type2Core VMM executes directly on hardwareI/O services provided by code running on HostOS

GoalsImprove performance overallleverages I/O device support on the HostOS

DisadvantagesIncurs overhead on I/O operationsLacks performance isolation and performance guarantees

Example: VMWare (Workstation)

Page 14: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

14

CS 5204 – Fall, 2007

Virtualization

14

Whole-system VMMs

Challenge: GuestOS ISA differs from HostOS ISARequires full emulation of GuestOS and its applicationsExample: VirtualPC

Page 15: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

15

CS 5204 – Fall, 2007

Virtualization

15

Strategies

De-privilegingVMM emulates the effect on system/hardware resources of privileged instructions whose execution traps into the VMM aka trap-and-emulateTypically achieved by running GuestOS at a lower hardware priority level than the VMMProblematic on some architectures where privileged instructions do not trap when executed at deprivileged priority

Primary/shadow structuresVMM maintains “shadow” copies of critical structures whose “primary” versions are manipulated by the GuestOSe.g., page tablesPrimary copies needed to insure correct environment visible to GuestOS

Memory tracesControlling access to memory so that the shadow and primary structure remain coherentCommon strategy: write-protect shadow copies so that update operations cause page faults which can be caught, interpreted, and emulated.

resource

vmm

privileged instruction

trap

GuestOS

resource

emulate change

change

Page 16: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

16

CS 5204 – Fall, 2007

Virtualization

16

Virtualizing the IA-32 (x86) architecture

Architecture has protection rings 0..3 with OS normally in ring 0 and applications in ring 3…

…and VMM must run in ring 0 to maintain its integrity and control

…but GuestOS not running in ring 0 is problematic:

Some privileged instructions execute only in ring 0 but do not fault when executed outside ring 0 (remember privileged vs. sensitive?)instructions for low latency system calls (SYSENTER/SYSEXIT) always transition to ring 0 forcing the VMM into unwanted emulation or overheadFor the Itanium architecture, interrupt registers only accessible in ring 0; forcing VMM to intercept each device driver access to these registers has severe performance consequencesMasking interrupts can only be done in ring 0Ring compression: paging does not distinguish privilege levels 0-2, GuestOS must run in ring 3 but is then not protected from its applications also running in ring 3Cannot be used for 64-bit guests on IA-32The fact that it is not running in ring 0 can be detected (is this important?)

Page 17: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

17

CS 5204 – Fall, 2007

Virtualization

17

Memory Management

Isolation/protection of Guest OS address spacesEfficient MM address translation

VMMmachine

VMM GuestOS

“shadow” page tables page tables

processvirtual

OSphysical

entityaddress space

Page 18: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

18

CS 5204 – Fall, 2007

Virtualization

18

XEN: paravirtualization

References and SourcesPaul Barham, et.al., “Xen and the Art of Virtualization,” Symposium on Operating Systems Principles 2003 (SOSP’03), October 19-22, 2003, Bolton Landing, New York.Presentation by Ian Pratt available at http://www.cl.cam.ac.uk/netos/papers/2005-xen-may.ppt

Computer Laboratory

Page 19: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

19

CS 5204 – Fall, 2007

Virtualization

19

Xen - Structure

Employs paravirtualizationstrategy

Deals with machine architectures that cannot be virtualizedRequires modifications to guest OSAllows optimizations

“Domain 0”has special access to control interface for platform managementHas back-end device drivers

Xen VMM entirely event driven no internal threads

Xen 3.0 Architecture

Page 20: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

20

CS 5204 – Fall, 2007

Virtualization

20

MMU Virtualizion : Shadow-Mode

MMU

Accessed &dirty bits

Guest OS

VMMHardware

guest writes

guest reads Virtual → physical

Virtual → Machine

Updates

Page 21: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

21

CS 5204 – Fall, 2007

Virtualization

21

MMU Virtualization : Direct-Mode

MMU

Guest OS

Xen VMMHardware

guest writes

guest reads

Virtual → Machine

Page 22: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

22

CS 5204 – Fall, 2007

Virtualization

22

Queued Update Interface (Xen 1.2)

MMU

Guest OS

Xen VMMHardware

validation

guestwrites

guest readsVirtual → Machine

Page 23: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

23

CS 5204 – Fall, 2007

Virtualization

23

Writeable Page Tables : 1 – write fault

MMU

Guest OS

Xen VMMHardware

page fault

first guestwrite

guest reads

Virtual → Machine

Page 24: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

24

CS 5204 – Fall, 2007

Virtualization

24

Writeable Page Tables : 2 - Unhook

MMU

Guest OS

Xen VMMHardware

guest writes

guest reads

Virtual → MachineX

Page 25: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

25

CS 5204 – Fall, 2007

Virtualization

25

Writeable Page Tables : 3 - First Use

MMU

Guest OS

Xen VMMHardware

page fault

guest writes

guest reads

Virtual → MachineX

Page 26: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

26

CS 5204 – Fall, 2007

Virtualization

26

Writeable Page Tables : 4 – Re-hook

MMU

Guest OS

Xen VMMHardware

validate

guest writes

guest reads

Virtual → Machine

Page 27: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

27

CS 5204 – Fall, 2007

Virtualization

27

I/OSafe hardware interfaces

I/O SpacesRestricts access to I/O registers

Isolated Device Drive Driver isolated from VMM in its own “domain” (i.e., VM)Communication between domains via device channels

Unified interfacesCommon interface for group of similar devicesExposes raw device interface (e.g., for specialized devices like sound/video)

Separate request/response from event notification

I/O descriptor ringsUsed to communicate I/O requests and responsesFor bulk data transfer devices (DMA, network), buffer space allocated out of band by GuestOSDescriptor contains unique identifier to allow out of order processingMultiple requests can be added before hypercall made to begin processingEvent notification can be masked by GuestOS for its convenience

Page 28: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

28

CS 5204 – Fall, 2007

Virtualization

28

Device Channels

Connects “front end” device drivers in GuestOS with “native” device driverIs an I/O descriptor ringBuffer page(s) allocated by GuestOS and “granted” to XenBuffer page(s) is/are pinned to prevent page-out during I/O operationPinning allows zero-copy data transfer

Page 29: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

29

CS 5204 – Fall, 2007

Virtualization

29

System Performance

L X V USPEC INT2000 (score)

L X V ULinux build time (s)

L X V UOSDB-OLTP (tup/s)

L X V USPEC WEB99 (score)

0.00.10.20.30.40.50.60.70.80.91.01.1

Benchmark suite running on Linux (L), Xen (X), VMware Workstation (V), and UML (U)

Benchmark suitesSpec INT200: compute intensive workloadLinux build time: extensive file I/O, scheduling, memory managementOSBD-OLTP: transaction processing workload, extensive synchronous disk I/OSpec WEB99: web-like workload (file and network traffic)

Fair comparison?

Page 30: Part 1 – Concepts & XENcs5204/fall07-kafura/... · 2007. 8. 1. · 3 CS 5204 – Fall, 2007 Virtualization 3 Definitions Virtualization A layer mapping its visible interface and

30

CS 5204 – Fall, 2007

Virtualization

30

I/O Peformance

SystemsL: LinuxIO-S: Xen using IO-Space accessIDD: Xen using isolated device driver

BenchmarksLinux build time: file I/O, scheduling, memory managementPM: file system benchmarkOSDB-OLTP: transaction processing workload, extensive synchronous disk I/Ohttperf: static document retrievelSpecWeb99: web-like workload (file and network traffic)