device virtualization architecture jake oshins architect windows virtualization microsoft...
TRANSCRIPT
Device VirtualizationDevice VirtualizationArchitectureArchitecture
Jake OshinsJake OshinsArchitectArchitectWindows VirtualizationWindows VirtualizationMicrosoft CorporationMicrosoft Corporation
GoalsGoals
Participants will leave with an Participants will leave with an understanding ofunderstanding of
How Microsoft intends to enable efficient How Microsoft intends to enable efficient I/O virtualizationI/O virtualization
How others’ I/O solutions interact with How others’ I/O solutions interact with Microsoft’s virtualization systemsMicrosoft’s virtualization systems
Which I/O virtualization strategies willWhich I/O virtualization strategies willbe available with Windows Server be available with Windows Server virtualization and which must waitvirtualization and which must wait
AgendaAgenda
General strategies for I/O virtualizationGeneral strategies for I/O virtualization
Technical overview ofTechnical overview ofVirtual Device FrameworkVirtual Device Framework
Technical overview of VMBusTechnical overview of VMBus
Device EmulationDevice Emulation
Virtual machine “Virtual machine “seessees” real hardware devices” real hardware devices
Each access to the “Each access to the “devicedevice” involves an intercept, sent to ” involves an intercept, sent to the parent virtual machinethe parent virtual machine
Performance is sub-optimalPerformance is sub-optimal
Compatibility with existing software can be perfectCompatibility with existing software can be perfect
Microsoft provides emulationsMicrosoft provides emulationsThe hardware that is emulated is from ~1997, providingThe hardware that is emulated is from ~1997, providingin-box compatibility with old OSesin-box compatibility with old OSes
Requires a “Requires a “monitormonitor” partition that contains software for ” partition that contains software for emulating the devicesemulating the devices
Physical devices can be shared amongPhysical devices can be shared amongmultiple guestsmultiple guests
I/O EnlightenmentI/O Enlightenment
Uses abstract protocols to describe I/OUses abstract protocols to describe I/OUseful protocols already existUseful protocols already exist
SCSI, iSCSISCSI, iSCSIRNDISRNDISRDPRDP
New device stack implementations in theNew device stack implementations in thesecondary guests can be written that usesecondary guests can be written that usethese abstract protocolsthese abstract protocolsProtocol servers exist in a primary guestProtocol servers exist in a primary guest(parent), which is the partition that controls(parent), which is the partition that controlsthe physical devicesthe physical devicesMultiple secondary guests can share the servicesMultiple secondary guests can share the servicesof a single hardware deviceof a single hardware deviceDoesn’t require an emulatorDoesn’t require an emulatorDoesn’t require a monitor partitionDoesn’t require a monitor partition
Device AssignmentDevice Assignment
Guest OSes control their devices directlyGuest OSes control their devices directlyParent OS gives up control of these devicesParent OS gives up control of these devices
Ownership of a device is exclusiveOwnership of a device is exclusive
Performance can match that of aPerformance can match that of anon-virtualized machinenon-virtualized machine
Interdependence of partitions canInterdependence of partitions canbe minimizedbe minimized
Strong isolation of partitions can Strong isolation of partitions can be achievedbe achieved
Windows Virtualization Windows Virtualization Will ProvideWill Provide
Device emulationDevice emulationProvides migration path for Microsoft Virtual Provides migration path for Microsoft Virtual Server usersServer users
~1997 era virtual motherboard~1997 era virtual motherboard
Good for compatibility with old OSesGood for compatibility with old OSes
I/O enlightenmentI/O enlightenmentStorageStorage
NetworkingNetworking
VideoVideo
USBUSB
AgendaAgenda
General strategies for I/O virtualizationGeneral strategies for I/O virtualization
Overview ofOverview ofVirtual Device FrameworkVirtual Device Framework
Technical overview of VMBusTechnical overview of VMBus
Virtualization I/O DefinitionsVirtualization I/O Definitions
Virtual Device (VDev)Virtual Device (VDev)A software module that provides a point of configuration and control over A software module that provides a point of configuration and control over an I/O path for a partitionan I/O path for a partition
Virtualization Service Provider (VSP)Virtualization Service Provider (VSP)A server component (in a parent or other partition) that handlesA server component (in a parent or other partition) that handlesI/O requestsI/O requests
Can pass I/O requests on to native services like a file systemCan pass I/O requests on to native services like a file system
Can pass I/O requests directly to physical devicesCan pass I/O requests directly to physical devices
Can be in either kernel- or user-modeCan be in either kernel- or user-mode
Virtualization Service Consumer (VSC)Virtualization Service Consumer (VSC)A client component (in a child partition) which serves as the bottom of an A client component (in a child partition) which serves as the bottom of an I/O stack within that partitionI/O stack within that partition
Sends requests to a VSPSends requests to a VSP
VMBusVMBusA system for sending requests and data between virtual machinesA system for sending requests and data between virtual machines
Virtual Devices (VDevs)Virtual Devices (VDevs)
Come in two varietiesCome in two varietiesCore: Device emulatorsCore: Device emulators
Written by MicrosoftWritten by Microsoft
Plug-in: Enlightened I/OPlug-in: Enlightened I/OWritten by Microsoft and industryWritten by Microsoft and industry
Management is through WMIManagement is through WMI
Packaged as COM objectsPackaged as COM objectsRun within the VM Worker ProcessRun within the VM Worker Process
Often work in conjunction with a VSPOften work in conjunction with a VSP
VDev EnvironmentVDev EnvironmentVirtual Machine Worker Process
Core VDevs Emulated Devices & Hybrid Devices
Plug-InVDevs
CO
M
Configuration
Repositories
Sta
te
Cha
nge
s
CO
M
Virtual Motherboard
TimersVirtual Hardware
Guest Memory
IRQ Generation
Memory and Port-Mapped IO
Core Only
Core VDevs Emulated Devices & Hybrid Devices
Core Vdevs - Emulated Devices
To
VS
P
Plug-InVDevsPlug-InVDevs
To
VS
P
Activation
Save
Current Time
Create Timer
State Machine
Active
Powering Up
Saving
Powering Down
Sta
te
Cha
nge
s
Sta
te
Cha
nge
s
Virtualization Service Virtualization Service Providers (VSPs)Providers (VSPs)
Communicate with a VDev for Communicate with a VDev for configuration and state managementconfiguration and state management
Can exist in user- or kernel-modeCan exist in user- or kernel-modeCOM objectCOM object
ServiceService
DriverDriver
Use VMBus to communicateUse VMBus to communicatewith a VSC in the child partitionwith a VSC in the child partition
Example VSP/VSC DesignExample VSP/VSC DesignParent Child
User Mode
Kernel Mode
Viridian Virtualization Stack Worker Process
Image Parser
VMBUS
VirtualStorageMiniport(VSC)
iSCSIprt
Parition
Volume
File System
VM
SR
Bs
VirtualStorageServer(VSP)
Storport Miniport
StorPort
Hardware
Disk
FastPath filterV
M S
RB
s
User Mode
Kernel Mode
Parition
Volume
File System
Disk
AgendaAgenda
General strategies for I/O virtualizationGeneral strategies for I/O virtualization
Technical overview ofTechnical overview ofVirtual Device FrameworkVirtual Device Framework
Technical overview of VMBusTechnical overview of VMBus
VMBus – What Is It?VMBus – What Is It?
A protocol for transferring data through a ring bufferA protocol for transferring data through a ring bufferA means of mapping a ring buffer into multiple partitionsA means of mapping a ring buffer into multiple partitionsA definition for the format of the ring bufferA definition for the format of the ring bufferA means of signaling that a ring buffer has gone non-emptyA means of signaling that a ring buffer has gone non-empty
A protocol for offering/discovering servicesA protocol for offering/discovering servicesA protocol for managing guest physical addressesA protocol for managing guest physical addressesA protocol for enumerating WDM device objectsA protocol for enumerating WDM device objectsthat represent a data channelthat represent a data channelA bus driver which implements all of those protocolsA bus driver which implements all of those protocolsA data transfer library which can be linked intoA data transfer library which can be linked intoa user-mode service or applicationa user-mode service or applicationA data transfer library which can be linked intoA data transfer library which can be linked intoa kernel-mode drivera kernel-mode driver
VMBus DefinitionsVMBus Definitions
EndpointEndpointA module that reads or writes data through VMBusA module that reads or writes data through VMBus
ChannelChannelTwo endpoints – one server, one clientTwo endpoints – one server, one client
Two ring buffersTwo ring buffers
Transfer PageTransfer PagePre-allocated page of memory that is mappedPre-allocated page of memory that is mappedinto both endpoints’ partitionsinto both endpoints’ partitions
Not part of a ring bufferNot part of a ring buffer
Used as a target for DMA or for other operations that Used as a target for DMA or for other operations that may take a “long” time to completemay take a “long” time to complete
VMBus DefinitionsVMBus Definitions
Guest Physical Address DescriptorGuest Physical Address DescriptorList (GPADL)List (GPADL)
Memory descriptor list that can beMemory descriptor list that can bepassed to another partitionpassed to another partitionAllows a device to do DMA to or fromAllows a device to do DMA to or froma child partition directlya child partition directly
PipePipeA default channel protocol that allowsA default channel protocol that allowsa client to use ReadFile or WriteFile to send a client to use ReadFile or WriteFile to send data between partitionsdata between partitionsServes as the basis for cross-partition Serves as the basis for cross-partition Remote Procedure CallRemote Procedure Call
How Is Data Moved How Is Data Moved Between Partitions?Between Partitions?
Commands are placed in ring buffersCommands are placed in ring buffers
Small data is placed in ring buffersSmall data is placed in ring buffers
Larger data is placed in pre-arranged Larger data is placed in pre-arranged pages shared between partitionspages shared between partitions
Described by commands in ring buffersDescribed by commands in ring buffers
Largest data is mapped into another Largest data is mapped into another partition without copyingpartition without copying
Described by GPADLs placed inDescribed by GPADLs placed inring buffersring buffers
Hypervisor InvolvementHypervisor Involvement
When is it necessary?When is it necessary?Channel setupChannel setup
Signaling another partitionSignaling another partitionModeled as a hardware interruptModeled as a hardware interrupt
When is it not necessary?When is it not necessary?When placing packets in a ring bufferWhen placing packets in a ring buffer
When removing packets from a ring bufferWhen removing packets from a ring buffer
When reading or writing Transfer PagesWhen reading or writing Transfer Pages
When translating guest memory mapsWhen translating guest memory maps
Guest Physical Address SpaceGuest Physical Address SpaceGPADLsGPADLs
Allow transactions to refer to guest buffersAllow transactions to refer to guest buffersNo data copying requiredNo data copying required
Built within the Virtualization Stack in theBuilt within the Virtualization Stack in theparent partitionparent partitionAllows I/O to be handled without switchingAllows I/O to be handled without switchinginto and out of the hypervisorinto and out of the hypervisorAllows child partitions’ VSCs to use theirAllows child partitions’ VSCs to use theirown physical addresses in requests to VSPsown physical addresses in requests to VSPsAllows VSPs easy access to translationsAllows VSPs easy access to translations
Particularly if VSP is a driver in kernel-modeParticularly if VSP is a driver in kernel-modeTypical transaction can involve no hypercallsTypical transaction can involve no hypercalls
Request Packet StructureRequest Packet Structure
Parent Partition Child Partition
12
3 ApplicationBuffers
Header GPADL – Describes Application Buffers
Protocol – Device Specific
What Does Traffic Look Like?What Does Traffic Look Like?
VMBus underlying protocol isVMBus underlying protocol isvery simplevery simple
Packets are sent asynchronouslyPackets are sent asynchronouslyPrimitives exist to allow synchronizationPrimitives exist to allow synchronization
Packets have very little structurePackets have very little structurePacket may reference Transfer PagesPacket may reference Transfer Pages
Packet may reference a GPADLPacket may reference a GPADL
Other protocols must be definedOther protocols must be definedby the users of the channelby the users of the channel
Request Packet FlowRequest Packet Flow
Parent Partition Child Partition
VSCVSP
Request Packet FlowRequest Packet Flow
Parent Partition Child Partition
VSCVSP
InterruptInterruptthroughthrough
HypervisorHypervisor
Request Packet FlowRequest Packet Flow
Parent Partition Child Partition
VSCVSP
Data FlowData Flow
Parent Partition Child Partition
VSCVSP
Interrupt ManagementInterrupt Management
Can be sent between partitions to signal VSPCan be sent between partitions to signal VSPor VSC code to start runningor VSC code to start running
Avoids software pollingAvoids software polling
Cost of an interrupt is a hypercall and maybeCost of an interrupt is a hypercall and maybea partition context switcha partition context switchOnly necessary when VSP/VSC wouldn’t Only necessary when VSP/VSC wouldn’t already be runningalready be running
When ring buffer was previously emptyWhen ring buffer was previously emptyWhen ring buffer was previously fullWhen ring buffer was previously full
Multiple channels’ interrupts can be coalescedMultiple channels’ interrupts can be coalescedVMBus can track latency requirementsVMBus can track latency requirements
Allows requests to be batchedAllows requests to be batched
Bus DriverBus Driver
VMBus acts as a bus driverVMBus acts as a bus driver
It can form the bottom of a device stackIt can form the bottom of a device stack
VSCs can be instantiated on top of VMBusVSCs can be instantiated on top of VMBus
(Names of components (Names of components not finalized)not finalized)
Call To ActionCall To Action
Please attend the following session on Please attend the following session on Virtual Networking and StorageVirtual Networking and Storage
Participate in future Windows Server Participate in future Windows Server virtualization virtualization Beta programsBeta programs
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions,
it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.