Oracle Open Source Software
PVH: PV Guest in HVM container
Mukesh Rathor, Software Engineer, Oracle Corporation
March 2014, The Linux Foundation Collaboration Summit
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract.
It is not a commitment to deliver any material, code, or functionality, and should not
be relied upon in making purchasing decisions. The development, release, and
timing of any features or functionality described for Oracle’s products remains at the
sole discretion of Oracle.

PVH: A new guest type on Xen

Before PVH:
● PV guest – Modified to run in a virtualized environment
● HVM guest – Unmodified (boots on bare metal)

PV GUESTS

PV Guest:
● Modified to skip the BIOS bootup sequence, CPU bringup, and other hardware initialization
● Context set up by Xen to start at a special entry point in the guest
● Does some initialization and meets the bare-metal boot path at start_kernel

PV Guest (contd):
● Is a “translating” guest (as opposed to auto_translated).
● Thus manages its list of mfns, provided to it by Xen, via a local p2m
● p2m maps a pfn to an mfn.
● All PTE updates via hypercalls using mfns
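
The pfn-to-mfn translation described above can be sketched in a few lines of C. This is a toy model, not the Linux or Xen implementation: the names p2m_table, p2m_lookup, make_pte and INVALID_MFN are hypothetical, and the hypercall that would actually install the PTE is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for a pfn that has no machine frame behind it. */
#define INVALID_MFN UINT64_MAX

/* Toy p2m: the index is the guest pfn, the value is the machine frame (mfn). */
struct p2m_table {
    const uint64_t *mfns;
    size_t nr_pfns;
};

/* Translate a pfn to an mfn, as a PV guest must do before building a PTE. */
static uint64_t p2m_lookup(const struct p2m_table *p2m, uint64_t pfn)
{
    assert(p2m != NULL);
    if (pfn >= p2m->nr_pfns)
        return INVALID_MFN;
    return p2m->mfns[pfn];
}

/* Build a PTE value from an mfn and the low 12 flag bits (4 KiB pages). */
static uint64_t make_pte(uint64_t mfn, uint64_t flags)
{
    return (mfn << 12) | (flags & 0xfffULL);
}
```

In a PV guest the value returned by make_pte would then be handed to Xen in a hypercall rather than written to the page table directly.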

PV Guest (contd):
● Address space shared with Xen
● 32bit runs in ring 1
    – GDT uses DPL for segment privilege level (0 to 3)
● 64bit runs in ring 3
    – No segmentation, hence use paging
    – Paging doesn't distinguish levels 0-2, hence ring 3

PV Guest (contd):
● IDT installed into Xen via hypercall
● Exceptions are received by Xen, then delivered to guest via bounce frames
● Interrupts are received by Xen, then delivered to guest via event channel

PV Guest (contd):
● Built-in PV frontend drivers pass info to driver backend via ring interface
● Access is granted via grant table ops
● Communicates using event channel

X86 Virtualization Extensions:
HVM: Hardware Virtual Machine
● Provides protected environment
● Guest kernel can run in any ring
● VMCS maintains guest state
    – One per VCPU
● VMEXITs transfer control to Xen

HVM GUESTS

HVM Guest:
● An unmodified guest that runs in an HVM container
● Boots and runs like on bare metal
● hvmloader running in guest context emulates firmware, then BIOS emulator runs before kernel entry
● Hardware devices emulated by QEMU (running in dom0 or stubdomain)

HVM Guest (contd):
● Gets virtualized E820 with pfns
● Native MMU
● auto_translated guest. P2M managed by Xen.
    – First Shadow
    – Then HAP
● IDT is native, hence APIC emulation by Xen

HVM Guest (contd):
● PVHVM drivers add PV style frontend drivers for networking and block devices, and also keyboard, console, ...
● Callback vector for event channel delivery to avoid APIC emulation
● Exceptions delivered via VMCS event injection (could be ext intr, exception, ...)

Drawbacks of PV and HVM Guests

Issues with PV guests:
1. Performance:
● System call overhead for 64bit guests
    – Certain benchmarks up to 30% slower
    – System calls trap and return through Xen
● Exceptions, especially page faults
    – Overhead of going through Xen

Issues with PV guests:
2. Code Maintenance:
● PV MMU
    – Lot of complex code in Linux and Xen to support PV MMU.
● Uses a lot of PV-OPS: pv_cpu_ops, pv_irq_ops, pv_mmu_ops, ...

Issues with HVM guests:
● QEMU/hvmloader needed
    – Syscall overhead from running this on 64bit kernel in user process if on dom0
● blktap driver: user space syscall overhead
● Can't be booted as dom0

PVH – A new guest type for Xen

PVH: Salient Features
● Runs in an HVM container, hence ring 0
● Uses the PV entry point, thus skipping BIOS emulations after hvmloader, hardware initializations, etc.
● Boots faster
● Enable via pvh=1 in guest config file
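
A guest config file enabling PVH might look like the fragment below. Only pvh=1 comes from the slides; the other keys are ordinary xl guest-config settings with placeholder values, and the guest name and kernel path are hypothetical.

```
# Hypothetical domU config; only the pvh line is taken from the slides.
name   = "pvh-guest"
kernel = "/path/to/vmlinuz"
memory = 2048
vcpus  = 1
pvh    = 1
```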

PVH: Salient Features (contd.)
● Uses a lot of HVM/native code paths in Linux, thus reducing most of PV-OPS
● Phase I: 64bit only
● Can't get rid of pv-ops in Linux yet
● In Linux, is a PV guest, i.e., xen_hvm_domain() would be false.
● In Xen, new guest type:
    enum guest_type {
        guest_type_pv, guest_type_pvh, guest_type_hvm
    };

PVH – Technical Details

PVH MMU:
● Native page tables
● Is auto_translated guest (P2M in Xen)
● Populated with pfns instead of mfns
● HAP only. No Shadow yet
● Only pv_mmu_ops used is flush_tlb_others (no need to IPI non-running vcpus)
● IO space is mapped 1:1 in the EPT/NPT for dom0.

PVH Interrupts:
● Native IDT
    – No bounce frames
● Uses event channel, so no APIC emulation
● All native pv_irq_ops (no xen_irq_ops)
● Xen honors guest EFLAGS.IF. Hence, xen_irq_disable not needed to mask event channel

PVH Interrupts (contd):
● Evtchn handler at vector 0xf3/243
● Xen sets bit corresponding to INT vector in shared vcpu info
● Xen injects event 0xf3 via VMCS
● Guest reads bitmap for the actual INT vector number
● Calls appropriate interrupt handler
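
The delivery sequence above can be modeled with a small C sketch of what the guest's callback-vector handler does: read the pending bitmap, clear it, and call the handler bound to each set bit. The struct vcpu_events and the functions evtchn_upcall and record_port are hypothetical, and the single 64-bit bitmap is a deliberate simplification of the real shared-info layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_EVENTS 64u

typedef void (*evt_handler_t)(unsigned int port);

/* Hypothetical, flattened stand-in for the per-vcpu shared info. */
struct vcpu_events {
    uint64_t pending;                   /* bit N set by the hypervisor for event N */
    evt_handler_t handlers[NR_EVENTS];  /* guest handlers, NULL if unbound */
};

/* Example handler: remembers the last event it was called for. */
static unsigned int last_port_seen = NR_EVENTS;

static void record_port(unsigned int port)
{
    last_port_seen = port;
}

/*
 * Body of the callback-vector handler: scan the bitmap and dispatch.
 * Returns the number of pending bits consumed, bound or not.
 */
static unsigned int evtchn_upcall(struct vcpu_events *v)
{
    assert(v != NULL);

    /* Snapshot and clear; real code needs an atomic exchange here. */
    uint64_t pending = v->pending;
    v->pending = 0;

    unsigned int handled = 0;
    for (unsigned int port = 0; port < NR_EVENTS; port++) {
        if (!(pending & (1ULL << port)))
            continue;
        if (v->handlers[port] != NULL)
            v->handlers[port](port);
        handled++;
    }
    return handled;
}
```

The point of the design is that one injected vector (0xf3) fans out to many logical interrupts, so no APIC needs to be emulated for the guest.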

PVH Exceptions:
● Most are handled by guest itself
    – Thus, Xen doesn't trap page faults
● Page Faults
    – Native handler runs and does normal pte update from pfn
    – Invalid pfn causes HAP fault in Xen and guest is killed

PVH cpu ops:
● Uses pvclock mechanism
    – No rtc or emulated timers
● pv_cpu_ops are all native
    – .read_cr0/4, .read_tsc, .set_gdt, ...
● GDT, LDT, etc. managed locally
● No TSC emulation at this time
● Native cpuid

PVH grant table:
● Guest creates it itself during boot, like HVM
● pfn space obtained by ballooning down guest memory, instead of ioremap
● Then common code path with HVM to ask Xen to update p2m

PVH support in Xen:
● PVH shares HVM data structures
● Creation path shares HVM paths to create VMCS
● Few fields needed and honored for initial guest context:
    – ctrlreg[3], GPRs (no selectors), debugreg.
    – Rest must be zeroed.

PVH support in Xen (contd):
● Guest started in protected mode with paging, PAE, and long mode with 64bit sub mode
● Default power-on vcpu state set is:
    – CR0: PE PG TS ET NE
    – CR4: PAE MCE VMXE
    – IA32e: Long mode (VMCS field)
    – CS.L (seg desc): 64bit code (compat later)
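
To make the default register state concrete, the listed CR0 and CR4 flags can be combined into numeric values using the architectural x86 bit positions. The function names below are hypothetical and this is not Xen's code; only the flag lists come from the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 control-register bit positions. */
#define X86_CR0_PE   (1ULL << 0)   /* protected mode enable */
#define X86_CR0_TS   (1ULL << 3)   /* task switched */
#define X86_CR0_ET   (1ULL << 4)   /* extension type */
#define X86_CR0_NE   (1ULL << 5)   /* numeric error */
#define X86_CR0_PG   (1ULL << 31)  /* paging */

#define X86_CR4_PAE  (1ULL << 5)   /* physical address extension */
#define X86_CR4_MCE  (1ULL << 6)   /* machine check enable */
#define X86_CR4_VMXE (1ULL << 13)  /* VMX enable */

/* CR0 as listed on the slide: PE PG TS ET NE. */
static uint64_t pvh_default_cr0(void)
{
    return X86_CR0_PE | X86_CR0_PG | X86_CR0_TS | X86_CR0_ET | X86_CR0_NE;
}

/* CR4 as listed on the slide: PAE MCE VMXE. */
static uint64_t pvh_default_cr4(void)
{
    return X86_CR4_PAE | X86_CR4_MCE | X86_CR4_VMXE;
}

/* Long mode requires protected mode, paging and PAE to all be enabled. */
static int long_mode_prereqs_met(uint64_t cr0, uint64_t cr4)
{
    return (cr0 & X86_CR0_PE) && (cr0 & X86_CR0_PG) && (cr4 & X86_CR4_PAE);
}
```

With these definitions the slide's state works out to CR0 = 0x80000039 and CR4 = 0x2060, and it satisfies the long-mode prerequisites.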

PVH guest startup:
● Starts with minimal context
    – cr3, rip, cs = 0
● Seg descriptors cached in VMCS, GDT null
● First thing guest must do upon boot:
    – Initialize the GDT
    – Selectors CS, GS, FS, ...
    – Any other CR features

PVH guest features:
● Required features for PVH:
    – XENFEAT_writable_page_tables
    – XENFEAT_auto_translated_physmap
    – XENFEAT_supervisor_mode_kernel
    – XENFEAT_hvm_callback_vector
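
A check that all four required features are present can be sketched as a bitmask test. The feature names are the ones on the slide, but the bit positions assigned here are assumptions for illustration and should be confirmed against Xen's public features header; pvh_features_ok and pvh_missing_features are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions are assumptions; confirm against Xen's public features.h. */
enum {
    XENFEAT_writable_page_tables    = 0,
    XENFEAT_auto_translated_physmap = 2,
    XENFEAT_supervisor_mode_kernel  = 3,
    XENFEAT_hvm_callback_vector     = 8
};

#define FEAT_BIT(f) (1u << (f))

/* The four features the slide lists as required for PVH. */
#define PVH_REQUIRED_FEATURES                        \
    (FEAT_BIT(XENFEAT_writable_page_tables)    |     \
     FEAT_BIT(XENFEAT_auto_translated_physmap) |     \
     FEAT_BIT(XENFEAT_supervisor_mode_kernel)  |     \
     FEAT_BIT(XENFEAT_hvm_callback_vector))

/* True when every required feature bit is set in the offered mask. */
static int pvh_features_ok(uint32_t offered)
{
    return (offered & PVH_REQUIRED_FEATURES) == PVH_REQUIRED_FEATURES;
}

/* Mask of required features that the offered set lacks. */
static uint32_t pvh_missing_features(uint32_t offered)
{
    return PVH_REQUIRED_FEATURES & ~offered;
}
```

Returning the missing mask rather than only a yes/no answer makes it easier to report which feature blocked a PVH boot.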

PVH – Performance

LMBENCH 2.0
Host: 2 CPUs. PV dom0: 1 vcpu, 1.2 GB. Guest: 1 vcpu, 2 GB.
Processor, Processes: times in microseconds, smaller is better

Host  OS            Mhz   null  null  stat  open  selct  sig   sig   fork  exec  sh
                          call  I/O         clos  TCP    inst  hndl  proc  proc  proc
PV    Linux 3.14.0  2638  0.46  0.70  2.14  4.07  5.077  0.55  2.45  367.  1036  3272
HVM   Linux 3.14.0  2638  0.06  0.14  0.77  1.81  4.545  0.15  3.39  86.9  349.  1475
PVH   Linux 3.14.0  2638  0.06  0.14  0.78  1.94  4.505  0.16  3.17  109.  403.  1620
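
The relative cost between guest types can be read off these figures with a small helper. The numbers are copied from the LMBENCH results above; the column enum and the slowdown function are hypothetical names introduced for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Column order follows the LMBENCH results above. */
enum {
    COL_NULL_CALL, COL_NULL_IO, COL_STAT, COL_OPEN_CLOSE, COL_SELECT_TCP,
    COL_SIG_INST, COL_SIG_HNDL, COL_FORK, COL_EXEC, COL_SH, NR_COLS
};

/* Times in microseconds, one array per guest type. */
static const double pv_us[NR_COLS]  = { 0.46, 0.70, 2.14, 4.07, 5.077,
                                        0.55, 2.45, 367.0, 1036.0, 3272.0 };
static const double hvm_us[NR_COLS] = { 0.06, 0.14, 0.77, 1.81, 4.545,
                                        0.15, 3.39, 86.9, 349.0, 1475.0 };
static const double pvh_us[NR_COLS] = { 0.06, 0.14, 0.78, 1.94, 4.505,
                                        0.16, 3.17, 109.0, 403.0, 1620.0 };

/* How many times longer guest `a` takes than guest `b` in one column. */
static double slowdown(const double *a, const double *b, size_t col)
{
    assert(col < NR_COLS && b[col] > 0.0);
    return a[col] / b[col];
}
```

By these figures a null system call on PV takes roughly 7.7 times as long as on PVH, while fork on PVH takes about 1.25 times as long as on HVM.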

PVH: Conclusion

PVH in a nutshell:
● Reduced hypercalls
● Significantly reduced pv-ops
● Native syscalls
● No PV MMU maintenance

PVH: Status

Where are we:
● DomU support in Xen 4.4
● Linux changes upstream
● Working on dom0 PVH support now
    – Majority of changes on Xen side
    – Small changes on Linux

PVH: Upcoming

PVH: more to do
● 32bit support.
● AMD port.
● Live migration, save/restore.
● PCI passthru
● Enhancements like tsc emulation, pirq eoi, ...

PVH: Performance fine tuning
● Goal is to have best performance of PV and HVM
    – fork/exec lmbench for example
● Future optimizations, like delivering interrupt directly to dom0 etc.
The End