development of i/o pass-through: current status & the future

21
Development of I/O Pass Development of I/O Pass - - through: through: Current Status & the Future Current Status & the Future Nov 21, 2008 Nov 21, 2008 Yuji Shimada Yuji Shimada NEC System Technologies, Ltd. NEC System Technologies, Ltd.

Upload: others

Post on 07-Dec-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Development of I/O PassDevelopment of I/O Pass--through:through:Current Status & the FutureCurrent Status & the Future

Nov 21, 2008Nov 21, 2008

Yuji ShimadaYuji Shimada

NEC System Technologies, Ltd.NEC System Technologies, Ltd.

2 © NEC Corporation 2008

Xen Summit Tokyo 2008

Agenda

1.Implementation of I/O Pass-through

2.Future Enhancement Plan

3.Challenges for I/O Pass-through

4.Conclusion

1. Implementation of I/O Pass1. Implementation of I/O Pass--throughthrough

4 © NEC Corporation 2008

Xen Summit Tokyo 2008

What is I/O Pass-through

• Assigning I/O device to domain and directly use from guest software

Assign HBA → High performance access to storageAssign NIC → High performance access to network

In this presentation I will talk about I/O pass-through for HVM domain

HBA#1

HBA#2

Dual Port NIC

Assign

Domain #1

Domain #2

5 © NEC Corporation 2008

Xen Summit Tokyo 2008

Overview of I/O Pass-through(1)

• MMIO Virtualization– The mapping from guest physical address to machine

address is done per-page– Reassign Memory Resources to I/O Device

• If BIOS assigned non-page-aligned MMIO resource, then domain0 linux reassign page-aligned MMIO resource to I/O device

• Configuration Register Access Virtualization– ioemu traps configuration register access from guest

software, and emulates or pass-through as to bit type

Enhancement 1

Enhancement 2

6 © NEC Corporation 2008

Xen Summit Tokyo 2008

Summary of I/O Pass-through(2)

• DMA Virtualization– Device driver writes guest physical address to the I/O device– I/O device executes DMA by using guest physical address– IOMMU on chipset translates guest physical address to

machine address

• Port I/O Virtualization– Hypervisor traps IN/OUT instructions and accesses I/O

device instead of guest software

• Interrupt Virtualization– Hypervisor receives interrupt from I/O device, emulates

interrupt controller, and injects interrupt to HVM domain

7 © NEC Corporation 2008

Xen Summit Tokyo 2008

Enhancement 1 –Reassign Memory Resources to I/O Device

• Problem– If the resources assigned by BIOS is not page-aligned, assigning

I/O device to HVM domain fails

• Enhancement– Release resources set by BIOS, and reassign page-aligned

resources to I/O device – Add boot parameter to specify I/O device to reassign resources

We can assign I/O device to HVM domain regardless of the initial resource assignment by BIOS.

Error: pci: 0000:00:1d.0: non-page-aligned MMIO BAR found.

00:01.0

00:1d.1

pagesize

Memory Resource assigned by BIOS(Machine Address Space)

Reassigned Memory Resource(Machine Address Space)

10002000h

10003000h

10004000h

10002000h

10003000h

10004000h

Boot Xen

reassigndev=00:1d.0,00:1d.1,00:01.0

NoError

BootHVM Domain

DISK

Network

HBA

NIC

00:01.0

00:1d.0

00:1d.1

00:1d.0

8 © NEC Corporation 2008

Xen Summit Tokyo 2008

Enhancement 2 –Configuration Register Access Virtualization(1)

• Problem– I/O device which can be assigned to HVM domain was limited

because the registers other than Configuration Header Type 0 were read only

• Enhancement– Allow guest software to read/write access to the following registers,

and hide other registers• Configuration Header Type 0• MSI Capability Structure• MSI-X Capability Structure• PCI Express Capability Structure• PCI Power Management Capability Structure• Vital Product Data Capability Structure• Vendor Specific Capability Structure• Device Specific Register (excluding Header & Capability Structure)

9 © NEC Corporation 2008

Xen Summit Tokyo 2008

Enhancement 2 –Configuration Register Access Virtualization(2)

• Enhancement (cont.)– Pass-through the directly controllable bit by guest software– Emulate the bit that BIOS initial settings should be protected and

the bit which has special reasons

We can assign several types of I/O device, USB and SAS HBA etc. Protect the whole system from invalid guest software behavior.

PhysicalConfiguration Registers

Configuration Access by Guest Software

Emulated Configuration Registers

Pas

s-th

roug

h ty

pe

Em

ulat

e ty

pe

ioemu

I/O Device

31 16 15 0PCI Express Capability Structure

Root Status

Root ControlRoot Capabilities

Slot ControlSlot Status

Slot Capabilities

Link ControlLink Status

Link Capabilities

Device ControlDevice Status

Device Capabilities

Capability IDNext PointerPCI Express Capability Register

Capability Pointer + 20h…

Capability Pointer + 1Ch

Capability Pointer + 18h

Capability Pointer + 14h

Capability Pointer + 10h

Capability Pointer + 0Ch

Capability Pointer + 08h

Capability Pointer + 04h

offsetCapability Pointer

Emulate typePass-through type

2. Future Enhancement Plan2. Future Enhancement Plan

11 © NEC Corporation 2008

Xen Summit Tokyo 2008

Future Plan (1)

• Manage I/O device’s power from Guest software– Direct management with PMCSR (Power Management

Control/Status Register) of Device– Support D0, D1, D2, and D3hot

• Reduce interrupt redirection– On Xen 3.3.0, an interrupt is delivered to one processor

(vCPU#0)• If that processor is not the one specified by guest software,

then interrupt is redirected– Support interrupt delivery to processor specified by guest

software in order to reduce redirection

Save power of I/O device.

Improve performance of interrupt delivery.

low powerhigh power

12 © NEC Corporation 2008

Xen Summit Tokyo 2008

• Improve ioemu log– Attach date and pid information to the log of ioemu

• Support I/O pass-through using stub domain– Stub domain emulates I/O instead of domain0– Improve stub domain to enable I/O pass-through

Future Plan (2)

[2008-11-21 16:10:05 4567] can't store dev vc name for domid 1 in /parallel/0 from a stub domain[2008-11-21 16:10:06 4567] qemu_map_cache_init nr_buckets = 10000 size 3145728[2008-11-21 16:10:06 4567] shared page at pfn 1fffe

example

Make it easy to analyze the log when any troublehappens in a system.

Improve the scalability and make domain0 smaller.

Dom0 HVMDom

HVMDom

Hypervisor

Hardware

ioemuStub Dom

3. Challenges for I/O Pass3. Challenges for I/O Pass--throughthrough

14 © NEC Corporation 2008

Xen Summit Tokyo 2008

Issue 1 - Unsuitable Device for I/O Pass-through

• I/O device which accesses internal memory with physical address obtained from device driver does not work

• When we use I/O pass-through, physical address used by device driver is “guest physical address”. It is distinguished from machine address.

• On DMA, IOMMU translates guest physical address to machine address.

• On internal memory access, IOMMU does not translate addresses. It causes access failure.

I/O Device

Host InterfaceChipset

(IOMMU) PCIeProcessorBAR

Internal Bus

Internal Memory

Internal Processor

System Memory

Device Driver Pass Guest Physical Address

Some Interface

Access with Guest Physical Address

×

Access with Guest Physical Address

Access with Machine Address

Translate

Pass Guest Physical Address

DMA

Internal Memory Access

Firmware

15 © NEC Corporation 2008

Xen Summit Tokyo 2008

Issue 1 - Virtualization Hole Causing the Issue.

• No one translates guest physical address to machine address if I/O device uses guest physical address obtained from device driver for the purpose other than DMA

• This is common problem among IOMMU based Hypervisors

We expect the adapter vender to design I/O device suitable for I/O pass-through.

16 © NEC Corporation 2008

Xen Summit Tokyo 2008

Issue 2 – ioemu does not Support PCIe

• ioemu does not have following functions– MMCFG mechanism

• Accessing configuration registers by accessing memory space

• Needed to access configuration register at offset from 100h to FFFh

– Capability Structure from 100h to FFFhe.g. Device Serial Number Capability Structure

PCI Express VSEC Structure– Root Port

• Device with registers to control PCIe-specific function

– Advanced Error Reporting (AER)• Reporting PCIe error to software

– PCIe Hot-plug• It is newer mechanism than ACPI hot-plug

Guest software can’t access register or capability structure it needs

Guest software can’t recover PCIe errorGuest software has to use ACPI hot-plug

As a result…

Topology at guest software’s view is not PCIe machine

17 © NEC Corporation 2008

Xen Summit Tokyo 2008

Issue 2 - Development to Support PCIe

• First, add MMCFG mechanism(1) and Root Port emulator(2) into ioemu

• Then, implement others

MMCFG<256MB>

F000 0000h

0h

E000 0000h

Guest Physical Address Space

FFF.…..FFFh

(1)Host Bridge

ISA Bridge IDE Cont.

KB / MouseRTC / PIDMACI/O APICPIT / HPETCOM

VGA Cont.

ACPI Cont.

Emulated I/O Device

Assigned Processor / Memory / I/O

Legacy Device CD-ROM Display

Memory

Processor

Platform DeviceRoot Port

Passthrough Device

Virtual Machine Topology

(2)

It is difficult for one developer or a single team to do all the implementations. We need cooperation from all of you!

4. Conclusion4. Conclusion

19 © NEC Corporation 2008

Xen Summit Tokyo 2008

Conclusion

• I/O pass-through becomes practicable now

• We will continue developing the functions related to I/O pass-through

• There are still some issues for I/O pass-through

• We need many developers cooperation for progress

20 © NEC Corporation 2008

Xen Summit Tokyo 2008

Acknowledgements

• Thank you very much to all of you:

– All the people concerned with Xen Summit Tokyo 2008

– All the developers in Xen Community