an updated overview of the qemu storage stack · an updated overview of the qemu storage stack...

26
Linux is a registered trademark of Linus Torvalds. An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – [email protected] Open Virtualization IBM Linux Technology Center 2011

Upload: others

Post on 27-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Linux is a registered trademark of Linus Torvalds. 

An Updated Overview of the QEMU Storage Stack

Stefan Hajnoczi – [email protected] VirtualizationIBM Linux Technology Center

2011

Page 2: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

The topic

● What is the QEMU storage stack?● Configuring the storage stack● Recent and future developments

– “Cautionary statement regarding forward-looking statements”

Page 3: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

QEMU and its uses

● “QEMU is a generic and open source machine emulator and virtualizer”

– http://www.qemu.org/● Emulation:

– For cross-compilation, development environments

– Android Emulator, shipping in an Android SDK near you

● Virtualization:– KVM and Xen use QEMU device emulation

Page 4: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Storage in QEMU● Devices and media:

– Floppy, CD-ROM, USB stick, SD card, harddisk

● Host storage:– Flat files (img, iso)

● Also over NFS– CD-ROM host device (/dev/cdrom)– Block devices (/dev/sda3, LVM volumes,

iSCSI LUNs)– Distributed storage (Sheepdog, Ceph)

Page 5: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

QEMU -drive optionqemu -drive

if=ide|virtio|scsi,file=path/to/img,cache=writethrough|writeback|none|unsafe

● Storage interface is set with if=● Path to image file or device is set with path=● Caching mode is set with cache=

● More on what this means later, but first the picture of the overall storage stack...

Page 6: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

The QEMU storage stack

Application

File system & block layer

Driver

Hardware emulation

Image format (optional)

File system & block layer

Driver

•Application and guest kernel work similar to bare metal.•Guest talks to QEMU via emulated hardware.

•QEMU performs I/O to an image file on behalf of the guest.•Host kernel treats guest I/O like any userspace application.

Guest QEMU Host

Page 7: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Seeing double● There may be two file systems. The guest file

system and the host file system (which holds the image file).

● There may be two volume managers. The guest and host can both use LVM and md independently.

● There are two page caches. Both guest and host can buffer pages from a file.

● There are two I/O schedulers. The guest will reorder or delay I/O but the host will too.

● Configuring either the guest or the host to bypass these layers typically leads to best performance.

Page 8: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Emulated storage overview

Application

File system & block layer

Driver

Hardware emulation

Image format (optional)

File system & block layer

Driver

Guest QEMU Host

Page 9: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Emulated storage● QEMU presents emulated storage interfaces to

the guest● Virtio is a paravirtualized storage interface,

delivers the best performance, and is extensible for the future

– One virtio-blk PCI adapter per block device● IDE emulation is used for CD-ROMs and is also

available for disks– Good guest compatibility but low

performance● SCSI emulation can be used for special

applications but is still under development

Page 10: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Emulated storage in the future

● SATA (AHCI) emulation– Currently experimental– Promises better performance than IDE– Relatively wide compatibility

● Renewed focus on SCSI– Patches to make SCSI emulation robust

continue to come in, though slowly– Virtio-scsi is being prototyped– Industry standard, rich features

Page 11: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Host page cache overview

Application

File system & block layer

Driver

Hardware emulation

Image format (optional)

File system & block layer

Driver

Guest QEMU Host

Page 12: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Host page cache● Writes complete after copying data to page

cache● Cache is flushed on fsync(2)● Reads may be satisfied from the cache● Guest has its own page cache

– Two copies of data in memory● Disabling host page cache:

– O_DIRECT I/O on the host– Bypasses host page cache when possible– Zero-copy when possible

Page 13: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Guest disk write cache overview

Application

File system & block layer

Driver

Hardware emulation

Image format (optional)

File system & block layer

Driver

Guest QEMU Host

Page 14: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Guest disk write cache● Disk completes writes after they reach cache

– Data may not be on disk● Volatile disk write cache loses contents on

power failure– Correct applications fsync(2) to guarantee

data is on disk● When write cache is disabled:

– Writes complete when they are on disk– Write performance is reduced

● Enabling write cache:– Improves write performance– Only ensures data integrity if applications

and storage stack flush cache correctly

Page 15: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Caching modes in QEMUMode Host page cache Guest disk write cache

none off onwritethrough on off

writeback on onunsafe on ignored

● Default is writethrough● Unsafe is a new mode that ignores cache flush

operations– Only use for temporary data– Useful for speeding up guest installs– Switch to another mode for production

Page 16: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Caching modes in the future● Guest control over disk write cache (WCE)

– Real disks allow WCE toggling at runtime– Lets guest determine whether to enable

● Useful for hosting or cloud environments● Ability to change host page cache option at

runtime– Today QEMU requires restart to change host

page cache

Page 17: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Image formats overview

Application

File system & block layer

Driver

Hardware emulation

Image format (optional)

File system & block layer

Driver

Guest QEMU Host

Page 18: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Image formats● Supported image formats:

– QCOW2, QED – QEMU– VMDK – VMware– VHD – Microsoft– VDI – VirtualBox

● Features that various image formats provide:– Sparse images– Backing files (delta images)– Encryption– Compression– Snapshots

Page 19: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

How image formats work

● Map logical block addresses to file offsets● Apply transformations on data (compression,

encryption)

Metadata Data

Block device

Image file

I/O from guest

1. Map/allocate 2. Transfer data

Page 20: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Manipulating image files● Only raw image files can be loopback mounted

– Use qemu-nbd to access image files on host● http://tinyurl.com/qemu-nbd

– Or use the powerful libguestfs:● Http://libguestfs.org/

● Convert image formats with qemu-img– Qemu-img is the Rosetta Stone of image

formats– Supports all image formats that QEMU does– Stand-alone program, can be used without

installing QEMU

Page 21: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Image formats in the future● Improving VMDK compatibility

– Adding support for latest file format versions– Google Summer of Code 2011 project

● QCOW2<->QED in-place conversion– Convert formats without copying data– Google Summer of Code 2011 project

● QED image streaming– Start new guest immediately, populate data

from backing file as it runs● QCOW2v3

– Currently in design phase– Enhance format with new ideas and address

pain points

Page 22: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Recommendations

● Emulated storage interface:– Virtio for Linux and Windows guests– IDE when virtio is not possible

● Caching mode:– cache=none for local storage

● Host storage:– LVM if flexibility of image files not needed– Raw image files if features not needed– QCOW2 or QED if more features are

required– Vmdk and others convert to native format

Page 23: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Summary● There are many layers to the storage stack

– Some layers are optional– Choose what you need

● Defaults: IDE storage interface and writethrough cache mode

– Conservative and compatible– Consider virtio-blk and none cache mode

● Image formats can be tamed with qemu-img, qemu-nbd, and libguestfs

Page 24: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Questions?

Blog: http://blog.vmsplice.net/

Page 25: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

QEMU Architecture● Each guest CPU has a

dedicated vcpu thread that uses the kvm.ko module to execute guest code.

● There is an I/O thread that runs a select(2) loop to handle events.kvm.ko

vcpu0 vcpu1I/O

thread

qemu-kvm

Linux

● Only one thread may be executing QEMU code at any given time. This excludes guest code and blocking in select(2).

Page 26: An Updated Overview of the QEMU Storage Stack · An Updated Overview of the QEMU Storage Stack Stefan Hajnoczi – stefanha@linux.vnet.ibm.com Open Virtualization IBM Linux Technology

Virtio-blk request lifecycle

● Request/response data and metadata live in guest memory.

● Virtqueue kick is a pio write to a virtio PCI hardware register.

● Completion is signaled by virtio PCI interrupt.

Data

2. Virtqueue kick 5. Interrupt3. DMA

Vring

1. Publish req

Vring

4. Publish resp