
Nesting Virtual Machines in Virtualization Test Frameworks

Dissertation submitted in May 2010 to the Department of Mathematics and Computer Science of the Faculty of Sciences, University of Antwerp, in partial fulfillment of the requirements for the degree of Master of Science.

Supervisor: Prof. Dr. Jan Broeckhove
Co-supervisor: Dr. Kurt Vanmechelen

Mentors: Sam Verboven & Ruben Van den Bossche

Olivier Berghmans

Research Group Computational Modelling and Programming


Contents

List of Figures

List of Tables

Nederlandstalige samenvatting

Preface

Abstract

1 Introduction
  1.1 Goals
  1.2 Outline

2 Virtualization
  2.1 Applications
  2.2 Taxonomy
    2.2.1 Process virtual machines
    2.2.2 System virtual machines
  2.3 x86 architecture
    2.3.1 Formal requirements
    2.3.2 The x86 protection level architecture
    2.3.3 The x86 architecture problem

3 Evolution of virtualization for the x86 architecture
  3.1 Dynamic binary translation
    3.1.1 System calls
    3.1.2 I/O virtualization
    3.1.3 Memory management
  3.2 Paravirtualization
    3.2.1 System calls
    3.2.2 I/O virtualization
    3.2.3 Memory management
  3.3 First generation hardware support
  3.4 Second generation hardware support
  3.5 Current and future hardware support
  3.6 Virtualization software
    3.6.1 VirtualBox
    3.6.2 VMware
    3.6.3 Xen
    3.6.4 KVM
    3.6.5 Comparison between virtualization software

4 Nested virtualization
  4.1 Dynamic binary translation
  4.2 Paravirtualization
  4.3 Hardware supported virtualization

5 Nested virtualization in practice
  5.1 Software solutions
    5.1.1 Dynamic binary translation
    5.1.2 Paravirtualization
    5.1.3 Overview software solutions
  5.2 First generation hardware support
    5.2.1 Dynamic binary translation
    5.2.2 Paravirtualization
    5.2.3 Hardware supported virtualization
    5.2.4 Overview first generation hardware support
  5.3 Second generation hardware support
    5.3.1 Dynamic binary translation
    5.3.2 Paravirtualization
    5.3.3 Hardware supported virtualization
    5.3.4 Overview second generation hardware support
  5.4 Nested hardware support
    5.4.1 KVM
    5.4.2 Xen

6 Performance results
  6.1 Processor performance
  6.2 Memory performance
  6.3 I/O performance
    6.3.1 Network I/O
    6.3.2 Disk I/O
  6.4 Conclusion

7 Conclusions
  7.1 Nested virtualization and performance results
  7.2 Future work

Appendices

Appendix A Virtualization software
  A.1 VirtualBox

Appendix B Details of the nested virtualization in practice
  B.1 Dynamic binary translation
    B.1.1 VirtualBox
    B.1.2 VMware Workstation
  B.2 Paravirtualization
  B.3 First generation hardware support
    B.3.1 Dynamic binary translation
    B.3.2 Paravirtualization
  B.4 Second generation hardware support
    B.4.1 Dynamic binary translation
    B.4.2 Paravirtualization
  B.5 KVM’s nested SVM support

Appendix C Details of the performance tests
  C.1 sysbench
  C.2 iperf
  C.3 iozone


List of Figures

2.1 Implementation layers in a computer system.
2.2 Taxonomy of virtual machines.
2.3 The x86 protection levels.
3.1 Memory management in x86 virtualization using shadow tables.
3.2 Execution flow using virtualization based on Intel VT-x.
3.3 Latency reductions by CPU implementation [30].
4.1 Layers in a nested virtualization setup with hosted hypervisors.
4.2 Memory architecture in a nested situation.
5.1 Layers for nested paravirtualization in dynamic binary translation.
5.2 Layers for nested Xen paravirtualization.
5.3 Layers for nested dynamic binary translation in paravirtualization.
5.4 Layers for nested dynamic binary translation in a hypervisor based on hardware support.
5.5 Layers for nested paravirtualization in a hypervisor based on hardware support.
5.6 Nested virtualization architecture based on hardware support.
5.7 Execution flow in nested virtualization based on hardware support.
6.1 CPU performance for native with four cores and L1 guest with one core.
6.2 CPU performance for native, L1 and L2 guest with four cores.
6.3 CPU performance for L1 and L2 guests with one core.
6.4 Memory performance for L1 and L2 guests.
6.5 Threads performance for native, L1 guests and L2 guests with the sysbench benchmark.
6.6 Network performance for native, L1 guests and L2 guests.
6.7 File I/O performance for native, L1 guests and L2 guests with the sysbench benchmark.
6.8 File I/O performance for native, L1 guests and L2 guests with the iozone benchmark.


List of Tables

3.1 Comparison between a selection of the most popular hypervisors.
5.1 Index table indicating in which subsections information about each nested setup can be found.
5.2 The nesting setups with dynamic binary translation as the L1 hypervisor technique.
5.3 The nesting setups with paravirtualization as the L1 hypervisor technique.
5.4 Overview of the nesting setups with a software solution as the L1 hypervisor technique.
5.5 The nesting setups with first generation hardware support as the L1 hypervisor technique and DBT as the L2 hypervisor technique.
5.6 The nesting setups with first generation hardware support as the L1 hypervisor technique and PV as the L2 hypervisor technique.
5.7 The nesting setups with first generation hardware support as the L1 and L2 hypervisor technique.
5.8 Overview of the nesting setups with first generation hardware support as the L1 hypervisor technique.
5.9 The nesting setups with second generation hardware support as the L1 hypervisor technique and DBT as the L2 hypervisor technique.
5.10 The nesting setups with second generation hardware support as the L1 hypervisor technique and PV as the L2 hypervisor technique.
5.11 The nesting setups with second generation hardware support as the L1 and L2 hypervisor technique.
5.12 Overview of the nesting setups with second generation hardware support as the L1 hypervisor technique.
5.13 Overview of all nesting setups.


Nederlandstalige samenvatting

Virtualization has grown into a widespread technology that is used to abstract, combine or divide computing resources. Requests for these resources thereby depend only minimally on the underlying physical layer. The x86 architecture was not specifically designed for virtualization and contains a number of non-virtualizable instructions. Various software solutions and hardware support have provided a remedy for this. The growing number of applications means that users increasingly want to employ virtualization. Among other things, the need for complete physical setups for research purposes can be avoided by using virtualization. To virtualize components that may themselves use virtualization, it must be possible to nest virtual machines inside one another. Only little information on nested virtualization was available, and this thesis examines what is possible with the current techniques.

We test the nesting of hypervisors based on the different virtualization techniques. The techniques used are dynamic binary translation, paravirtualization and hardware support. For hardware support, a distinction was made between first generation and second generation hardware support. Successful nested setups use software solutions for the second hypervisor and hardware support for the first hypervisor. Only one working nested setup uses a software solution for both.

Benchmarks were run to determine whether the working nested setups perform well. The performance of the processor, the memory and I/O was tested and compared across the different levels of virtualization.

We found that nested virtualization works for certain setups, especially with a software solution on top of a hypervisor with hardware support. Setups with hardware support for the upper hypervisor are not yet possible. Nested hardware support will become available soon, but for now the only option is to use a software solution for the upper hypervisor. The benchmark results showed that the performance of nested setups is promising.


Preface

In this section I will give some insight into the creation of this thesis. It was submitted in partial fulfillment of the requirements for a Master's degree in Computer Science. I have always been fascinated by virtualization, and during the presentation of open thesis subjects I stumbled upon the subject of nested virtualization. Right from the start I found the subject very interesting, so I made an appointment for more information and I eventually got it!

I had already used some virtualization software, but I did not know much about the underlying techniques. During the first semester I followed a course on virtualization, which helped me to learn the fundamentals. It took time to become familiar with the installation and use of the different virtualization packages. At first, it took a long time to test one nested setup and it seemed that all I was doing was installing operating systems in virtual machines. Predefined images can save a lot of work, but I had to find this out the hard way! Even with these predefined images, a nested setup can take a long time to test and re-test since there are so many possible configurations.

After the first series of tests, I was quite disappointed with the obtained results. Due to some setbacks in December and January, I also fell behind on schedule, leading to a hard second semester. It was difficult combining this thesis with other courses and with extracurricular responsibilities during this second semester. I am pleased that I got back on track and finished the thesis on time! This would not have been possible without the help of the people around me. I want to thank my girlfriend Anneleen Wislez for supporting me, not only during this year but during the last few years. She also helped me with creating the figures for this thesis and reading the text.


Further, I would like to show my appreciation to my mentors Sam Verboven and Ruben Van den Bossche for always pointing me in the right direction and for their help during this thesis. Additionally, I also want to thank my supervisor Prof. Dr. Jan Broeckhove and co-supervisor Dr. Kurt Vanmechelen for giving me the opportunity to work on this thesis.

A special thank you goes out to all my fellow students and especially to Kristof Overdulve for the interesting conversations and the laughter during the past years. And last but not least I want to thank my parents and sister for supporting me throughout my education; my dad for offering support by buying his new computer earlier and lending it to me so I could do a second series of tests on a new processor, and my mom for the excellent care and interest in what I was doing.


Abstract

Virtualization has become a widespread technology that is used to abstract, combine or divide computing resources, allowing resource requests to be described and fulfilled with minimal dependence on the underlying physical delivery. The x86 architecture was not designed with virtualization in mind and contains certain non-virtualizable instructions. This has resulted in the emergence of several software solutions and has led to the introduction of hardware support. The expanding range of applications means that users increasingly want to use virtualization. Among other things, the need for entire physical setups for research purposes can be avoided by using virtualization. For components that themselves already use virtualization, it becomes necessary to execute a virtual machine inside a virtual machine; this is called nested virtualization. There has been little related work on nested virtualization, and this thesis elaborates on what is possible with current techniques.

We tested the nesting of hypervisors based on the different virtualization techniques: dynamic binary translation, paravirtualization and hardware support. For hardware support, a distinction was made between first generation and second generation hardware support. Successful nested setups use a software solution for the inner hypervisor and hardware support for the bottom-layer hypervisor. Only one working nested setup uses software solutions for both hypervisors.

Performance benchmarks were conducted to find out whether the performance of working nested setups is reasonable. The performance of the processor, the memory and I/O was tested and compared across the different levels of virtualization.

We found that nested virtualization on the x86 architecture works for certain setups, especially with a software solution on top of a hardware supported hypervisor. Setups with hardware support for the inner hypervisor are not yet possible. Nested hardware support will become available soon, but until then the only option is the use of a software solution for the inner hypervisor. The results of the performance benchmarks showed that the performance of the nested setups is promising.


CHAPTER 1

Introduction

Within the research surrounding grid and cluster computing there are many developments at different levels that make use of virtualization. Virtualization can be used for all, or a selection of, the components in grid or cluster middleware. Grids or clusters also use virtualization to run separate applications in a sandbox environment. Both developments bring advantages concerning security, fault tolerance, legacy support, isolation, resource control, consolidation, etc.

Complete test setups are not available or desirable for many development and research purposes. If certain performance limitations do not pose a problem, virtualization of all components in a system can avoid the need for physical grid or cluster setups. This thesis focuses on the latter: the consolidation of several physical cluster machines by virtualizing them on a single physical machine. The virtualization of cluster machines that use virtualization themselves leads to a combination of the above mentioned levels.

1.1 Goals

The goal of this thesis is to find out whether different levels of virtualization are possible with current virtualization techniques. The research question is whether nested virtualization works on the x86 architecture. In cases where nested virtualization works, we want to find out what the performance degradation is when compared to a single level of virtualization or to a native solution. For cases where nested virtualization does not work, we investigate the reasons for the failure and what needs to be changed in order for it to work. The experiments are conducted with some of the most popular virtualization software to find an answer to the posed question.


1.2 Outline

The outline of this thesis is as follows. Chapter 2 contains an introduction to virtualization: a brief history of virtualization is given, followed by a few definitions and a taxonomy of virtualization in general. The chapter ends with the formal requirements for virtualization on a computer architecture and how the x86 architecture compares to these requirements.

Chapter 3 describes the evolution of virtualization for the x86 architecture. Virtualization software first used software techniques; at a later stage, processor vendors provided hardware support for virtualization. The last section of the chapter provides an overview of a selection of the most popular virtualization software.

Chapter 4 provides a theoretical view of the requirements for nested virtualization on the x86 architecture. For each technique described in chapter 3, a detailed explanation of the theoretical requirements gives more insight into whether nested virtualization can work for the given technique.

Chapter 5 investigates the actual nesting of virtual machines using some of the most popular virtualization software solutions. The different virtualization techniques are combined to get an overview of which nested setup works best. Chapter 6 presents performance results for the working nested setups from chapter 5. System benchmarks are executed on each setup and the results are compared.

Chapter 7 summarizes the results of this thesis and gives directions for future work.


CHAPTER 2

Virtualization

In recent years virtualization has become a widespread technology that is used to abstract, combine or divide computing resources to allow resource requests to be described and fulfilled with minimal dependence on the underlying physical delivery. The origins of virtualization can be traced back to the 1960s [1, 2], in research projects that provided concurrent, interactive access to mainframes. Each virtual machine (VM) gave the user the illusion of working directly on a physical machine. By partitioning the system into virtual machines, multiple users could concurrently use the system, each within their own operating system. The projects provided an elegant way to enable time- and resource-sharing on expensive mainframes. Users could execute, develop, and test applications within their own virtual machine without interfering with other users. At that time, virtualization was used to reduce the cost of acquiring new hardware and to improve productivity by letting more users work simultaneously.

In the late 1970s and early 1980s virtualization became unpopular because of the introduction of cheaper hardware and multiprocessing operating systems. The popular x86 architecture lacked the power to run multiple operating systems at the same time. Since this hardware was so cheap, a dedicated machine was used for each separate application. The use of these dedicated machines led to a decrease in the use of virtualization.

The ideas of virtualization became popular again in the late 1990s with the emergence of a wide variety of operating systems and hardware configurations. Virtualization was used for executing a series of applications, targeted for different hardware or operating systems, on a given machine. Instead of buying dedicated machines and operating systems for each application, the use of virtualization on one machine offers the ability to create virtual machines that are able to run these applications.

Virtualization concepts can be used in many areas of computer science. Large variations in the abstraction level and underlying architecture lead to many definitions of virtualization. In “A survey on virtualization technologies”, S. Nanda and T. Chiueh define virtualization by the following relaxed definition [1]:

Definition 2.1 Virtualization is a technology that combines or divides computing resources to present one or many operating environments using methodologies like hardware and software partitioning or aggregation, partial or complete machine simulation, emulation, time-sharing, and many others.

The definition mentions the aggregation of resources, but in this context the focus lies on the partitioning of resources. Throughout the rest of this thesis, virtualization provides infrastructure used to abstract lower-level, physical resources and to create multiple independent and isolated virtual machines.

2.1 Applications

The expanding range of computer applications and their varied requirements for hardware and operating systems increases the need for users to start using virtualization. Most people will have already used virtualization without realizing it, because there are many applications where virtualization can be used in some form. This section elaborates on some practical applications where virtualization can be used. S. Nanda and T. Chiueh enumerate some of these applications in “A survey on virtualization technologies”, but the list is not complete and one can easily think of other applications [1].

A first practical application that benefits from using virtualization is server consolidation [3]. It allows system administrators to consolidate the workloads of multiple under-utilized machines onto a few powerful machines. This saves hardware, management, administration of the infrastructure, space, cooling and power. A second application that also involves consolidation is application consolidation. A legacy application might require faster and newer hardware but might also require a legacy operating system. The need for such legacy applications could be served well by virtualizing the newer hardware.

Virtual machines can be used for providing secure, isolated environments to run foreign or less-trusted applications. This form of sandboxing can help build secure computing platforms. Besides sandboxing, virtualization can also be used for debugging purposes. It can help debug complicated software such as operating systems or device drivers by letting the user execute them on an emulated PC with full software controls. Moreover, virtualization can help produce arbitrary test scenarios that are hard to produce in reality and thus eases the testing of software.

Virtualization provides the ability to capture the entire state of a running virtual machine, which creates new management possibilities. Saving the state of a virtual machine, also called a snapshot, offers the user the capability to roll back to the saved state when, for example, a crash occurs in the virtual machine. The saved state can also be used to package an application together with its required operating system; this is often called an “appliance”. This eases the installation of that application on a new server, lowering the entry barrier for its use. Another advantage of snapshots is that the user can copy the saved state to other physical servers and use the new instance of the virtual machine without having to install it from scratch. This is useful for migrating virtual machines from one physical server to other physical servers when needed.

Another practical application is the use of virtualization within distributed network computing systems [4]. Such a system must deal with the complexity of decoupling local administration policies and configuration characteristics of distributed resources from the quality of service expected by end users. Virtualization can simplify or eliminate this complex decoupling because it offers functionality like consolidation of physical resources, security and isolation, flexibility and ease of management.

It is not difficult to see that the practical applications given in this section are just a few examples of the many possible uses for virtualization. The number of possible advantages that virtualization can provide continues to rise, making it more and more popular.

2.2 Taxonomy

Virtual machines can be divided into two main categories, namely process virtual machines and system virtual machines. In order to describe the differences, this section starts with an overview of the different implementation layers in a computer system, followed by the characteristics of process virtual machines. Finally, the characteristics of system virtual machines are explained. Most information in this section is based on the book “Virtual machines: Versatile platforms for systems and processes” by J. E. Smith and R. Nair [5].

Figure 2.1: Implementation layers in a computer system.


The complexity in computer systems is tackled by the division into levels of abstraction separated by well-defined interfaces. Implementation details at lower levels are ignored or simplified by introducing levels of abstraction. In both the hardware and the software of a computer system, the levels of abstraction correspond to implementation layers. A typical computer system consists of several implementation layers. Figure 2.1 shows the key implementation layers in a typical computer system. At the base of the computer system we have the hardware layer, consisting of all the different components of a modern computer. Just above the hardware layer, we find the operating system layer, which exploits the hardware resources to provide a set of services to system users [6]. The libraries layer allows application calls to invoke various services available on the system, including those provided by the operating system. At the top, the application layer consists of the applications running on the computer system.

Figure 2.1 also shows the three interfaces between the implementation layers – the instruction set architecture (ISA), the application binary interface (ABI), and the application programming interface (API) – which are especially important for virtual machine construction [7]. The division between hardware and software is marked by the instruction set architecture. The ISA consists of two interfaces, the user ISA and the system ISA. The user ISA includes the aspects visible to the libraries and application layers. The system ISA is a superset of the user ISA which also includes those aspects visible to supervisor software, such as the operating system.

The application binary interface provides a program or library access to the hardware resources and services available in the system. This interface consists of the user ISA and a system call interface which allows application programs to interact with the shared hardware resources indirectly. The ABI allows the operating system to perform operations on behalf of a user program.

The application programming interface allows a program to invoke various services available on the system and is usually defined with respect to a high-level language (HLL). An API enables applications written to the API to be ported easily to other systems that support the same API. The interface consists of the user ISA and of HLL library calls.

Using the three interfaces, virtual machines can be divided into two main categories: process virtual machines and system virtual machines. A process VM runs a single program, supporting only an individual process. It provides a user application with a virtual ABI or API environment. The process virtual machine is created when the corresponding process is created and terminates when the process terminates. System virtual machines provide a complete system environment in which many processes can coexist. System VMs do this by virtualizing the ISA layer.

2.2.1 Process virtual machines

Process virtual machines virtualize the ABI or API and can run only a single user program. Each virtual machine thus supports a single process, possibly consisting of multiple threads. The most common process VM is an operating system. It allows multiple user processes to run simultaneously by time-sharing the limited hardware resources. The operating system provides a replicated process VM for each executing program, so that each program thinks it has its own machine.

Program binaries that are compiled for a different instruction set are also supported by process VMs. There are two approaches for emulating the instruction set. Interpretation is a simple but slow approach; an interpreter fetches, decodes and emulates each individual instruction. A more efficient approach is dynamic binary translation, which is explained in section 3.1.

Emulation between different instruction sets provides cross-platform compatibility only on a case-by-case basis and requires considerable programming effort. Designing a process-level VM together with an HLL application development environment is an easier way to achieve full cross-platform portability. The HLL virtual machine does not correspond to any real platform, but is designed for ease of portability. The Java programming language is a widely used example of an HLL VM.

2.2.2 System virtual machines

System virtual machines provide a complete system environment by virtualizing the ISA layer. They allow a physical hardware system to be shared among multiple, isolated guest operating system environments simultaneously. The layer that provides the hardware virtualization is called the virtual machine monitor (VMM) or hypervisor. It manages the hardware resources so that multiple guest operating system environments and their user programs can execute simultaneously. System VMs are subdivided according to the supported ISAs of the guest operating systems, i.e. whether virtualization or emulation is used. Virtualization can be further subdivided based on where the hypervisor is executed: native or hosted. The following two paragraphs clarify the subdivision according to the supported ISAs.

Emulation: Guest operating systems with a different ISA from the host ISA can be supported through emulation. The hypervisor must emulate both the application and operating system code by translating each instruction to the ISA of the physical machine. The translation is applied to each instruction, so the hypervisor can easily manage all hardware resources. When emulation is used for guest operating systems with the same ISA as the host ISA, performance will be severely lower than with virtualization.

Virtualization: When the ISA of the guest operating system is the same as the host ISA, virtualization can be used to improve performance. It treats non-privileged instructions and privileged instructions differently. A privileged instruction is an instruction that traps when executed in user mode instead of in kernel mode; it will be discussed in more detail in section 2.3. Non-privileged instructions are executed directly on the hardware without intervention of the hypervisor. Privileged instructions are caught by the hypervisor and translated in order to guarantee correct results. When guest operating systems primarily execute non-privileged instructions, performance is close to native speed.

Thus, when the ISA of the guest and the host are the same, the best performing technique is virtualization. It improves performance in terms of execution speed by running non-privileged instructions directly on the hardware. If the ISA of the guest and the host are different, emulation is the only way to execute the guest operating system. The subdivision of virtualization based on the location of the hypervisor is clarified in the next two paragraphs.
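To make the trap-and-emulate mechanism behind virtualization more concrete, the following minimal C sketch shows how a hypervisor's trap handler might dispatch on the reason for a guest trap. All types, field names and helpers are hypothetical illustrations of the control flow described above, not the interface of any real hypervisor.

/* Highly simplified sketch of trap-and-emulate dispatch. All types,
 * fields and helpers are hypothetical illustrations, not the interface
 * of any real hypervisor. */
#include <stdint.h>
#include <stdio.h>

enum trap_reason { TRAP_PRIVILEGED_INSN, TRAP_IO_ACCESS, TRAP_PAGE_FAULT };

struct vcpu {
    uint64_t rip;               /* guest instruction pointer */
    enum trap_reason reason;    /* why the guest trapped     */
};

/* Stub handlers: a real hypervisor decodes the trapping instruction and
 * applies its effect to the *virtual* machine state, never the real one. */
static void emulate_privileged_insn(struct vcpu *v)
{
    printf("emulate privileged instruction at 0x%llx\n", (unsigned long long)v->rip);
}

static void emulate_io_access(struct vcpu *v)
{
    printf("emulate I/O access at 0x%llx\n", (unsigned long long)v->rip);
}

static void handle_page_fault(struct vcpu *v)
{
    printf("handle memory virtualization fault at 0x%llx\n", (unsigned long long)v->rip);
}

static void handle_trap(struct vcpu *v)
{
    switch (v->reason) {
    case TRAP_PRIVILEGED_INSN: emulate_privileged_insn(v); break;
    case TRAP_IO_ACCESS:       emulate_io_access(v);       break;
    case TRAP_PAGE_FAULT:      handle_page_fault(v);       break;
    }
    /* After handling, the guest resumes direct execution of innocuous
     * instructions until the next trap. */
}

int main(void)
{
    struct vcpu v = { .rip = 0x1000, .reason = TRAP_PRIVILEGED_INSN };
    handle_trap(&v);
    return 0;
}

The key point is that only trapping events reach this handler; innocuous instructions keep running directly on the hardware.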

Native, bare-metal hypervisor: A native, bare-metal hypervisor, also referred to as a Type 1 hypervisor, is the first layer of software installed on a clean system. The hypervisor runs in the most privileged mode, while all the guests run in a less privileged mode. It runs directly on the hardware and executes the intercepted instructions directly on the hardware. According to J. E. Smith and R. Nair, a bare-metal hypervisor is more efficient than a hosted hypervisor in many respects since it has direct access to hardware resources, enabling greater scalability, robustness and performance [5]. There are some variations of this architecture where a privileged guest operating system handles the intercepted instructions. The disadvantage of a native, bare-metal hypervisor is that a user must remove any existing operating system in order to install the hypervisor.

Hosted hypervisor: An alternative to a native, bare-metal hypervisor is the hosted or Type 2 hypervisor. It runs on top of a standard operating system and supports the broadest range of hardware configurations [3]. The installation of the hypervisor is similar to the installation of an application within the host operating system. The hypervisor relies on the host OS for device support and physical resource management. Privileged instructions cannot be executed directly on the hardware but are modified by the hypervisor and passed down to the host OS.

The implementation specifics of Type 1 and Type 2 hypervisors can be separated into several categories: dynamic binary translation, paravirtualization and hardware assisted virtualization. These approaches are discussed in more detail in chapter 3, which elaborates on virtualization within system virtual machines. An overview of the taxonomy of virtual machines is shown in figure 2.2.

Figure 2.2: Taxonomy of virtual machines.

Page 20: Nesting Virtual Machines in Virtualization Test …nyh/nested/Thesis_OlivierBergh...Nesting Virtual Machines in Virtualization Test Frameworks Dissertation submitted on May 2010 to

2.3. X86 ARCHITECTURE 9

2.3 x86 architecture

The taxonomy given in the previous section provides an overview of different virtual machines and different implementation approaches. This section gives detailed information about the requirements associated with virtualization and the problems that occur when virtualization technologies are implemented on the x86 architecture.

2.3.1 Formal requirements

In order to provide insight into the problems and solutions for virtualization on top of the x86 architecture, the formal requirements for a virtualizable architecture are given first. These requirements describe what is needed in order to use virtualization on a computer architecture. In “Formal requirements for virtualizable third generation architectures”, G. J. Popek and R. P. Goldberg defined a set of formal requirements for a virtualizable computer architecture [8]. They divided the ISA instructions into several groups. The first group contains the privileged instructions:

Definition 2.2 Privileged instructions are all the ISA instructions that only work in kernel mode and trap when executed in user mode instead of in kernel mode.

Another important group of instructions, which has a big influence on the virtualizability of a particular machine, is the sensitive instructions. Before defining sensitive instructions, the notions of behaviour sensitive and control sensitive are explained.

Definition 2.3 An instruction is behaviour sensitive if the effect of its execution depends on the state of the hardware, i.e. upon its location in real memory, or on the mode.

Definition 2.4 An instruction is control sensitive if it changes the state of the hardware upon execution, i.e. it attempts to change the amount of resources available or affects the processor mode without going through the memory trap sequence.

With these notions, instructions can be separated into sensitive instructions and innocuous instructions.

Definition 2.5 Sensitive instructions are the group of instructions that are either control sensitive or behaviour sensitive.

Definition 2.6 Innocuous instructions are the group of instructions that are not sensitive instructions.

According to Popek and Goldberg, there are three properties of interest when any arbitrary program is executed while the control program (the virtual machine monitor) is resident: efficiency, resource control, and equivalence.

The efficiency property: All innocuous instructions are executed by the hardware directly, with no intervention at all on the part of the control program.

The hypervisor should not intervene for instructions that do no harm. These instructions do not change the state of the hardware and should be executed by the hardware directly in order to preserve performance. The more instructions are executed directly, the better the performance of the virtualization will be. This property highlights the contrast between emulation - where every single instruction is analyzed - and virtualization.

The resource control property: It must be impossible for that arbitrary program to affect the system resources, i.e. memory, available to it; the allocator of the control program is to be invoked upon any attempt.

The hypervisor is in full control of the hardware resources. A virtual machine should not be able to access the hardware resources directly. It should go through the hypervisor to ensure correct results and isolation from other virtual machines.

The equivalence property: Any program K executing with a control program resident, with two possible exceptions, performs in a manner indistinguishable from the case when the control program did not exist and K had whatever freedom of access to privileged instructions that the programmer had intended.

A program running on top of a hypervisor should exhibit behaviour identical to the case where the program runs on the hardware directly. As mentioned, there are two exceptions: timing and resource availability problems. The hypervisor will occasionally intervene, and instruction sequences may take longer to execute. This can invalidate assumptions about the running time of the program. The second exception, the resource availability problem, might occur when the hypervisor does not satisfy a particular request for space. The program may then be unable to function in the same way as if the space were made available. The problem could easily occur, since the virtual machine monitor itself and other possible virtual machines take up space as well. A virtual machine environment can be seen as a “smaller” version of the actual hardware: logically the same, but with a smaller quantity of certain resources.

Given these categories of instructions and the properties, Popek and Goldberg define the hypervisor and a virtualizable architecture as follows:

Definition 2.7 We say that a virtual machine monitor, or hypervisor, is any control program that satisfies the three properties of efficiency, resource control and equivalence. Then, functionally, the environment which any program sees when running with a virtual machine monitor present is called a virtual machine. It is composed of the original real machine and the virtual machine monitor.

Definition 2.8 For any conventional third generation computer, a virtual machine monitor may be constructed, i.e. it is a virtualizable architecture, if the set of sensitive instructions for that computer is a subset of the set of privileged instructions.
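Writing S for the set of sensitive instructions and P for the set of privileged instructions (symbols introduced here for illustration, not used in the original text), Definition 2.8 can be stated compactly as:

% Popek-Goldberg virtualizability condition, compact restatement
S \subseteq P \quad \Longrightarrow \quad \text{a trap-and-emulate VMM can be constructed for the machine}

As the following subsections show, the x86 instruction set violates this condition: some sensitive instructions are not privileged.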

Page 22: Nesting Virtual Machines in Virtualization Test …nyh/nested/Thesis_OlivierBergh...Nesting Virtual Machines in Virtualization Test Frameworks Dissertation submitted on May 2010 to

2.3. X86 ARCHITECTURE 11

2.3.2 The x86 protection level architecture

The x86 architecture recognizes four privilege levels, numbered from 0 to 3 [9]. Figure 2.3 shows how the privilege levels can be interpreted as rings of protection. The center ring, ring 0, is reserved for the most privileged code and is used for the kernel of an operating system. When the processor is running in kernel mode, the code is executing in ring 0. Rings 1 and 2 are less privileged and are used for operating system services. These two rings are rarely used, but some virtualization techniques will run the guests inside ring 1. The outermost ring is used for applications and has the least privileges. The code of applications running in user mode will execute in ring 3.

Figure 2.3: The x86 protection levels.

These rings are used to prevent a program operating in an outer, less privileged ring from accessing more privileged system routines. A call gate is used to allow an outer ring to access an inner ring’s resource in a predefined manner.
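As a small, self-contained illustration (added here, not part of the original text): on x86 the two least significant bits of the CS segment selector hold the current privilege level, so an ordinary process can observe that it runs in ring 3. A minimal sketch for GCC on x86-64 Linux:

/* Print the current privilege level (CPL), encoded in the low two bits
 * of the CS segment selector. A normal user-space process prints ring 3. */
#include <stdio.h>

int main(void)
{
    unsigned short cs = 0;

    __asm__ volatile ("mov %%cs, %0" : "=r" (cs));
    printf("CS selector = 0x%hx, running in ring %d\n", cs, cs & 3);
    return 0;
}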

2.3.3 The x86 architecture problem

A computer architecture can support virtualization if it meets the formal requirements described in subsection 2.3.1. The x86 architecture, however, does not meet the requirements posed above. The x86 instruction set architecture contains sensitive instructions that are non-privileged, called non-virtualizable instructions. In other words, these instructions will not trap when executed in user mode, yet they depend on or change the hardware state. This is not desirable because the hypervisor cannot simulate the effect of the instruction. The current hardware state could belong to another virtual machine, producing an incorrect result for the current virtual machine.
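A classic example of such a sensitive but unprivileged instruction is SMSW, which stores the lower bits of control register CR0. The short program below (an illustration added here; on recent CPUs with UMIP enabled the instruction faults instead) can usually execute it in ring 3 without trapping, so a purely trap-based hypervisor never gets the chance to intervene:

/* SMSW reveals privileged machine state (low bits of CR0) to user-mode
 * code without causing a trap - a sensitive yet unprivileged instruction.
 * Illustrative sketch for GCC on x86-64 Linux; faults if UMIP is enabled. */
#include <stdio.h>

int main(void)
{
    unsigned long msw = 0;

    __asm__ volatile ("smsw %0" : "=r" (msw));
    printf("machine status word (low bits of CR0): 0x%lx\n", msw);
    return 0;
}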

The non-virtualizable instructions make virtualization on the x86 architecture more difficult. Virtualization techniques need to deal with these instructions. Applications will only run at near native speed when they contain a minimal number of non-virtualizable instructions. Approaches that overcome the limitations of the x86 architecture are discussed in the next chapter.


CHAPTER 3

Evolution of virtualization for the x86 architecture

Developers of virtualization software did not wait until processor vendors solved the x86 architecture problem. They introduced software solutions like binary translation and, when virtualization became more popular, paravirtualization. Processor vendors then introduced hardware support to solve the design problem of the x86 architecture and, at a later stage, to improve the performance. The next generation of hardware support was introduced to improve the performance of memory management. This chapter gives an overview of the evolution towards hardware supported virtualization on the x86 architecture. Dynamic binary translation, a software solution that tries to circumvent the design problem of the x86 architecture, is explained in the first section. The second section explains paravirtualization, a software solution which tries to improve on the binary translation concept; it has some advantages and disadvantages over dynamic binary translation. The third section gives details on the first generation hardware support and its advantages and disadvantages over the software solutions. In many cases the software solutions outperform the hardware support. The next generation of hardware support tries to further close the performance gap by eliminating major sources of virtualization overhead. This second generation hardware support focuses on memory management and is discussed in the fourth section. The last section gives an overview of VirtualBox, KVM and Xen, which are virtualization products, and of VMware, a company providing multiple virtualization products.

3.1 Dynamic binary translation

In full virtualization, the guest OS is not aware that it is running inside a virtual machine and requires no modifications [10]. Dynamic binary translation is a technique that implements full virtualization. It requires no hardware assisted or operating system assisted support, while other techniques, like paravirtualization, need modifications to either the hardware or the operating system.

Dynamic binary translation is a technique which works by translating code from one instruction set to another. The word “dynamic” indicates that the translation is done on the fly and is interleaved with execution of the generated code [11]. The word “binary” indicates that the input is binary code and not source code. To improve performance, the translation is mostly done on blocks of code instead of single instructions [12]. A block of code is defined as a sequence of instructions that ends with a jump or branch instruction. A translation cache is used to avoid retranslating code blocks multiple times.

In x86 virtualization, dynamic binary translation is not used to translate between different instruction set architectures. Instead, the translation is done from x86 instructions to x86 instructions. This makes the translation much lighter than previous binary translation technologies [13]. Since it is a translation between the same ISA, a copy of the original instructions often suffices. In other words, generally no translation is needed and the code can be executed as is. In particular, whenever the guest OS is executing code in user mode, no translation will be carried out and the instructions are executed directly, which is comparable in performance to executing the code natively. Code that the guest OS wants to execute in kernel mode will be translated on the fly and saved in the translation cache.
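The following sketch illustrates the translate-and-cache loop described above. It is a toy model added for illustration: the names and the fixed-size cache are hypothetical, and the real translator of a product such as VMware or VirtualBox decodes actual x86 code and rewrites only the unsafe instructions.

/* Toy sketch of a dynamic binary translator's dispatch loop with a
 * translation cache. All names are hypothetical. */
#include <stdint.h>
#include <stdio.h>

#define CACHE_SIZE 1024

struct translated_block {
    uint64_t guest_pc;      /* address of the original guest basic block */
    void    *host_code;     /* translated code, ready to execute         */
};

static struct translated_block cache[CACHE_SIZE];

/* Stand-in for translating a guest basic block (ends at the first branch). */
static void *translate_block(uint64_t guest_pc)
{
    printf("translating block at 0x%llx\n", (unsigned long long)guest_pc);
    return (void *)(uintptr_t)guest_pc;          /* dummy handle */
}

/* Stand-in for running translated code; returns the next guest PC. */
static uint64_t execute_block(void *host_code)
{
    return (uintptr_t)host_code + 16;            /* pretend next block */
}

static void *lookup_or_translate(uint64_t guest_pc)
{
    struct translated_block *slot = &cache[guest_pc % CACHE_SIZE];
    if (slot->host_code == NULL || slot->guest_pc != guest_pc) {
        slot->guest_pc  = guest_pc;              /* cache miss: translate once */
        slot->host_code = translate_block(guest_pc);
    }
    return slot->host_code;                      /* cache hit: reuse           */
}

int main(void)
{
    uint64_t pc = 0x1000;
    for (int i = 0; i < 4; i++)                  /* translate-execute loop */
        pc = execute_block(lookup_or_translate(pc));
    return 0;
}

Translation happens once per block; every later execution of the same block is served from the cache, which is why the overhead decreases over time for loops.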

Even when the guest OS is running kernel code, most of the time no translation is needed and the code is copied as is. Only in some cases will the hypervisor need to translate instructions of the kernel code to guarantee the integrity of the guest. The kernel of the guest is executed in ring 1 instead of ring 0 when using software virtualization. As explained in section 2.3, the x86 instruction set architecture contains sensitive instructions that are non-privileged. If the kernel of the guest operating system wants to execute privileged instructions or one of these non-virtualizable instructions, the dynamic binary translation will translate the instructions into a safe equivalent. The safe equivalent will not harm other guests or the hypervisor. For example, if access to the physical hardware is needed, the performed translation ensures that the code will use the virtual hardware instead. In these cases, the translation also makes the safe code less costly than the code with privileged instructions. The code with privileged instructions would trap when running in ring 1, and the hypervisor would have to handle these traps. The dynamic binary translation thus avoids the traps by replacing the privileged instructions, so that there are fewer interrupts and the safe code is less costly.

The translation of code into safer equivalents is less costly than letting the privileged instructions trap, but the translation itself should also be taken into account. Luckily, the translation overhead is rather low and will decrease over time, since translated pieces of code are cached in order to avoid retranslation in case of loops in the code. Yet, dynamic binary translation has a few cases it cannot fully solve: system calls, I/O, memory management and complex code. The latter is code that, for example, modifies itself or has indirect control flows. Such code is complex to execute, even on an operating system that runs natively. The other cases are described in more detail in the next subsections.


3.1.1 System calls

A system call is a mechanism used by processes to access the services provided by the operating system. It involves a transition to the kernel, where the required function is then performed [6, 14]. The kernel of an operating system is also a process, but it differs from other processes in that it has privileged access to processor instructions. The kernel does not execute on its own initiative but only when it receives an interrupt from the processor or a system call from another process running in the operating system. There are many different techniques for implementing system calls. One way is to use a software interrupt and trap, but for x86 a faster technique was chosen [13, 15]. Intel and AMD introduced the instructions SYSCALL/SYSENTER and SYSRET/SYSEXIT for a process to perform a system call. These instructions transfer control to the kernel without the overhead of an interrupt.

In software virtualization the kernel of the guest will run inside ring 1 instead of ring 0. This implies that the hypervisor should intercept a SYSENTER (or SYSCALL), translate the code and hand over control to the kernel of the guest. This kernel then executes the translated code and executes a SYSEXIT (or SYSRET) to return control to the process that requested the service of the kernel. Because the kernel of the guest is running inside ring 1, it does not have the privilege to perform the SYSEXIT. This will cause an interrupt at the processor, and the hypervisor has to emulate the effect of this instruction.
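A highly simplified sketch of that last step is shown below. It is an illustration added here (hypothetical structure names, simplified SYSEXIT semantics: the user-mode return address is taken from RDX and the stack pointer from RCX), not the code of any existing hypervisor.

/* Sketch: emulating SYSEXIT for a guest kernel that runs de-privileged
 * in ring 1 and therefore faults when it executes the instruction.
 * Types and fields are hypothetical; semantics are simplified. */
#include <stdio.h>

struct guest_regs {
    unsigned long rip, rsp, rcx, rdx;
    int cpl;                    /* virtual privilege level of the guest */
};

static void emulate_sysexit(struct guest_regs *g)
{
    g->rip = g->rdx;            /* user-mode return address (from RDX) */
    g->rsp = g->rcx;            /* user-mode stack pointer  (from RCX) */
    g->cpl = 3;                 /* guest continues in virtual ring 3   */
}

int main(void)
{
    struct guest_regs g = { .rcx = 0x7fffffff0000, .rdx = 0x400123, .cpl = 1 };

    emulate_sysexit(&g);
    printf("resume guest at rip=0x%lx rsp=0x%lx ring %d\n", g.rip, g.rsp, g.cpl);
    return 0;
}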

System calls cause a significant amount of overhead when using software virtualization. In a virtual machine, a system call costs about 10 times the cycles needed for a system call on a native machine. In "A comparison of software and hardware techniques for x86 virtualization", the authors measured that a system call on a 3.8 GHz Pentium 4 takes 242 cycles [11]. On the same machine, a system call in a virtual machine, virtualized with dynamic binary translation and the kernel running in ring 1, takes 2308 cycles. In an environment where virtualization is used, there will most likely be more than one virtual machine on a physical machine. In that case, the overhead of system calls can become a significant part of the virtualization overhead. As we will see later, hardware support for virtualization offers a solution for this.
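The order of magnitude of such measurements can be reproduced with a small microbenchmark. The sketch below is not taken from the cited paper; it simply times a tight loop of getpid system calls with the time-stamp counter on Linux/x86, and the iteration count and the use of syscall(SYS_getpid) (to bypass glibc's caching of getpid) are arbitrary choices made for illustration.

    /* Rough sketch of a system-call latency microbenchmark (not from [11]).
     * Build: gcc -O2 syscall_bench.c -o syscall_bench   (Linux, x86/x86-64) */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <x86intrin.h>      /* __rdtsc() */

    int main(void)
    {
        const unsigned long iterations = 1000000UL;
        unsigned long long start, end;

        start = __rdtsc();
        for (unsigned long i = 0; i < iterations; i++) {
            /* Use the raw system call so glibc cannot cache the result. */
            syscall(SYS_getpid);
        }
        end = __rdtsc();

        printf("approx. %llu cycles per system call\n",
               (end - start) / iterations);
        return 0;
    }

Running such a program natively and inside a virtual machine gives a rough feeling for the ratio discussed above, although the absolute numbers will of course differ from the published measurements.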

3.1.2 I/O virtualization

When creating a virtual machine, not only the processor needs to be virtualized but also all the essential hardware, like memory and storage. Each I/O device type has its own characteristics and needs to be controlled in its own special way [5]. There are often a large number of devices per I/O device type, and this number continues to rise. The strategy consists of constructing a virtual I/O device and then virtualizing the I/O activity that is directed at the device. Every access to this virtual hardware must be translated to the real hardware. The hypervisor must intercept all I/O operations issued by the guest operating system and emulate these instructions using software that understands the semantics of the specific I/O port accessed [16]. The I/O devices are emulated because of the ease of migration and the multiplexing advantages [17]. Migration is easy because the virtual device exists in memory and can easily be transferred. The hypervisor can present a virtual device to each guest while performing the multiplexing.

Emulation has the disadvantage of poor performance. The hypervisor must perform a significant amount of work to present the illusion of a virtual device. The great number of physical devices makes the emulation of I/O devices in the hypervisor complex: the hypervisor needs drivers for every physical device in order to be usable on different physical systems. A hosted hypervisor has the advantage that it can reuse the device drivers provided by the host operating system. Another problem is that the virtual I/O device is often a device model which does not match the full power of the underlying physical device [18]. This means that optimizations implemented by specific devices can be lost in the process of emulation.

3.1.3 Memory management

In an operating system, every application has the illusion that it is working with a piece of contiguous memory, whereas in reality the memory used by applications can be dispersed across the physical memory. The application works with virtual addresses that are translated to physical addresses. The operating system manages a set of tables to translate the virtual addresses to physical addresses. The x86 architecture provides hardware support for paging. Paging is the process that translates the virtual addresses of a process to system physical addresses. The hardware that performs this translation is called the memory management unit or MMU.

The page table walker performs address translation using the page tables and uses a hardware page table pointer, the CR3 register, to start the page walk [19]. It traverses several page table entries, each of which points to the next level of the walk. The memory hierarchy is therefore accessed many times when the page walker performs an address translation. To keep this overhead within limits, a translation look-aside buffer (TLB) is used, in which the most recent translations are saved. The processor first checks the TLB to see whether the translation is present in this cache. If it is, the cached translation is used; otherwise a page walk is performed and its result is saved in the TLB. The operating system and the processor must cooperate to ensure that the TLB stays consistent.
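The lookup order described above (TLB first, page walk on a miss) can be made concrete with a toy model. The sketch below is purely illustrative: it uses a single-level page table and a tiny direct-mapped TLB rather than the real multi-level x86 structures, and all sizes and names are invented for the example.

    /* Toy model of a TLB lookup plus page walk (illustrative only, not the x86 layout). */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_SHIFT 12            /* 4 KiB pages */
    #define NUM_PAGES  256           /* toy address space: 256 virtual pages */
    #define TLB_SIZE   16            /* tiny direct-mapped TLB */

    static uint32_t page_table[NUM_PAGES];          /* VPN -> physical frame number */
    static struct { bool valid; uint32_t vpn, pfn; } tlb[TLB_SIZE];

    static uint32_t translate(uint32_t vaddr)
    {
        uint32_t vpn    = vaddr >> PAGE_SHIFT;
        uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);
        uint32_t slot   = vpn % TLB_SIZE;

        if (tlb[slot].valid && tlb[slot].vpn == vpn)     /* TLB hit */
            return (tlb[slot].pfn << PAGE_SHIFT) | offset;

        uint32_t pfn = page_table[vpn];                  /* TLB miss: walk the table */
        tlb[slot].valid = true;                          /* refill the TLB entry     */
        tlb[slot].vpn   = vpn;
        tlb[slot].pfn   = pfn;
        return (pfn << PAGE_SHIFT) | offset;
    }

    int main(void)
    {
        for (uint32_t i = 0; i < NUM_PAGES; i++)
            page_table[i] = NUM_PAGES - 1 - i;           /* arbitrary mapping */

        printf("0x%05x -> 0x%05x\n", 0x3123u, translate(0x3123u));
        printf("0x%05x -> 0x%05x (same page, so this lookup hits the TLB)\n",
               0x3456u, translate(0x3456u));
        return 0;
    }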

Inside a virtual machine the guest operating system manages its own page tables. The task of the hypervisor is not only to virtualize the memory but also to virtualize the virtual memory, so that the guest operating system can use virtual memory itself [20]. This introduces an extra level of translation, which maps physical addresses of the guest to real physical addresses of the system. The hypervisor must manage the address translation on the processor using software techniques. It derives a shadow version of the page table from the guest page table, which holds the translations of the virtual guest addresses to the real physical addresses. This shadow page table is used by the processor when the guest is active, and the hypervisor manages the shadow table to keep it synchronized with the guest page table. The guest does not have access to these shadow page tables and can only see its own guest page tables, which run on an emulated MMU. It has the illusion that it can translate the virtual addresses to real physical ones, while in the background the hypervisor performs the real translation using the shadow page tables.

Figure 3.1: Memory management in x86 virtualization using shadow tables.

Figure 3.1 shows the translations needed for translating a virtual guest address into a real physical address. Without the shadow page tables, the virtual guest memory (orange area) is translated into physical guest memory (blue area) and the latter is translated into real physical memory (white area). The shadow page tables avoid the double translation by immediately translating the virtual guest memory (orange) into real physical memory (white), as shown by the red arrow.

In software, several techniques can be used to keep the shadow page tables and guest page tables consistent. These techniques use the page fault exception mechanism of the processor: an exception is raised when a page fault occurs, which allows the hypervisor to update the current shadow page table. This introduces extra page faults due to the shadow paging. The shadow page tables thus introduce overhead because of the extra page faults and the extra work in keeping the shadow tables up to date; they also consume additional memory. Maintaining shadow page tables for SMP guests introduces a further overhead. Each processor in the guest can use the same guest page table instance. The hypervisor can either maintain a shadow page table instance per virtual processor, which results in memory overhead, or share the shadow page table between the virtual processors, which leads to synchronization overhead.
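To illustrate how shadow paging composes the two translations, the following sketch extends the toy model above: a guest page table maps guest-virtual to guest-physical pages, a hypervisor-owned table maps guest-physical to host-physical frames, and the "shadow" entry installed on a page fault is simply the composition of the two. The names and the single-level tables are invented for the example; real shadow paging tracks multi-level tables and many more corner cases.

    /* Toy shadow-paging model (illustrative; single-level tables, invented names). */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_PAGES 64

    static uint32_t guest_pt[NUM_PAGES];   /* guest-virtual page -> guest-physical page  (guest-managed) */
    static uint32_t host_map[NUM_PAGES];   /* guest-physical page -> host-physical frame (hypervisor)    */
    static struct { bool present; uint32_t hfn; } shadow_pt[NUM_PAGES];  /* used by the real MMU */

    /* Hypervisor's page-fault handler: fill in the missing shadow entry by
     * composing the guest translation with the guest-physical -> host mapping. */
    static uint32_t handle_shadow_fault(uint32_t gvpn)
    {
        uint32_t gpfn = guest_pt[gvpn];          /* read the guest's page table  */
        uint32_t hfn  = host_map[gpfn];          /* hypervisor's own translation */
        shadow_pt[gvpn].present = true;          /* install the combined mapping */
        shadow_pt[gvpn].hfn     = hfn;
        return hfn;
    }

    static uint32_t translate(uint32_t gvpn)
    {
        if (!shadow_pt[gvpn].present)            /* miss -> page fault -> hypervisor */
            return handle_shadow_fault(gvpn);
        return shadow_pt[gvpn].hfn;              /* hit: one-step translation */
    }

    int main(void)
    {
        for (uint32_t i = 0; i < NUM_PAGES; i++) {
            guest_pt[i] = (i * 7) % NUM_PAGES;   /* arbitrary guest mapping      */
            host_map[i] = i + 1000;              /* arbitrary hypervisor mapping */
        }
        printf("guest-virtual page 5 -> host frame %u\n", translate(5));
        printf("again (now served from the shadow table): %u\n", translate(5));
        return 0;
    }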

3.2 Paravirtualization

Paravirtualization is in many ways comparable to dynamic binary translation. It is also a software technique designed to enable virtualization on the x86 architecture. As explained in "Denali: Lightweight Virtual Machines for Distributed and Networked Applications" and used in Denali [21], paravirtualization exposes a virtual architecture to the guest that is slightly different from the physical architecture.


Dynamic binary translation translates "critical" code into safe code on the fly. Paravirtualization achieves the same effect but requires changes to the source code of the operating system in advance. Operating systems built for the x86 architecture are by default not compatible with the paravirtualized architecture. This is a major disadvantage for existing operating systems because extra effort is needed to run them inside a paravirtualized guest. In the case of Denali, which provides lightweight virtual machines, it allowed the authors to co-design the virtual architecture with the operating system.

The advantages of successful paravirtualization are a simpler hypervisor implementation and less performance degradation compared to the physical system. Better performance is achieved because many unnecessary traps into the hypervisor are eliminated. The hypervisor provides hypercall interfaces for critical kernel operations such as memory management, interrupt handling and timekeeping [10]. The guest operating system is adapted so that it is aware of the virtualization: the kernel is modified to replace non-virtualizable instructions with hypercalls that communicate directly with the hypervisor. The binary translation overhead is completely eliminated since the modifications are made in the operating system at design time, and the implementation of the hypervisor is much simpler because it does not contain a binary translator.
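As an illustration of what such a kernel modification amounts to, the sketch below shows a hypothetical paravirtualized kernel replacing a privileged page-table-load operation with a call into the hypervisor. The hypercall() function, its numbers and the whole dispatch are invented for the example; real interfaces such as Xen's hypercalls differ in calling convention and in the set of operations offered.

    /* Hypothetical paravirtualized kernel fragment (invented interface, not Xen's ABI). */
    #include <stdio.h>
    #include <stdint.h>

    enum hcall { HCALL_SET_PAGE_TABLE = 1, HCALL_SET_TIMER = 2 };

    /* In a real system this would transfer control to the hypervisor
     * (for example via a trap into ring 0); here it is only a stub. */
    static long hypercall(enum hcall nr, uint64_t arg)
    {
        printf("hypercall %d, arg=0x%llx\n", (int)nr, (unsigned long long)arg);
        return 0;
    }

    /* A native kernel would execute the privileged page-table load itself.
     * A paravirtualized kernel asks the hypervisor to do it on its behalf,
     * so no trap-and-emulate or binary translation is needed. */
    static void set_page_table_base(uint64_t new_root)
    {
        hypercall(HCALL_SET_PAGE_TABLE, new_root);
    }

    int main(void)
    {
        set_page_table_base(0x1000);   /* arbitrary value for the example */
        return 0;
    }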

3.2.1 System calls

The overhead of system calls can be improved somewhat. The dynamic binary translation technique intercepts each SYSENTER/SYSCALL instruction and translates it to hand control over to the kernel of the guest operating system. Afterwards, the guest operating system's kernel executes a SYSEXIT/SYSRET instruction to return to the application, and this instruction is again intercepted and translated by the dynamic binary translation. The paravirtualization technique instead allows guest operating systems to install a handler for system calls, permitting direct calls from an application into its guest OS and avoiding indirection through the hypervisor on every call [22]. This handler is validated before installation and is accessed directly by the processor without indirection via ring 0.

3.2.2 I/O virtualization

Paravirtualization software mostly uses a different approach to I/O virtualization than the emulation used with dynamic binary translation. The guest operating system uses a paravirtualized driver that operates on a simplified abstract device model exported by the hypervisor [23]. The real device driver can reside in the hypervisor, but often resides in a separate device driver domain which has privileged access to the device hardware. The latter option is attractive since the hypervisor does not need to provide the device drivers; the drivers of a legacy operating system can be used instead. Separating the address space of the device drivers from guest and hypervisor code also prevents buggy device drivers from causing system crashes.

The paravirtualized drivers remove the need to emulate devices. They free up processor time and resources which would otherwise be needed to emulate hardware. Since there is no emulation of the device hardware, the overhead is significantly reduced. In Xen, well known for its use of paravirtualization, the real device drivers reside in a privileged guest known as domain 0; a description of Xen can be found in subsection 3.6.3. However, Xen is not the only hypervisor that uses paravirtualization for I/O. VMware has a paravirtualized I/O device driver, vmxnet, that shares data structures with the hypervisor [10]. "A Performance Comparison of Hypervisors" states that by using the paravirtualized vmxnet network driver they can run network I/O intensive datacenter applications with very acceptable network performance [24].

3.2.3 Memory management

Paravirtual interfaces can be used by both the hypervisor and the guest to reduce hypervisor complexity and the overhead of virtualizing x86 paging [19]. When using a paravirtualized memory management unit, the guest operating system page tables are registered directly with the MMU [22]. To reduce the overhead and complexity associated with the use of shadow page tables, the guest operating system has read-only access to its page tables. A page table update is passed to Xen via a hypercall and validated before being applied. Guest operating systems can locally queue page table updates and apply the entire batch with a single hypercall. This minimizes the number of hypercalls needed for memory management.
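The batching idea can be sketched as follows. The queue layout, the mmu_update_t structure and the hypercall stub are invented for illustration; Xen's actual page-table update interface has its own structures and calling convention.

    /* Sketch of batched page-table updates flushed with one hypercall
     * (invented interface; not Xen's actual ABI). */
    #include <stdio.h>
    #include <stdint.h>

    #define BATCH_SIZE 32

    typedef struct { uint64_t pte_addr; uint64_t new_val; } mmu_update_t;

    static mmu_update_t queue[BATCH_SIZE];
    static int queued;

    /* Stand-in for the hypercall that hands the whole batch to the hypervisor. */
    static void hypercall_mmu_update(const mmu_update_t *req, int count)
    {
        (void)req;
        printf("hypercall: applying %d queued page-table updates\n", count);
    }

    static void flush_updates(void)
    {
        if (queued > 0) {
            hypercall_mmu_update(queue, queued);   /* one hypercall for the whole batch */
            queued = 0;
        }
    }

    static void queue_update(uint64_t pte_addr, uint64_t new_val)
    {
        queue[queued++] = (mmu_update_t){ pte_addr, new_val };
        if (queued == BATCH_SIZE)                  /* flush when the queue is full */
            flush_updates();
    }

    int main(void)
    {
        for (uint64_t i = 0; i < 10; i++)
            queue_update(0x1000 + 8 * i, 0x80000000u + i);   /* arbitrary values */
        flush_updates();                            /* flush the remaining updates */
        return 0;
    }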

3.3 First generation hardware support

In the meantime, processor vendors noticed that virtualization was becoming increasingly popular, and they addressed the virtualization problem on the x86 architecture by introducing hardware-assisted support. Hardware support for processor virtualization enables simple, robust and reliable hypervisor software [25]. It eliminates the need for the hypervisor to listen for, trap and execute certain instructions on behalf of the guest OS [26]. Intel and AMD provide these hardware extensions in the form of Intel VT-x and AMD SVM respectively [11, 27, 28].

The first generation hardware support introduces a data structure for virtualization, together with specific instructions and a new execution flow. In AMD SVM, the data structure is called the virtual machine control block (VMCB). The VMCB combines control state with the guest's processor state; each guest has its own VMCB. It contains a list of which instructions or events in the guest to intercept, various control bits and the guest's processor state. The control bits specify the execution environment of the guest or indicate special actions to be taken before running guest code. The VMCB is accessed by reading and writing to its physical address. The execution environment of the guest is referred to as guest mode; the execution environment of the hypervisor is called host mode. The new VMRUN instruction transfers control from host to guest mode. The instruction saves the current processor state and loads the corresponding guest state from the VMCB. The processor then runs the guest code until an intercept event occurs. This results in a #VMEXIT, at which point the processor writes the current guest state back to the VMCB and resumes host execution at the instruction following the VMRUN. The processor is then executing the hypervisor again. The hypervisor can retrieve information from the VMCB to handle the exit. When the effect of the exiting operation has been emulated, the hypervisor can execute VMRUN again to return to guest mode.

Intel has implemented its own version of hardware support, which has many similarities with AMD's implementation although the terminology is somewhat different. Intel uses a virtual machine control structure (VMCS) instead of a VMCB. A VMCS can be manipulated with the new instructions VMCLEAR, VMPTRLD, VMREAD and VMWRITE, which clear, load, read from and write to a VMCS respectively. The hypervisor runs in "VMX root operation" and the guest in "VMX non-root operation" instead of host and guest mode. Software enters VMX operation by executing the VMXON instruction. From then on, the hypervisor can use a VMEntry to transfer control to one of its guests. There are two instructions available for triggering a VMEntry: VMLAUNCH and VMRESUME. As with AMD SVM, the hypervisor regains control through VMExits. Eventually, the hypervisor can leave VMX operation with the VMXOFF instruction.

Figure 3.2: Execution flow using virtualization based on Intel VT-x.

The execution flow of a guest virtualized with hardware support is shown in figure 3.2. The VMXON instruction starts and VMXOFF stops VMX operation. The guest is started using a VMEntry, which loads the VMCS of the guest into the hardware. The hypervisor regains control through a VMExit when the guest tries to execute a privileged instruction. After intervention by the hypervisor, a VMEntry transfers control back to the guest. In the end, the guest can shut down and control is handed back to the hypervisor with a VMExit.
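The overall flow amounts to a simple loop in the hypervisor: enter the guest, wait for an exit, handle the exit reason, and re-enter. The sketch below is a schematic of that loop with invented names (vcpu_t, enter_guest, the exit reasons); it is not the VT-x or SVM programming interface, in which the entry is performed by VMLAUNCH/VMRESUME or VMRUN on the real control structure.

    /* Schematic of a hypervisor's VMEntry/VMExit loop (invented names; not a real VMM). */
    #include <stdio.h>
    #include <stdbool.h>

    typedef enum { EXIT_IO, EXIT_HLT, EXIT_PRIVILEGED_INSN } exit_reason_t;

    typedef struct {
        exit_reason_t last_exit;   /* stands in for the exit information in the VMCB/VMCS */
        int remaining_events;      /* toy state so that the example terminates */
    } vcpu_t;

    /* Stand-in for VMRUN / VMLAUNCH+VMRESUME: run guest code until it exits. */
    static exit_reason_t enter_guest(vcpu_t *vcpu)
    {
        vcpu->remaining_events--;
        vcpu->last_exit = (vcpu->remaining_events > 0) ? EXIT_PRIVILEGED_INSN : EXIT_HLT;
        return vcpu->last_exit;
    }

    int main(void)
    {
        vcpu_t vcpu = { .remaining_events = 3 };
        bool running = true;

        while (running) {
            exit_reason_t reason = enter_guest(&vcpu);   /* VMEntry; returns on VMExit */
            switch (reason) {
            case EXIT_PRIVILEGED_INSN:
                printf("emulate the privileged instruction, then re-enter\n");
                break;
            case EXIT_IO:
                printf("emulate the I/O access, then re-enter\n");
                break;
            case EXIT_HLT:
                printf("guest halted; leaving the run loop\n");
                running = false;
                break;
            }
        }
        return 0;
    }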

The basic idea behind the first generation hardware support is to fix the problem that the x86 architecture cannot be virtualized. The VMExit forces a transition from guest to hypervisor, which follows the philosophy of trapping all exceptions and privileged instructions. Nevertheless, each transition between the hypervisor and a virtual machine requires a fixed number of processor cycles. When the hypervisor has to handle a complex operation, this overhead is relatively low. However, for a simple operation the overhead of switching from guest to hypervisor and back is relatively high. Creating processes, context switches and small page table updates are all simple operations that incur a large overhead. In these cases, software solutions like binary translation and paravirtualization perform better than hardware supported virtualization.

The overhead can be improved by reducing the number of processor cycles required for a transition between guest and hypervisor. The exact number of extra processor cycles depends on the processor architecture. For Intel, the format and layout of the VMCS in memory is not architecturally defined, which allows implementation-specific optimizations that improve performance in VMX non-root operation and reduce the latency of a VMEntry and VMExit [29]. Intel and AMD are improving these latencies in successive processor generations, as can be seen for Intel in figure 3.3.

Figure 3.3: Latency reductions by CPU implementation [30].

System calls are an example of complex operations with a relatively low transition overhead. In hardware supported virtualization, system calls do not automatically transfer control from the guest to the hypervisor; a hypervisor intervention is only needed when the system call contains critical instructions. Even when a system call does require intervention, the relative overhead is low since a system call is rather complex and already requires a lot of processor cycles.

First generation hardware support does not include support for I/O virtualization or memory management unit virtualization. Hypervisors that use the first generation hardware extensions need to use a software technique for virtualizing the I/O devices and the MMU. For the MMU, this can be done using shadow tables or paravirtualization of the MMU.


3.4 Second generation hardware support

First generation hardware support made the x86 architecture virtualizable, but only in some cases can an improvement in performance be measured [11]. Maintaining the shadow tables can be an intensive task, as was pointed out in subsection 3.1.3. The next step for the processor vendors was to provide hardware MMU support. This second generation hardware support adds memory management support so that the hypervisor no longer has to maintain the integrity of the shadow page table mappings [17].

The shadow page tables remove the need to first translate the virtual memory of the process to guest OS physical memory and then translate the latter into real physical memory, as can be seen in figure 3.1. They make it possible to immediately translate the virtual memory of the guest process into real physical memory. On the other hand, the hypervisor must do the bookkeeping to keep the shadow page table up to date when an update occurs to the guest OS page table. In existing software solutions like binary translation this bookkeeping introduces overhead, and it was even worse for first generation hardware support. The hypervisor must maintain the shadow page tables, and every time a guest modifies its page tables the hypervisor must intervene. In software solutions this intervention is an extra page fault, but with first generation hardware support it results in a VMExit and VMEntry roundtrip. As shown in figure 3.3, the latencies of such a roundtrip are improving, but the second generation hardware support removes the need for the roundtrip altogether.

Intel and AMD each introduced their own hardware MMU support. As with the first generation hardware support, this results in two different implementations with similar characteristics. Intel proposed extended page tables (EPT) and AMD proposed nested page tables (NPT). With Intel's EPT, the guest page tables translate from virtual memory to guest physical addresses, while a separate set of page tables, the extended page tables, translates from guest physical addresses to real physical addresses [29]. The guest can modify its page tables without hypervisor intervention. The extended page tables thus remove the VMExits associated with page table virtualization.

AMD's nested paging also uses additional page tables, the nested page tables (nPT), to translate guest physical addresses to real physical addresses [19]. The guest page tables (gPT) map virtual memory addresses to guest physical addresses. The gPT are set up by the guest and the nPT by the hypervisor. When nested paging is enabled and a guest attempts to reference memory using a virtual address, the page walker performs a two-dimensional walk using the gPT and nPT to translate the guest virtual address into the real physical address. Like Intel's EPT, nested paging removes the overheads associated with software shadow paging.
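The "two-dimensional" nature of the walk can be illustrated by composing two toy translation functions: the guest-physical address produced by the guest page table is itself translated through the nested (or extended) table before it can be used. The single-level tables and sizes below are invented for the example; a real walk traverses several levels in each dimension, and even the reads of the gPT entries themselves go through the nPT, which is why a TLB miss becomes more expensive, as noted below.

    /* Toy two-dimensional address translation (gPT composed with nPT/EPT; illustrative only). */
    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12
    #define NUM_PAGES  64

    static uint32_t gpt[NUM_PAGES];   /* guest page table:  guest-virtual page -> guest-physical page */
    static uint32_t npt[NUM_PAGES];   /* nested page table: guest-physical page -> host-physical frame */

    /* Second dimension: guest-physical -> host-physical. */
    static uint32_t npt_translate(uint32_t gpfn)
    {
        return npt[gpfn];
    }

    /* Full walk: guest-virtual -> guest-physical (gPT), then guest-physical -> host-physical (nPT). */
    static uint32_t translate(uint32_t gvaddr)
    {
        uint32_t offset = gvaddr & ((1u << PAGE_SHIFT) - 1);
        uint32_t gvpn   = gvaddr >> PAGE_SHIFT;
        uint32_t gpfn   = gpt[gvpn];              /* first dimension  */
        uint32_t hfn    = npt_translate(gpfn);    /* second dimension */
        return (hfn << PAGE_SHIFT) | offset;
    }

    int main(void)
    {
        for (uint32_t i = 0; i < NUM_PAGES; i++) {
            gpt[i] = (i + 3) % NUM_PAGES;         /* arbitrary guest mapping      */
            npt[i] = (i * 5) % NUM_PAGES;         /* arbitrary hypervisor mapping */
        }
        printf("guest-virtual 0x%05x -> host-physical 0x%05x\n",
               0x2abcu, translate(0x2abcu));
        return 0;
    }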

Another feature introduced by both Intel and AMD in the second generation hardware support is tagged TLBs. Intel uses Virtual-Processor Identifiers (VPIDs) that allow a hypervisor to assign a different identifier to each virtual processor; VPID zero is reserved for the hypervisor itself. The processor then uses the VPIDs to tag translations in the TLB. AMD calls these identifiers Address Space IDs (ASIDs). During a TLB lookup, the VPID or ASID value of the active guest is matched against the ID tag in the TLB entry. In this way, TLB entries belonging to different guests and to the hypervisor can coexist without causing incorrect address translations. The tagged TLBs eliminate the need for TLB flushes on every VMEntry and VMExit, and thereby the impact of those flushes on performance. This is an improvement over the other virtualization techniques, which need to flush the TLB every time a guest switches to the hypervisor or back. A drawback of the extended page tables or nested paging is that a TLB miss has a larger performance hit for guests, because it introduces an additional level of address translation; this is mitigated by making the TLBs much larger than before. Previous techniques like shadow page tables immediately translate the virtual guest address to the real physical address, eliminating the additional level of address translation.
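A tagged TLB lookup simply includes the identifier of the currently running context in the match, as in the following toy fragment (invented structure; real TLBs are set-associative hardware structures, not linear arrays).

    /* Toy tagged-TLB lookup: entries are matched on (ASID/VPID, virtual page). Illustrative only. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define TLB_ENTRIES 8

    typedef struct {
        bool     valid;
        uint16_t asid;   /* identifier of the guest (or hypervisor) that owns the entry */
        uint32_t vpn;    /* virtual page number */
        uint32_t pfn;    /* physical frame number */
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];

    /* Returns true and sets *pfn on a hit; entries of other ASIDs are simply ignored,
     * so no flush is needed when switching between guests and the hypervisor. */
    static bool tlb_lookup(uint16_t active_asid, uint32_t vpn, uint32_t *pfn)
    {
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].asid == active_asid && tlb[i].vpn == vpn) {
                *pfn = tlb[i].pfn;
                return true;
            }
        }
        return false;
    }

    int main(void)
    {
        tlb[0] = (tlb_entry_t){ true, 1, 0x10, 0xAA };   /* entry owned by guest with ASID 1 */
        tlb[1] = (tlb_entry_t){ true, 2, 0x10, 0xBB };   /* same virtual page, other guest   */

        uint32_t pfn;
        if (tlb_lookup(2, 0x10, &pfn))
            printf("ASID 2, page 0x10 -> frame 0x%X\n", pfn);
        return 0;
    }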

The second generation hardware support is completely focussed on improving memory management. It eliminates the need for the hypervisor to maintain shadow tables and it eliminates the TLB flushes. EPT and NPT help to improve performance for memory intensive workloads.

3.5 Current and future hardware support

Intel and AMD are still working on support for virtualization. They are improving the latencies of the VMEntry and VMExit transitions, but are also working on new hardware techniques for supporting virtualization on the x86 architecture. The first generation hardware support for virtualization was aimed primarily at the processor and the second generation focusses on the memory management unit. The final component required next to CPU and memory virtualization is device and I/O virtualization [10]. Recent techniques are Intel VT-d and AMD IOMMU.

There are three general techniques for I/O virtualization. The first technique is emulation, described in subsection 3.1.2. The second technique, explained in subsection 3.2.2, is paravirtualization. The last technique is direct I/O: the device is not virtualized but assigned directly to a guest virtual machine, and the guest's own device drivers are used for the dedicated device.

In order to improve the performance of I/O virtualization, Intel and AMD are looking at allowing virtual machines to talk to the device hardware directly. With Intel VT-d and AMD IOMMU, hardware support is introduced for assigning I/O devices to virtual machines. In such cases, the ability to multiplex the I/O device is lost. Depending on the I/O device, this need not be an issue; for example, network interface cards can easily be added to the hardware in order to provide a NIC for each virtual machine.


3.6 Virtualization software

There are many different virtualization implementations. This section gives an overview of some well-known virtualization software. Each implementation can be placed in the categories explained throughout the previous sections.

3.6.1 VirtualBox

VirtualBox is a hosted hypervisor that performs full virtualization. It started as proprietary software but currently comes under a Personal Use and Evaluation License (PUEL); the software is free of charge for personal and educational use. VirtualBox was initially created by Innotek and was released as an Open Source Edition in January 2007. The company was later purchased by Sun Microsystems, which in turn was recently purchased by Oracle Corporation. VirtualBox runs on Windows, Linux, Mac OS X and Solaris hosts. In-depth information can be found on the wiki of their site [31], more specifically in the technical documentation [32]. Appendix A.1 presents an overview of VirtualBox, which is largely based on the technical documentation. A short summary is given in the following paragraph.

VirtualBox started as a pure software solution for virtualization. The hypervisor used dynamic binary translation to work around the virtualization problem of the x86 architecture. With the arrival of hardware support for virtualization, VirtualBox now also supports Intel VT-x and AMD SVM. The host operating system runs each VirtualBox virtual machine as an application, i.e. just another process in the host operating system. A ring 0 driver needs to be loaded in the host OS for VirtualBox to work. It only performs a few tasks: allocating physical memory for the virtual machine, saving and restoring CPU registers and descriptor tables, switching from host ring 3 to guest context, and enabling or disabling hardware support. The guest operating system is manipulated to execute its ring 0 code in ring 1. This could result in poor performance since there is a possibility of generating a large number of additional instruction faults. To address these performance issues, VirtualBox has come up with a Patch Manager (PATM) and a Code Scanning and Analysis Manager (CSAM). The PATM scans code recursively and replaces problematic instructions with a jump to hypervisor memory, where a more suitable implementation is placed. Every time a fault occurs, the CSAM analyzes the fault's cause and determines whether it is possible to patch the offending code to prevent it from causing more expensive faults.

3.6.2 VMware

VMware [33] provides several virtualization products. The company was founded in 1998 and released its first product, VMware Workstation, in May 1999. In 2001, it also entered the server market with VMware GSX Server and VMware ESX Server. Currently, VMware provides a variety of products for datacenter and desktop solutions together with management products. VMware software runs on Windows and Linux, and since the introduction of VMware Fusion it also runs on Mac OS X. Like VirtualBox, VMware started with a software-only solution for its hypervisors. In contrast with VirtualBox, VMware does not release the source code of its products. VMware now supports both full virtualization with binary translation and hardware assisted virtualization, and has a paravirtualized I/O device driver, vmxnet, that shares data structures with the hypervisor [10].

VMware Server is a free product based on the VMware virtualization technology. It is a hosted hypervisor that can be installed on Windows or Linux hosts. A web-based user interface provides a simple way to manage virtual machines. Another free datacenter product is VMware ESXi. It provides the same functionality but uses a native, bare-metal architecture for its hypervisor. VMware ESXi needs a dedicated server but has better performance. VMware makes these products available at no cost in order to help companies of all sizes experience the benefits of virtualization.

The desktop product is VMware Player. It is free for personal, non-commercial use and allows users to create and run virtual machines on a Windows or Linux host. It is a hosted hypervisor, as is common practice for desktop products. If users need developer-centric features, they can upgrade to VMware Workstation.

3.6.3 Xen

Xen [34] is an open source example of virtualization software that uses paravirtualization. It is a native, bare-metal hypervisor for the x86 architecture and was initially created by the University of Cambridge Computer Laboratory in 2003 [22]. Xen is designed to allow multiple commodity operating systems to share conventional hardware. In 2007, Citrix Systems acquired Xen and intended to license it freely to all vendors and projects that implement the Xen hypervisor. Since 2010, the Xen community maintains and develops Xen. The Xen hypervisor is licensed under the GNU General Public License.

After installation of the Xen hypervisor, the user can boot into Xen. When the hypervisor is started, it automatically boots a guest, domain 0, that has special management privileges and direct access to the physical hardware [35].

I/O devices are not emulated; instead, Xen exposes a set of clean and simple device abstractions. There are two possibilities for running device drivers. In the first, domain 0 is responsible for running the device drivers for the hardware. It runs a BackendDriver which queues requests from other domains and relays them to the real hardware driver. Each domain communicates with domain 0 through the FrontendDriver to access the devices; to the applications and the kernel, this driver looks like a normal device. The other possibility is that a driver domain is given responsibility for a particular piece of hardware. It runs the hardware driver and the backend driver for that device class. When the hardware driver fails, only this domain is affected and all other domains survive.

Apart from running paravirtualized guests, Xen supports Intel VT-x and AMD SVM since versions 3.0.0 and 3.0.2 respectively. This allows users to run unmodified guest operating systems in Xen.

3.6.4 KVM

KVM [36], short for Kernel-based Virtual Machine, is a virtualization product that uses hardware support exclusively. Instead of creating major portions of an operating system kernel, as other hypervisors have done, the KVM developers turned the standard Linux kernel into a hypervisor. By developing KVM as a loadable module, the virtualized environment can benefit from all the ongoing work on the Linux kernel itself and redundancy is reduced [37]. KVM uses a driver ("/dev/kvm") that communicates with the kernel and acts as an interface for userspace virtual machines. The initial version of KVM was released in November 2006 and it was first included in the Linux kernel 2.6.20 in February 2007.

The recommended way of installing KVM is through the packaging system of a Linux distribution. The latest version of the KVM kernel modules and supporting userspace can be found on their website. The kernel modules are found in the kvm-kmod-version releases and the userspace components in the qemu-kvm-version releases. The latter is the stable branch of KVM, based on QEMU [38] with the KVM extras on top. QEMU is a machine emulator and can run an unmodified target operating system and all its applications in a virtual machine. The kvm-version releases are development releases, but they are outdated.

Every virtual machine is a Linux process, scheduled by the standard Linux scheduler [39]. A normal Linux process has two modes of execution: kernel mode and user mode. KVM adds a third mode of execution, guest mode; code that runs from within the virtual machine runs in guest mode. Hardware virtualization is used to virtualize the processor, memory management is handled by the host kernel, and I/O is handled in user space through QEMU.
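The /dev/kvm interface mentioned above is driven through ioctl() calls from userspace. The fragment below is a minimal sketch with error handling reduced to a bare minimum; no guest memory or code is set up, so the created vCPU has nothing to run. It only shows the first steps a userspace VMM such as QEMU performs: querying the API version, creating a VM and creating a vCPU.

    /* Minimal sketch of opening /dev/kvm and creating a VM + vCPU (Linux only; no guest code is run). */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR);            /* the KVM driver's interface */
        if (kvm < 0) { perror("open /dev/kvm"); return 1; }

        int version = ioctl(kvm, KVM_GET_API_VERSION, 0);
        printf("KVM API version: %d\n", version);

        int vm = ioctl(kvm, KVM_CREATE_VM, 0);         /* a VM is just a file descriptor */
        if (vm < 0) { perror("KVM_CREATE_VM"); return 1; }

        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);      /* one virtual CPU for this VM */
        if (vcpu < 0) { perror("KVM_CREATE_VCPU"); return 1; }

        /* A real VMM would now map guest memory (KVM_SET_USER_MEMORY_REGION),
         * load guest code, set registers and call KVM_RUN in a loop, handling
         * the exit reasons much like the schematic loop sketched earlier. */
        close(vcpu); close(vm); close(kvm);
        return 0;
    }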

In this text, KVM is considered a hosted hypervisor, but there is some discussion1 about whether KVM is rather a native, bare-metal hypervisor. One side argues that KVM turns Linux into a native, bare-metal hypervisor, because Linux becomes the hypervisor and runs directly on top of the hardware. The other side argues that KVM runs on top of Linux and should be considered a hosted hypervisor. Regardless of what type of hypervisor KVM actually is, this text considers KVM to be a hosted hypervisor.

3.6.5 Comparison between virtualization software

A high-level comparison is given in table 3.1. All virtualization products in the table, except Xen, are installed within a host operating system; Xen is installed directly on the hardware. Most products provide two techniques for virtualization on the x86 architecture. Hardware support for x86 virtualization is supported by all virtualization software in the table.

1 http://virtualizationreview.com/Blogs/Mental-Ward/2009/02/KVM-BareMetal-Hypervisor.aspx


                            VirtualBox   VMware Workstation   Xen                  KVM
Hypervisor type             Hosted       Hosted               Native, bare-metal   Hosted
Dynamic binary translation  X            X
Paravirtualization                                            X
Hardware support            X            X                    X                    X

Table 3.1: Comparison between a selection of the most popular hypervisors.


CHAPTER 4

Nested virtualization

The focus of this thesis lies with nested virtualization on x86 architectures. Nested virtualization is executing a virtual machine inside a virtual machine. In the case of multiple nesting levels, one can also speak of recursive virtual machines. In 1973 and 1975, initial research was published about properties of recursive virtual machine architectures [40, 41]. These works refer to the virtualization that was used in mainframes so that users could work simultaneously on a single mainframe. Multiple use cases come to mind for nested virtualization.

• A possible use case for nested x86 virtualization is the development of test setups for research purposes. Research in cluster1 and grid2 computing requires extensive test setups, which might not be available. The latest developments in the research of grid and cluster computing make use of virtualization at different levels. Virtualization can be used for all, or certain, components of a grid or cluster. It can also be used to run applications within the grid or cluster in a sandbox environment. If certain performance limitations are not an issue, virtualizing all components of such a system can eliminate the need to acquire the entire test setup. Because these virtualized components, e.g. Eucalyptus3 or OpenNebula4, might use virtualization for running applications in a sandbox environment, two levels of virtualization are used. Nesting the physical machines of a cluster or grid as virtual machines on one physical machine can offer security, fault tolerance, legacy support, isolation, resource control, consolidation, etc.

1 A cluster is a group of interconnected computers working together as a single, integrated computer resource [42, 43].

2 There is no strict definition of a grid. In [44], Bote-Lorenzo et al. listed a number of attempts to create a definition. Ian Foster created a three-point checklist that combines the common properties of a grid [45].

3 http://www.eucalyptus.com
4 http://www.opennebula.org


• A second possible use case is the creation of a test framework for hypervisors. Just as virtualization allows testing and debugging an operating system by deploying the OS in a virtual machine, nested virtualization allows testing and debugging a hypervisor inside a virtual machine. It eliminates the need for a separate physical machine on which a developer can test and debug a hypervisor.

• Another possible use case is the use of virtual machines inside a server rented from the cloud5. Such a server is itself virtualized so that the cloud vendor can make optimal use of its resources. For example, Amazon EC26 offers virtual private servers, which are virtual machines using the Xen hypervisor. Hence, if a user wants to use virtualization software inside such a server, nested x86 virtualization is needed to make that setup work.

As explained in chapter 2, virtualization on the x86 architecture is not straightforward. This has resulted in the emergence of the several techniques described in chapter 3. These different techniques produce many different combinations for nesting virtual machines. A nested setup can use the same technique for both hypervisors, but it can also use a different technique for the first level hypervisor and the nested hypervisor. Hence, if we divide the techniques into three major groups, dynamic binary translation, paravirtualization and hardware support, there are nine possible combinations for nesting a virtual machine inside another virtual machine. In the following sections, the theoretical possibilities and requirements for each of these combinations are given. The results of nested virtualization on x86 architectures are given in chapter 5.

Figure 4.1: Layers in a nested virtualization setup with hosted hypervisors.

To prevent confusion about which hypervisor or guest is meant, some terms are introduced. In a nested virtualization setup there are two levels of virtualization, see figure 4.1.

5 Two widely accepted definitions of the term cloud can be found in [46] and [47].
6 http://aws.amazon.com/ec2/


The first level, referred to as L1, is the layer of virtualization that is used in a non-nested setup; this level is the virtualization layer that is closest to the hardware. The terms L1 and bottom layer indicate the first level of virtualization, e.g. the L1 hypervisor is the hypervisor that is used in the first level of virtualization. The second level, referred to as L2, is the new layer of virtualization introduced by the nesting. Hence, the terms L2, nested and inner indicate the second level of virtualization, e.g. the L2 hypervisor is the hypervisor that is installed inside the L1 guest.

4.1 Dynamic binary translation

This section focusses on L1 hypervisors that use dynamic binary translation for nested virtualization on x86 architectures. These can run in the host operating system or directly on the hardware. The hypervisor can be VirtualBox (see subsection 3.6.1), a VMware product (see subsection 3.6.2) or any other hypervisor using dynamic binary translation. The nested hypervisor can be any hypervisor, resulting in three major combinations; each combination uses a nested hypervisor that enables virtualization through a different technique. The nested hypervisor is installed in a guest virtualized by the L1 hypervisor. The first combination is again a hypervisor using dynamic binary translation. In the second combination a hypervisor using paravirtualization is installed in the guest. The last combination is a nested hypervisor that uses hardware support.

It should theoretically be possible to nest virtual machines using dynamic binary translation as L1. When using dynamic binary translation, no modifications to the hardware or to the operating system are needed, as pointed out in section 3.1. Code meant to run in ring 0 actually runs in ring 1, but the guest is not aware of this.

Dynamic binary translation: The first combination nests a L2 hypervisor inside a guest virtualized by a L1 hypervisor where both hypervisors are based on dynamic binary translation. The L2 hypervisor runs in guest ring 0. Since this hypervisor is not aware that its code is actually running in ring 1, it should be possible to run a hypervisor in this guest.

The nested hypervisor has to take care of the memory management in the L2 guest. It has to maintain the shadow page tables for its guests, see subsection 3.1.3. The hypervisor uses these shadow page tables to translate the L2 virtual memory addresses to what it thinks are real memory equivalents. In reality, these translated addresses lie in the virtual memory range of the L1 guest and are converted to real memory addresses by the shadow page tables maintained by the L1 hypervisor. The memory architecture in a nested setup is illustrated in figure 4.2. For a L1 guest, there are two levels of address translation, as shown in figure 3.1. A nested guest has three levels of address translation, resulting in the need for shadow tables in the L2 hypervisor.

Paravirtualization: The second combination uses paravirtualization as the technique for the L2 hypervisor. This situation is the same as the situation with dynamic binary translation for the L2 hypervisor.


Figure 4.2: Memory architecture in a nested situation.

The hypervisor using paravirtualization runs in guest ring 0 and is not aware that it is actually running in ring 1. This should make it possible to nest a L2 hypervisor based on paravirtualization within a guest virtualized by a L1 hypervisor using dynamic binary translation.

Hardware supported virtualization: The virtualized processor that is available to the L1 guest is based on the x86 architecture in order to allow current operating systems to work in the virtualized environment. However, are the extensions for virtualization on x86 architectures (see sections 3.3 and 3.4) also included? In order to use a L2 hypervisor based on hardware support within the L1 guest, the L1 hypervisor should virtualize or emulate the virtualization extensions of the processor. A virtualization product that is based on hardware supported virtualization needs these extensions; if they are not available, the hypervisor cannot be installed or activated. If the L1 hypervisor provides these extensions, chances are that it requires a physical processor with the same extensions. It might be possible for hypervisors based on dynamic binary translation to provide the extensions without a processor that supports hardware virtualization. However, all current processors have these extensions, so it is very unlikely that developers will incorporate functionality that provides hardware support to the guest on a processor without hardware support for x86 virtualization.

Memory management in the L2 guest based on hardware support is not possible because the second generation hardware support only provides two levels of address translation. The L1 hypervisor would have to provide the EPT or NPT functionality to the guest together with the first generation hardware support, but it would have to use a software technique for the implementation of this virtualized MMU.


4.2 Paravirtualization

The situation for nested virtualization is quite different when using paravirtualization for the bottom layer hypervisor. The most popular example of a hypervisor based on paravirtualization is Xen (see subsection 3.6.3). There are again three combinations. The nested hypervisor can be the same as the bottom layer hypervisor, based on paravirtualization. The second combination is the case where a dynamic binary translation based hypervisor is used as the nested hypervisor. In the last combination a hypervisor based on hardware support is nested in the paravirtualized guest. The main difference is that the L1 guest is aware of the virtualization.

Dynamic binary translation and paravirtualization: The paravirtualized guest is aware of the virtualization and should use the hypercalls provided by the hypervisor. The guest's operating system must be modified to use these hypercalls; thus all code in the guest that runs in kernel mode needs these modifications in order to work in the paravirtualized guest. This has major consequences for a nested virtualization setup. A nested hypervisor can only work in a paravirtualized environment if it is modified to work with these hypercalls. A native, bare-metal hypervisor would have to be adapted so that all its ring 0 code is changed. For a hosted hypervisor this means that the module that is loaded into the kernel of the host operating system must be modified to work in the paravirtualized environment. Hence, companies that develop virtualization products need to actively make their hypervisors compatible with running inside a paravirtualized guest.

Memory management of the L2 guests is done by the nested hypervisor. The page tables of the L1 guests are registered directly with the MMU, so the nested hypervisor can use the hypercalls to register its page tables with the MMU. A nested hypervisor based on paravirtualization might allow a L2 guest to register its page tables directly with the MMU, while a nested hypervisor based on dynamic binary translation will maintain shadow tables.

Hardware supported virtualization: Hardware support for x86 virtualization is, also with paravirtualization, an exceptional case. The L1 hypervisor would have to provide the hardware support extensions to its guests, probably by means of hypercalls. Modified hypervisors based on hardware support could then use the hardware extensions. Second generation hardware support can likewise only be used if it is provided by the L1 hypervisor, together with the first generation hardware support.

In conclusion, nested virtualization with paravirtualization as the bottom layer needs modifications to the nested hypervisor, whereas nested virtualization with dynamic binary translation as the bottom layer does not need these changes. On the other hand, the guests know that they are virtualized, which might influence the performance of the L2 guests in a positive way. The nested virtualization will not work unless support is actively introduced. It is unlikely that virtualization software developers are willing to incorporate these modifications in their hypervisors, since the cost of the implementation exceeds the benefits.


4.3 Hardware supported virtualization

The last setup uses a hypervisor based on hardware support for x86 virtualization as the bottom layer. This configuration requires a processor that has the hardware support extensions. KVM (see subsection 3.6.4) is a popular example of such a hypervisor, but the latest versions of VMware, VirtualBox and Xen can also use hardware support. As with the previous configurations, there are three combinations: one using a nested hypervisor based on the same technique as the L1 hypervisor, one where a hypervisor based on dynamic binary translation is nested, and one where a paravirtualization based hypervisor is the nested hypervisor.

Dynamic binary translation and paravirtualization: These combinations are similar to the combinations where a hypervisor based on dynamic binary translation is used as the bottom layer. A guest or its operating system does not need modifications; hence it should in theory be possible to nest virtual machines in a setup where the bottom layer hypervisor is based on hardware support. The nested hypervisor thinks its code is running in ring 0, but it is actually running in the guest mode of the processor, which it entered as a result of a VMRUN or VMEntry instruction.

The memory management depends on whether the processor offers the second generation hardware support. If the processor does not, the L1 hypervisor uses a software technique for virtualizing the MMU. In this case, memory management is the same as with dynamic binary translation, where both the L1 and L2 hypervisors maintain shadow tables for virtualizing the MMU. If the processor does support the hardware MMU, the L1 hypervisor does not need to maintain these shadow tables, which can improve performance.

Hardware supported virtualization: As in the other configurations, hardware support for nested hypervisors is a special case. The virtualized processor that is provided to the L1 guest is based on the x86 processor, but it needs to contain the hardware extensions for virtualization if the nested hypervisor uses hardware support. If the L1 hypervisor does not provide these hardware extensions to its guests, only the combinations with a nested hypervisor that uses dynamic binary translation or paravirtualization can work. KVM and Xen are doing research and work to provide the hardware extensions for virtualization on the x86 architecture to their guests. More details are given in section 5.4.

Hardware support for EPT or NPT (see section 3.4) in the guest, which can also be referred to as nested EPT or nested NPT, deserves special attention according to Avi Kivity [48]. Avi Kivity is a lead developer and maintainer of KVM and has posted some interesting information about nested virtualization on his blog. Nested EPT or nested NPT can be critical for obtaining reasonable performance. The guest hypervisor needs to trap and service context switches and writes to guest tables, and a trap in the guest hypervisor is multiplied by quite a large factor into KVM traps. Since the hardware only supports two levels of address translation, nested EPT or NPT has to be implemented in software.


CHAPTER 5

Nested virtualization in Practice

The previous chapter gave some insight into the theoretical requirements of nested x86 virtualization. The division into three categories resulted in nine combinations. This chapter presents how nested x86 virtualization behaves in practice. Each of the nine combinations is tested, and performance tests are executed on the working combinations. The results of these tests are discussed in the following chapter. The combinations that fail to run are analyzed in order to find the reason for the failure.

A selection of the currently popular virtualization products is tested. These products are VirtualBox, VMware Workstation, Xen and KVM, as discussed in section 3.6. Table 3.1 shows a summary of these hypervisors and the supported virtualization techniques. There are seven different hypervisors if we consider the products with multiple techniques to consist of different hypervisors. In each hypervisor we can nest any of the seven hypervisors. Thus, nesting these hypervisors results in 49 different setups, which are described in the following sections. Details of the tests are given in appendix B, which lists the configuration used for each setup together with version information of the hypervisors and the result of the setup.

The subsection in which each nested setup is discussed is summarized in table 5.1. The columns of the table represent the L1 hypervisors and the rows represent the L2 hypervisors, i.e. the hypervisor represented by the row is nested inside the hypervisor represented by the column. For example, information about the nested setup where VirtualBox based on dynamic binary translation is nested inside Xen using paravirtualization can be found in subsection 5.1.2. The table cells for setups with a L1 hypervisor based on hardware support are split in two: the upper cell represents the nested setup tested on a processor with first generation hardware support, and the bottom cell represents the setup tested on a processor with second generation hardware support.


[Table 5.1 is not reproduced here. It is an index mapping each combination of L1 hypervisor (columns: VirtualBox DBT/HV, VMware DBT/HV, Xen PV/HV, KVM HV) and L2 hypervisor (rows, the same seven hypervisors) to the subsection in which that nested setup is discussed; cells for L1 hypervisors based on hardware support are split into a first generation and a second generation entry.]

Table 5.1: Index table indicating in which subsection information about each nested setup can be found.


5.1 Software solutions

5.1.1 Dynamic binary translation

In this subsection we give the results of actually nesting virtual machines inside a L1 hypervisor based on dynamic binary translation, as discussed in section 4.1. The nested hypervisors should not need modifications; only the nested hypervisors based on hardware support for virtualization need a virtual processor that contains the hardware extensions. The L1 hypervisors are VirtualBox and VMware Workstation, using dynamic binary translation for virtualizing guests. Since we test two L1 hypervisors, this subsection describes 14 setups. These setups are described in the following paragraphs, categorized by the technique of the L2 hypervisor. The first paragraphs elaborate on the setups that use dynamic binary translation on top of dynamic binary translation. The next paragraph presents the setups that use paravirtualization for the L2 hypervisor, followed by a paragraph that presents the setups that use hardware support for the L2 hypervisor. The last paragraph concludes this subsection with an overview.

Dynamic binary translation: Each setup that used dynamic binary translation as both the L1 and L2 hypervisor technique resulted in failure. The setups either hung or crashed when starting the inner guest. In the two setups where VMware Workstation was nested inside VMware Workstation and VirtualBox was nested inside VirtualBox, the L2 guest became unresponsive when started. After a few hours, the nested guests were still trying to start, so these setups could be marked as failures. Since in both setups the L1 and L2 hypervisors were the same, the developers know which instructions and functionality are used by the nested hypervisor and may have foreseen this situation. However, the double layer of dynamic binary translation seems to be inoperative or too slow for a working nested setup with the same hypervisor for both the L1 and L2 hypervisor.

The other two setups, where VMware Workstation is nested in VirtualBox and VirtualBox is nested in VMware Workstation, resulted in a crash. In the former setup the L1 VirtualBox guest crashed, which indicates that the L2 guest tried to use functionality that is not fully supported by VirtualBox. This can be functionality that was left out in order to improve performance, or a simple bug. In the other setup, with VMware Workstation as the L1 hypervisor and VirtualBox as the L2, the VirtualBox guest crashed but the VMware Workstation guest stayed operational. The L2 guest noticed that some conditions were not met and crashed with an assertion failure. In both setups, it seems that the L2 guest does not see a fully virtualized environment and one of the guests, in particular VirtualBox, reports a crash. More information about the reported crash is given in section B.1. A possible reason that in both cases VirtualBox reports the crash is that VirtualBox is open source and can allow more information to be viewed by its users.

Paravirtualization: Of the two setups that use paravirtualization on top of dynamic binary translation, one worked and the other crashed. Figure 5.1 shows the layers of these setups, where the L1 guest and the L2 hypervisor are represented by the same layer.


Figure 5.1: Layers for nested paravirtualization in dynamic binary translation.

The setup with VMware Workstation as the L1 hypervisor allowed a Xen guest to boot successfully. In the other setup, using VirtualBox, the L1 guest crashed and reported a message similar to the setup with VMware Workstation inside VirtualBox (see section B.1). The result, one setup that works and one that does not, gives some insight into the implementation of VMware Workstation and VirtualBox. The latter contains one or more bugs which make the L1 guest crash when a nested hypervisor starts a guest. The functionality could have been left out deliberately because such a situation might not be very common. Leaving out these exceptional situations allows developers to focus on more important functionality for allowing virtualization. On the other hand, VMware Workstation does provide the functionality and could be considered more mature for nested virtualization using dynamic binary translation as the L1 hypervisor technique.

Hardware supported virtualization: VirtualBox and VMware Workstation do not provide the x86 virtualization processor extensions to their guests. This means that there is no hardware support available in the guests, neither for the processor nor for the memory management. Since four of the hypervisors are based on hardware support, there are eight setups that contain such a hypervisor. The lack of hardware support causes the failure of these eight setups. Implementing the hardware support in the L1 hypervisor using software, without underlying support from the processor, could result in bad performance. However, if performance is not an issue, such a setup could be useful to simulate a processor with hardware support on an incompatible processor.
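One way to see why these eight setups fail is to look at the CPU flags that the L1 guest receives. On Linux the virtualization extensions show up as the vmx (Intel) or svm (AMD) flag in /proc/cpuinfo, so a quick check inside the L1 guest is the following (a small sanity check, not part of the original test procedure):

# Count the logical CPUs that advertise hardware virtualization support;
# inside a VirtualBox or VMware Workstation guest this prints 0.
egrep -c '(vmx|svm)' /proc/cpuinfo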

Only one out of 14 setups worked with dynamic binary translation as the L1 hypervisor technique. The successful combination is the Xen hypervisor using paravirtualization within the VMware Workstation hypervisor. The other setups hung or crashed, and VirtualBox reported the most crashes. VirtualBox seems to contain some bugs that VMware Workstation does not have, resulting in crashes in the guest being virtualized by VirtualBox. Hardware support for virtualization is not present in the L1 guest using VMware Workstation or VirtualBox, which eliminates the eight setups with a nested hypervisor that needs the hardware extensions. Table 5.2 gives a summary of the setups described in this subsection. The columns represent the L1 hypervisors and the rows represent the L2 hypervisors.


                VirtualBox   VMware
                DBT          DBT
VirtualBox DBT  ×            ×
VirtualBox HV   ×            ×
VMware DBT      ×            ×
VMware HV       ×            ×
Xen PV          ×            X
Xen HV          ×            ×
KVM HV          ×            ×

Table 5.2: The nesting setups with dynamic binary translation as the L1 hypervisor technique. DBT stands for dynamic binary translation, PV for paravirtualization and HV for hardware virtualization.


5.1.2 Paravirtualization

The previous subsection described the setups that use dynamic binary translation as the L1 hypervisor technique. The following paragraphs elaborate on the use of a L1 hypervisor based on paravirtualization. In section 4.2, we concluded that nested virtualization with paravirtualization as the bottom layer needs modifications to the nested hypervisor. The L1 hypervisor used for the tests is Xen. In all the nested setups, the L2 hypervisor should be modified to use the paravirtual interfaces offered by Xen instead of executing ring 0 code. We discuss the problems for each hypervisor technique in the following paragraphs, together with what the setup would look like if the nested virtualization works. The last paragraph summarizes the setups described in this subsection.

Paravirtualization: The paravirtualized guest does not allow the start of a Xen hypervisor within the guest. The kernel loaded in the paravirtualized guest is a kernel adapted for paravirtualization. The Xen hypervisor is not adapted to use the provided interface and hence the paravirtualized guest removes the other kernels from the bootloader. The complete setup, see figure 5.2, consists of Xen as the L1 hypervisor which automatically starts domain 0. This domain 0 is a L1 privileged guest. Another domain would run the nested hypervisor, which in turn would run its automatically started domain 0 and a nested virtual machine.

Dynamic binary translation: The hypervisors of VMware Workstation and VirtualBox based on dynamic binary translation could not be loaded in the paravirtualized guest. The reason is that the ring 0 code is not adapted for the paravirtualization. In practice this expresses itself as the inability to compile the driver or module that needs to be loaded.


Figure 5.2: Layers for nested Xen paravirtualization.

The module should be compiled against the kernel headers, but compilation fails since the build does not recognize the version of the adapted kernel and its headers. The setup for dynamic binary translation as the technique for the nested hypervisor (see figure 5.3) differs from the previous setup (figure 5.2) in that the L2 hypervisor is on top of a guest operating system. Xen is a native, bare-metal hypervisor which runs directly on the hardware, i.e. in this case the virtual hardware. VMware Workstation and VirtualBox are hosted hypervisors and do need an operating system between the hypervisor and the virtual hardware.
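In practice both products build their ring 0 component as an out-of-tree kernel module against the headers of the running kernel, roughly along the lines of the following illustrative invocation (this is a generic sketch, not the exact command issued by the installers), and it is this step that fails for the Xen-modified guest kernel:

# Typical out-of-tree module build (paths illustrative); with a
# paravirtualized Xen kernel the version check against these headers
# fails and the build aborts.
make -C /lib/modules/$(uname -r)/build M=$(pwd) modules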

Figure 5.3: Layers for nested dynamic binary translation in paravirtualization.

Hardware supported virtualization: The other four setups, where a nested hypervisor based on hardware support is used, have the same problem. None of the hypervisors are modified to run in a paravirtualized environment. In addition, the virtualization extensions are not provided in the paravirtualized guest. Even if the hypervisors were adapted for the paravirtualization, they would still need these extensions. These setups look like figure 5.2 or figure 5.3, depending on whether the nested hypervisor is hosted or native, bare-metal.

None of the seven setups with paravirtualization as the bottom layer worked. The results of the setups are shown in table 5.3. The column with the header “Xen” represents the L1 hypervisor. The main problem is the adaptation of the hypervisors.


                XEN
                PV
VirtualBox DBT  ×
VirtualBox HV   ×
VMware DBT      ×
VMware HV       ×
Xen PV          ×
Xen HV          ×
KVM HV          ×

Table 5.3: The nesting setups with paravirtualization as the L1 hypervisor technique.

Unless these hypervisors are modified, paravirtualization is not a good choice as the L1 hypervisor technique. It will always depend on the adaptation of the hypervisor, and one would be restricted to that particular hypervisor. When using paravirtualization, the best one can do is hope that developers adapt their hypervisors, or modify the hypervisor oneself.

5.1.3 Overview software solutions

The previous subsections explained the results of nested virtualization with software solutions for the bottom layer hypervisor. This subsection gives an overview of all the possible setups described in the previous subsections. All these setups are gathered in table 5.4. The columns of the table represent the setups belonging to the same L1 hypervisor. The rows in the table indicate a different nested hypervisor, i.e. the hypervisor represented by the row is nested inside the hypervisor represented by the column.

Nested x86 virtualization using a L1 hypervisor based on a software solution is not successful. Out of the 21 setups that were tested, only one setup succeeds in booting a L2 guest: nesting Xen inside VMware Workstation. Note that 12 setups are unsuccessful simply because hardware support for x86 virtualization is not available in the L1 guest.

5.2 First generation hardware support

The setups with a bottom layer hypervisor based on hardware support are described in this section. The theoretical possibilities and requirements needed for these setups are discussed in section 4.3. The conclusion was that it should be possible to nest virtual machines without modifying the guest operating systems, given that the physical processor provides the hardware extensions for x86 virtualization.


                VirtualBox   VMware   XEN
                DBT          DBT      PV
Subsection      5.1.1        5.1.1    5.1.2
VirtualBox DBT  ×            ×        ×
VirtualBox HV   ×            ×        ×
VMware DBT      ×            ×        ×
VMware HV       ×            ×        ×
Xen PV          ×            X        ×
Xen HV          ×            ×        ×
KVM HV          ×            ×        ×

Table 5.4: Overview of the nesting setups with a software solution as the L1 hypervisor technique.

In chapter 3, the hardware support for x86 virtualization was divided into first generation and second generation hardware support. The second generation hardware support adds a hardware supported memory management unit so that the hypervisor does not need to maintain shadow tables. The original research was done on a processor1 that did not have second generation hardware support. Detailed information about the hypervisor versions is listed in section B.3. To make a comparison between first generation and second generation hardware support for x86 virtualization, the setups were also tested on a newer processor2 that does provide the hardware supported MMU. The results of the tests on the newer processor are given in section 5.3.

The tested L1 hypervisors using the hardware extensions for virtualization are VirtualBox, VMware Workstation, Xen and KVM. We nested the seven hypervisors (see table 3.1) within these four hypervisors, resulting in 28 setups. In the first subsection the nested hypervisor is based on dynamic binary translation. The second subsection describes the setups with Xen paravirtualization as the L2 hypervisor. The last subsection handles the setups with a nested hypervisor based on hardware support for x86 virtualization.

1Setups with a L1 hypervisor based on first generation hardware support for x86 virtualization were tested on an Intel® Core™2 Quad Q9550 processor.

2Setups with a L1 hypervisor based on second generation hardware support for x86 virtualization were tested on an Intel® Core™ i7-860 processor.


5.2.1 Dynamic binary translation

Using dynamic binary translation as the nested hypervisor technique, there are eight setups. Three of these setups are able to successfully boot and run a nested virtual machine. The layout of these setups can be seen in figure 5.4, where the L1 hypervisor is based on hardware support and the L2 hypervisor is based on dynamic binary translation. When Xen is used as the L1 hypervisor, the host OS layer can be left out and a domain 0 is started next to VM1, which still uses hardware support for its virtualization.

Figure 5.4: Layers for nested dynamic binary translation in a hypervisor based on hardware support.

VirtualBox: When VirtualBox based on hardware support is used as the bottom layer hypervisor, none of the setups worked. Nesting VirtualBox inside VirtualBox resulted in the L2 guest becoming unresponsive. The same result occurred when VirtualBox was nested in VirtualBox using dynamic binary translation for both levels. When trying to nest a VMware Workstation guest inside VirtualBox, the configuration of that setup was very unstable: each minor change resulted in a setup that refused to start the L2 guest. There was one working configuration, which we listed in section B.3.

VMware Workstation: If the L1 hypervisor in figure 5.4 is VMware Workstation, the setups were successful in nesting virtual machines. Both VirtualBox and VMware Workstation as nested hypervisors based on dynamic binary translation were able to start the L2 guest, which booted and ran correctly.

Xen: VMware Workstation3 checks whether there is an underlying hypervisor running. It noticed that Xen was running and refused to start a nested guest. This prevents a L2 VMware guest from starting within a Xen guest. In the other setup, where VirtualBox is used as the inner hypervisor, the L2 guest again became unresponsive after starting.

3In version VMware Workstation 6.5.3 build-185404 and newer


There was no crash, error message or warning, which might indicate that the L2 guest booted at a very slow pace.

KVM: The third and last working setup for nesting a hypervisor based on dynamic binary translation within one based on hardware support is nesting VMware Workstation inside KVM. In newer versions of VMware Workstation4, a check for an underlying hypervisor noticed that KVM was running and refused to boot a nested guest. The setup with VirtualBox as the nested hypervisor crashed while booting. The L2 guest showed an error indicating a kernel panic because it could not synchronize. The guest became unresponsive after displaying the message.

                VirtualBox   VMware   XEN   KVM
                HV           HV       HV    HV
VirtualBox DBT  ×            X        ×     ×
VMware DBT      ∼            X        ×     X

Table 5.5: The nesting setups with first generation hardware support as the L1 hypervisor technique and DBT as the L2 hypervisor technique.

Table 5.5 gives a summary of the eight setups discussed in this subsection. VMware Workstation is the best option since it allows nesting other hypervisors based on dynamic binary translation, but it will also most likely work when used as the nested hypervisor based on dynamic binary translation. In comparison to nesting inside a software solution, VirtualBox is able to nest within VMware Workstation when using hardware support for the L1 hypervisor. VirtualBox is still not able to nest within KVM, Xen and within itself, while VMware Workstation is able to nest within KVM and itself. It is regrettable that VMware Workstation checks for an underlying hypervisor, other than VMware itself, to prevent the use of VMware Workstation within other hypervisors.

5.2.2 Paravirtualization

In this subsection, we discuss the setups that nest a paravirtualized guest inside a guest virtualized using hardware support. Figure 5.5 shows the layers in these setups. The main differences with the setups in the previous subsection are that the L1 guest and the L2 hypervisor are represented by the same layer and that Xen automatically starts domain 0.

There are just four setups tested in this subsection since only Xen is nested within the four hypervisors based on hardware support. All four setups could successfully nest a paravirtualized guest inside the L1 guest. However, the setup where Xen is nested inside VirtualBox was not very stable. Sometimes several segmentation faults occurred during the start-up of the privileged domain. Domain 0 was able to boot and run successfully, but the creation of another paravirtualized guest was sometimes impossible.

4In version VMware Workstation 7.0.1 build-227600 and newer


Figure 5.5: Layers for nested paravirtualization in a hypervisor based on hardware support.

Xen reported that the guest was created; however, it did not show up in the list of virtual machines, indicating that the guest crashed immediately.
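For reference, the paravirtualized L2 guests in these setups are ordinary Xen domUs. A minimal domU configuration and the commands to start and list the domain could look as follows; the file names and paths are only illustrative, and the configurations actually used are listed in appendix B:

# Minimal PV domU configuration (illustrative names and paths)
cat > /etc/xen/l2-pv-guest.cfg <<'EOF'
name    = "l2-pv-guest"
kernel  = "/boot/vmlinuz-2.6-xen"
ramdisk = "/boot/initrd-2.6-xen.img"
memory  = 512
disk    = ['file:/srv/xen/l2-pv-guest.img,xvda1,w']
root    = "/dev/xvda1 ro"
vif     = ['bridge=eth0']
EOF

# Start the guest and verify that it appears in the domain list
xm create /etc/xen/l2-pv-guest.cfg
xm list

In the unstable VirtualBox setup it is exactly this last step that intermittently fails: the domain is reported as created but never shows up in the domain list.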

        VirtualBox   VMware   XEN   KVM
        HV           HV       HV    HV
Xen PV  ∼            X        X     X

Table 5.6: The nesting setups with first generation hardware support as the L1 hypervisor technique and PV as the L2 hypervisor technique.

An overview of the four setups is shown in table 5.6. It is clear that using paravirtualization as the technique for the nested hypervisor can be recommended. The only setup that does not completely work is the one with VirtualBox. Since the other three setups work, and since previous conclusions were also not in favor of VirtualBox, VirtualBox is probably the reason for the instability.

5.2.3 Hardware supported virtualization

The remaining setups, which attempt to nest a hypervisor based on hardware supported virtualization, are discussed in this subsection. Nesting the four hypervisors based on hardware support within each other results in 16 setups. The layout of the setups is equal to figure 5.4 and figure 5.5, depending on which hypervisor is used. None of the hypervisors provide the x86 virtualization processor extensions to their guests, indicating that none of the setups will work.

Developers of both KVM and Xen are working on support for nested hardware support. Detailed information can be found in section 5.4. KVM has already released initial patches for nested hardware support on AMD processors and is working on patches for the nested support on Intel processors. Xen is also researching the ability to nest the hardware support so that nested virtual machines can use the hardware extensions.



               VirtualBox   VMware   XEN   KVM
               HV           HV       HV    HV
VirtualBox HV  ×            ×        ×     ×
VMware HV      ×            ×        ×     ×
Xen HV         ×            ×        ×     ×
KVM HV         ×            ×        ×     ×

Table 5.7: The nesting setups with first generation hardware support as the L1 and L2 hypervisor technique.

The results of this subsection are summarized in table 5.7. It is regrettable that currently none of the setups work, because the L1 hypervisors do not yet provide the hardware support for virtualization to their guests. Nonetheless, it is hopeful that KVM and Xen are doing research and work in this area. Their work can motivate managers and developers of other hypervisors to provide these hardware extensions to their guests as well.

We would like to note that VMware and VirtualBox guests with a 64 bit operating system need hardware support to execute. If we were to use a 64 bit operating system for the nested guest, the result would be the same as the results in this section, since there is currently no nested hardware support.

5.2.4 Overview first generation hardware support

In this subsection, we summarize the results of the setups that are described in the previous subsections. All the setups were tested on a processor that has first generation hardware support for virtualization on x86 architectures. The results of all the setups are gathered in table 5.8. The columns indicate the L1 hypervisor and the rows indicate the L2 hypervisor, i.e. the hypervisor represented by the row is nested inside the hypervisor represented by the column.

Nested x86 virtualization using a L1 hypervisor based on hardware support is more successful than using a L1 hypervisor based on software solutions (see section 5.1.3). For nesting dynamic binary translation, the results suggest that VMware Workstation was the best option and that VirtualBox works, although it showed some instabilities. Nesting paravirtualization is the most suitable solution when using a L1 hypervisor based on hardware support on a processor that only supports first generation hardware support. Nested hardware support is not present yet, but KVM and Xen are working on it. The future will tell whether they will be successful or not. The number of working setups increased when using hardware support in the L1 hypervisor, so the future looks promising for nested hardware support.

For now, VMware Workstation is the most suitable choice for the L1 hypervisor, directly followed by KVM, since it can nest three different hypervisors.


                VirtualBox   VMware   XEN   KVM
                HV           HV       HV    HV
VirtualBox DBT  ×            X        ×     ×
VirtualBox HV   ×            ×        ×     ×
VMware DBT      ∼            X        ×     X
VMware HV       ×            ×        ×     ×
Xen PV          ∼            X        X     X
Xen HV          ×            ×        ×     ×
KVM HV          ×            ×        ×     ×

Table 5.8: Overview of the nesting setups with first generation hardware support as the L1 hypervisor technique.

The advisable choice for the L2 hypervisor is a paravirtualized guest using Xen, since it allows nesting in all the hypervisors. VirtualBox as the L1 hypervisor has two unstable setups, which makes it rather unsuitable for nested virtualization.

5.3 Second generation hardware support

In section 4.3 we concluded that it should be possible to nest virtual machines without modifying the guest operating system, given that the hardware extensions for virtualization on x86 architectures are provided by the physical processor. In section 5.2, the setups with a L1 hypervisor based on hardware support were tested on a processor that only provided first generation hardware support. In this section, we test the same setups on a newer processor5 that provides second generation hardware support. The comparison of the results presented in both sections can give an insight into the influence of the hardware supported MMU on nested virtualization. Section B.4 lists detailed information about the hypervisor versions.

The second generation hardware support offers a hardware supported MMU. The hardware supported MMU provides the Extended Page Tables for Intel and the Nested Page Tables for AMD (see section 3.4). The memory management in nested virtualization needs three levels of address translation, as can be seen in figure 4.2, while the hardware supported MMU only offers two levels of address translation.

5Setups with a L1 hypervisor based on second generation hardware support for x86 virtualization were tested on an Intel® Core™ i7-860 processor.


This problem is solved by reusing the existing code for the shadow tables. The L2 hypervisor will maintain shadow tables that translate the nested virtual guest address to the physical guest address. These shadow tables are used together with the EPT or NPT to translate the nested virtual guest address into a real physical address. So in a nested setup, the L2 guest maintains its own page tables that translate nested virtual guest addresses into nested physical guest addresses. The L2 hypervisor maintains shadow tables for these page tables that immediately translate nested virtual guest addresses into physical guest addresses. The L1 hypervisor maintains the EPT or NPT that translates the physical guest addresses into real physical addresses.
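Schematically, with "gva", "gpa" and "hpa" denoting guest-virtual, guest-physical and host-physical addresses (notation introduced here for clarity, not taken from the original text), the three logical translation levels

    L2 gva --(page tables of the L2 guest)--> L2 gpa --(page tables of the L1 guest)--> L1 gpa --(EPT/NPT)--> hpa

are collapsed into the two levels the hardware can handle:

    L2 gva --(shadow tables in the L2 hypervisor)--> L1 gpa --(EPT/NPT in the L1 hypervisor)--> hpa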

The setups in this section remain unchanged; VirtualBox, VMware Workstation, Xen and KVM are used as the L1 hypervisor, supporting the hardware extensions for virtualization on x86 architectures. The first subsection elaborates on the results of nesting a L2 hypervisor based on dynamic binary translation. The second subsection discusses the results of a nested hypervisor based on paravirtualization. The results of nesting a hypervisor based on hardware support within the L1 hypervisor are explained in the third subsection. Subsection 5.3.4 gives an overview of the results in this section and compares these results with the results obtained in section 5.2.

5.3.1 Dynamic binary translation

Eight setups were tested using a L2 hypervisor based on dynamic binary translation. Compared to the three working setups in subsection 5.2.1, there are six working setups when using a L1 hypervisor based on the second generation hardware support for virtualization on x86 architectures. The layout of the setups in this subsection is shown in figure 5.4.

VirtualBox: Using VirtualBox based on hardware support as the bottom layer hypervisor results in a different outcome for one of the setups. If the L2 hypervisor is VMware Workstation, the result was very unstable, comparable to subsection 5.2.1. The other setup, which uses VirtualBox as the nested hypervisor, was able to boot and run a L2 guest successfully. In the tests without second generation hardware support, this setup became unresponsive. The use of the hardware supported MMU affects the outcome of the test for this setup.

VMware Workstation: Nothing has changed for these results. Both setups, with VMware Workstation as the L1 hypervisor, were still successful in running a L2 guest.

Xen: The setup with VirtualBox as the L2 hypervisor also has a different outcome with Xen based on hardware support as the L1 hypervisor. A L2 guest was able to boot and run successfully, while in the test with first generation hardware support, the setup became unresponsive. The setup with VMware Workstation as the L2 hypervisor still does not work because the hypervisor checks for an underlying L1 hypervisor. The Xen hypervisor was detected and VMware Workstation6

reported that the user should disable the other hypervisor.

6In version VMware Workstation 6.5.3 build-185404 and newer


KVM: Nesting VirtualBox or VMware Workstation within KVM now worked for both setups. The setup with VMware Workstation as the inner hypervisor already worked without second generation hardware support. The newer versions of VMware Workstation7 checked whether there is an underlying hypervisor and noticed that KVM was running. The new check prevented the setup from working. The setup using VirtualBox as the L2 hypervisor, which showed a kernel panic in the previous test, now booted and ran successfully.

                VirtualBox   VMware   XEN   KVM
                HV           HV       HV    HV
VirtualBox DBT  X            X        X     X
VMware DBT      ∼            X        ×     X

Table 5.9: The nesting setups with second generation hardware support as the L1 hypervisor technique and DBT as the L2 hypervisor technique.

The new results are gathered in table 5.9. In subsection 5.2.1, VMware Workstation was the recommended option to use as both the L1 and L2 hypervisor. The conclusion is different with the results in this subsection. VMware Workstation now shares the most suitable choice for the bottom layer hypervisor with KVM. The most suitable choice for the L2 hypervisor is no longer VMware Workstation but VirtualBox, since it could be nested in all setups. The check for an underlying hypervisor in VMware Workstation prevented it from being nested in certain setups. The setup that nests VMware inside VirtualBox is very unstable, preventing VirtualBox from being the advisable choice for the L1 hypervisor.

5.3.2 Paravirtualization

In this subsection, we replace dynamic binary translation as the L2 hypervisor technique with paravirtualization. The layout of the setups is shown in figure 5.5. There were three setups that completely worked in subsection 5.2.2. The fourth setup was very unstable because segmentation faults could occur during the start-up of domain 0. Using second generation hardware support these segmentation faults disappeared and the fourth setup successfully passed the test.

There is little difference with the previous results on a processor with first generation hardware support. The new results are collected in table 5.10. Xen using paravirtualization remains a perfect choice for nesting inside a virtual machine.

5.3.3 Hardware supported virtualization

None of the setups in subsection 5.2.3 worked. The results are the same for this subsection since the problem is not the hardware supported MMU. The layout of the setups with the L1 and L2 hypervisors based on hardware support for virtualization on x86 architectures is similar to figure 5.4 and figure 5.5, depending on which hypervisor is used.

7In version VMware Workstation 7.0.1 build-227600 and newer


        VirtualBox   VMware   XEN   KVM
        HV           HV       HV    HV
Xen PV  X            X        X     X

Table 5.10: The nesting setups with second generation hardware support as the L1 hypervisor technique and PV as the L2 hypervisor technique.

The problem is that there is no nested hardware support. There is work in progress on this subject by KVM and Xen, see section 5.4.

               VirtualBox   VMware   XEN   KVM
               HV           HV       HV    HV
VirtualBox HV  ×            ×        ×     ×
VMware HV      ×            ×        ×     ×
Xen HV         ×            ×        ×     ×
KVM HV         ×            ×        ×     ×

Table 5.11: The nesting setups with second generation hardware support as the L1 and L2 hypervisor technique.

For completeness, the results are shown in table 5.11, but they are the same as the results of the tests on a processor without second generation hardware support.

5.3.4 Overview second generation hardware support

The intermediate results of the previous subsections are gathered in this subsection. The setups were tested on a processor that provides second generation hardware support for virtualization. Table 5.12 shows a summary of the results obtained in the previous subsections. The columns indicate the L1 hypervisor and the rows represent the L2 hypervisor, i.e. the hypervisor indicated by the row is nested inside the hypervisor represented by the column.

Nested x86 virtualization with a L1 hypervisor based on hardware support is even more successful when the processor provides second generation hardware support. Both dynamic binary translation and paravirtualization are capable of being nested inside a hypervisor based on hardware support. The two setups that did not work had problems with configuration instability and with the check for an underlying hypervisor in VMware Workstation. KVM and VMware Workstation are the advisable choices for the L1 hypervisor, since all dynamic binary translation and paravirtualization setups worked for these hypervisors. VirtualBox using dynamic binary translation and Xen using paravirtualization are the most suitable choices for the L2 hypervisor.


                VirtualBox   VMware   XEN   KVM
                HV           HV       HV    HV
VirtualBox DBT  X            X        X     X
VirtualBox HV   ×            ×        ×     ×
VMware DBT      ∼            X        ×     X
VMware HV       ×            ×        ×     ×
Xen PV          X            X        X     X
Xen HV          ×            ×        ×     ×
KVM HV          ×            ×        ×     ×

Table 5.12: Overview of the nesting setups with second generation hardware support as the L1 hypervisor technique.


Many setups that were unresponsive in section 5.2 became responsive when using a hardware supported MMU. The use of EPT or NPT improves the performance of the memory management and relieves the L1 hypervisor from maintaining shadow tables. The maintenance of the shadow tables is based on software and can contain bugs. It must also be implemented in a performance oriented way since it is a crucial part. After some research8, it was clear that hypervisors normally take shortcuts in order to improve the performance of the memory management. Thus, the main issue is the shadow tables, which optimize the MMU virtualization but do not exactly follow architecture equivalence for performance reasons. Two levels of shadow page tables seemed to be the cause of unresponsiveness in several setups. Replacing the shadow tables in the L1 hypervisor by the use of EPT or NPT removes the inaccurate virtualization of the memory management unit. The second generation hardware support inserts an accurate hardware MMU with two levels of address translation in the L1 hypervisor, allowing L2 hypervisors and L2 guests to run successfully.

5.4 Nested hardware support

Nested hardware support is the support of hardware extensions for virtualization on x86 architectures within a guest. The goal of nested hardware support is mainly supporting nested virtualization for L2 hypervisors based on that hardware support.

8http://www.mail-archive.com/[email protected]/msg29779.html


In section 4.3, we concluded that in order to nest a hypervisor based on hardware support, the virtualized processor should provide the hardware extensions. In subsection 5.2.3 and subsection 5.3.3, we noticed that none of the hypervisors provide a virtualized processor with hardware extensions, resulting in none of the setups being able to nest such a hypervisor. Recently, KVM and Xen started research in this domain in order to develop nested hardware support. In the following subsections, the work in progress of both KVM and Xen is presented.

5.4.1 KVM

Nested hardware support was not supported by default in KVM. The virtualized processor provided to the guest is similar to the host processor, but lacks the hardware extensions for virtualization. These extensions are needed in order to use KVM or any other hypervisor based on hardware support. The introduction of nested hardware support should allow these hypervisors to be nested inside a virtual machine.

The first announcement of nested hardware support was made in September 2008 in a blog post of Avi Kivity [48]. He writes about an e-mail of Alexander Graf and Joerg Roedel presenting a patch for nested SVM support [49], i.e. nested hardware support for AMD processors with SVM support, and about the relative simplicity of this patch. More information on AMD SVM itself can be found in section 3.3. Alexander Graf and Joerg Roedel are both developers working on new features for KVM. The patch was eventually included in development version kvm-82 and allows a guest on an AMD processor with hardware extensions for virtualization to run a nested hypervisor based on hardware support. The implementation of the patch stayed relatively simple by exploiting the design of the SVM instruction set.

A year later, in September 2009, Avi Kivity announced that support for nested VMX, i.e. nested hardware support for Intel processors with Intel VT-x extensions, is coming. The bad news is that it will take longer to implement this feature since nested VMX is more complex than nested SVM. In section 3.3, we explained that Intel VT-x and AMD SVM are very similar but the terminology is somewhat different. Besides the similarities, there are some fundamental differences in their implementation that make nested VMX support more complex.

A first difference is the manipulation of the data structure used by the hypervisor to communicate with the processor. For Intel VT-x, this data structure is called the VMCS; the equivalent in AMD SVM is called the VMCB. Intel uses two instructions, VMREAD and VMWRITE, to manipulate the VMCS, while AMD allows manipulation of the VMCB by reading and writing in a memory region. The drawback of the two extra instructions is that KVM must trap and emulate these special instructions. For SVM, KVM can just allow the guest to read and write to the memory region of the VMCB without intervention.

A second difference is the number of fields used in the data structure. Intel uses a lot more fields to allow hypervisor-processor intercommunication. The AMD SVM VMCB has 91 fields, while the Intel VT-x VMCS has no fewer than 144 fields. KVM needs to virtualize all these fields and make sure that the guest, running a hypervisor, can use those fields in a correct way.



Besides the differences in the implementation of Intel VT-x and AMD SVM, another reason for the longer development time of the nested VMX support is that the patch will immediately support nested EPT. This means that not only the hypervisor in the host can use Extended Page Tables (see section 3.4), but the hypervisor in the guest also benefits from EPT support. As already pointed out in section 4.3, nested EPT or nested NPT could be critical for obtaining reasonable performance. With the nested VMX support, a KVM guest must support the 32 bit and 64 bit page table formats and the EPT format.

In practice

The nested hardware support was tested on an AMD processor9 since the nested SVM patch was already released. The installation is the same as a regular install, but in order to use the patch one must set a flag when loading the modules. We can do this using the following commands:

modprobe kvm
modprobe kvm-amd nested=1

“nested=1” indicates that we want to use the nested SVM support. The tested setup was KVM as both the L1 and L2 hypervisor. After installing and booting the L1 guest, KVM was installed inside the guest in exactly the same way as a normal installation of KVM. The nested hypervisor's modules do not need to be loaded with “nested=1”. In subsection 5.2.3 and subsection 5.3.3, we could not install KVM within the guest. Installing KVM within the guest is a promising step towards nested virtualization with KVM, or any other hypervisor based on hardware support, as a nested hypervisor. When starting the L2 guest for installation of an operating system or for booting an existing operating system, some “handle exit” messages occurred. On KVM's mailing list, Joerg Roedel replied10 in March 2010 that the messages result from a difference between real hardware SVM and the SVM emulated by KVM. A patch should fix this issue, but since it needs more testing, the current setup was not able to boot. Nonetheless, developers are constantly improving the nested SVM support by means of new patches and tests, so it is just a matter of time before the current setup will work.
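A quick way to verify that the flag took effect is to inspect the module parameter on the host (a small sanity check, not part of the original test procedure):

# Lists the parameters of the kvm_amd module; "nested" should be among them
modinfo -p kvm_amd | grep nested
# Prints 1 when nested SVM is enabled
cat /sys/module/kvm_amd/parameters/nested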

5.4.2 Xen

Xen is also working on nested virtualization, with an emphasis on virtualization based on hardware support. In November 2009, during the Xen Summit in Asia, Qing He presented his work on nested virtualization [50]. Qing He has been working on Xen since 2006 and is a software engineer from the Intel Open Source Technology Center. His work focuses on hardware supported virtualization and more specifically on Intel's VT-x hardware support. The current progress is a proof of concept for a simple scenario with a single processor and one nested guest. The nested guest is able to boot to an early stage successfully with KVM as the L2 hypervisor. Before releasing the current version, it still needs some stabilization and refinement.

The main target is the virtualization of VMX in order to present a virtualized VMX to the guest. This means that everything of the hardware support must be available in the guest.

9The nested hardware support was tested on a Quad-Core AMD Opteron™ 2350 processor.
10http://www.mail-archive.com/[email protected]/msg31096.html


The guest should be able to use the data structures and the instructions to manipulate the VMCS. The guest should also be able to control the execution flow of the VMX with VMEntry and VMExit instructions.

Figure 5.6: Nested virtualization architecture based on hardware support.

The data structures are shown in figure 5.6. The L1 guest has a VMCS that is loaded into the hardware when this guest is running. The VMCS is maintained by the L1 hypervisor. If the L2 guest wants to execute, it needs to have a corresponding VMCS. That corresponding VMCS is maintained by the L2 hypervisor running in the L1 guest and is called the virtual VMCS, or vVMCS. The L2 hypervisor sees the virtual VMCS as the controlling VMCS of the L2 guest, but it is called virtual because the L1 hypervisor maintains a corresponding shadow VMCS, or sVMCS. This shadow VMCS is not a complete duplicate of the virtual VMCS but contains translations, similar to the shadow tables (see subsection 3.1.3). It is the shadow VMCS that is loaded into the hardware when the L2 guest is running. Thus, each nested guest has a virtual VMCS in the L2 hypervisor and a corresponding shadow VMCS in the L1 hypervisor. The general idea is to treat the L2 guests as guests of the L1 hypervisor using the shadow VMCS.

Figure 5.7 shows the execution flow in a nested virtualization scenario based on hardware support. On the left side of the figure, the L1 guest is running and wants to start a nested guest. The guest does this by executing a VMEntry with the instruction VMLAUNCH or VMRESUME. The virtual VMEntry cannot directly switch to the L2 guest because this is not supported by the hardware. The L1 guest is already using the VMX guest mode and can only trigger a VMExit. The VMExit results in a transition to the L1 hypervisor, which intercepts the VMEntry call and tries to switch to the shadow VMCS indicated by the VMEntry. This results in the transition to the L2 guest, and the L2 guest can run from then on.

Similar to a virtual VMEntry, a virtual VMExit will transition to the L1 hypervisor. The L1 hypervisor does not know whether the VMExit is a virtual VMExit or whether the VMExit happened because the L2 guest executed a privileged instruction.


Figure 5.7: Execution flow in nested virtualization based on hardware support.

When the L2 guest tries to run a privileged instruction, the L1 hypervisor can fix this without having to forward the VMExit to the L2 hypervisor. An algorithm in the L1 hypervisor determines whether this is a virtual VMExit that should be forwarded to the L2 hypervisor, or another type of VMExit that can be handled by the L1 hypervisor. For a virtual VMExit, the L1 hypervisor forwards it to the L2 hypervisor and the shadow VMCS of the L2 guest is unloaded. The L1 hypervisor switches the controlling VMCS to the VMCS of the L1 guest. In the figure, there are three VMExits which result in a transition to the L1 hypervisor. The first and the last VMExit are forwarded by the L1 hypervisor to the L2 hypervisor; the second VMExit is handled by the L1 hypervisor itself.

There is no special handling in place for the memory management. The nested EPT, as described in the previous subsection, is also very helpful in this case because it significantly reduces the number of virtual VMExits. Nested EPT support is still work in progress.


                VirtualBox    VMware        Xen           KVM
                DBT   HV      DBT   HV      PV    HV      HV      Gen.
VirtualBox DBT  ×     ×       ×     X       ×     ×       ×       1st gen.
                      X             X             X       X       2nd gen.
VirtualBox HV   ×     ×       ×     ×       ×     ×       ×       1st gen.
                      ×             ×             ×       ×       2nd gen.
VMware DBT      ×     ∼       ×     X       ×     ×       X       1st gen.
                      ∼             X             ×       X       2nd gen.
VMware HV       ×     ×       ×     ×       ×     ×       ×       1st gen.
                      ×             ×             ×       ×       2nd gen.
Xen PV          ×     ∼       X     X       ×     X       X       1st gen.
                      X             X             X       X       2nd gen.
Xen HV          ×     ×       ×     ×       ×     ×       ×       1st gen.
                      ×             ×             ×       ×       2nd gen.
KVM HV          ×     ×       ×     ×       ×     ×       ×       1st gen.
                      ×             ×             ×       ×       2nd gen.

Table 5.13: Overview of all nesting setups.


CHAPTER 6

Performance results

This chapter elaborates on the performance of the working setups for nested virtualization on x86 architectures. Chapter 5 showed that there was one working setup for nested virtualization when using dynamic binary translation as the L1 hypervisor technique. There were also ten working setups when using a L1 hypervisor based on hardware support with a processor that contains the second generation hardware extensions for virtualization on x86 architectures. The performance in a normal virtual machine is compared to the performance in a nested virtual machine in order to get an idea about the performance degradation between virtualization and nested virtualization.

The performed tests measure the processor, memory and I/O performance. These are the three most important components of a computer system. The evolution of hardware support for virtualization on the x86 architecture also shows that the processor, the memory management unit and I/O are important components, see chapter 3. The first generation hardware support focuses on the processor, second generation hardware support concentrates on a hardware supported MMU, and the newer generation provides support for directed I/O. The benchmarks used for the tests are sysbench1, iperf2 and iozone3. sysbench was used for the processor, memory and file I/O performance. iperf was used for network performance and iozone was used as a second benchmark for file I/O.

The rest of this chapter is organized using these three components. The first section elaborates on the performance of the processor in nested virtualization. The next section evaluates the memory performance of the nested virtual machines and the third section shows the performance of I/O in a nested setup. The last section gives an overall conclusion on the performance of nested virtualization.

1http://sysbench.sourceforge.net/
2http://iperf.sourceforge.net/
3http://www.iozone.org/


Whenever a test ran directly on the host operating system, without any virtualization, the test is labeled with the word “native”. If the label is the name of a single virtualization product, the test ran inside a L1 guest with the indicated hypervisor as the L1 hypervisor. The “DBT” suffix indicates that the L1 hypervisor uses the dynamic binary translation technique. All “HV” tests use the hardware support of the processor4 for virtualization. A label of the form L1hypervisor - L2hypervisor shows the result of a performance test executed in a L2 guest using the given L2 hypervisor and L1 hypervisor. For example, “KVM (HV) - VirtualBox (DBT)” indicates the setup where KVM is used as the L1 hypervisor and VirtualBox is used as the L2 hypervisor based on dynamic binary translation. All nested setups use the hardware support of the processor in the L1 hypervisor, except for “VMware (DBT) - Xen (PV)”. The latter uses VMware as the L1 hypervisor based on dynamic binary translation and uses Xen as the L2 hypervisor based on paravirtualization. The L2 hypervisor is never based on hardware support, as can be seen in chapter 5. Thus, VirtualBox and VMware are always based on dynamic binary translation and Xen is always based on paravirtualization when used as the L2 hypervisor.

6.1 Processor performance

The experiment used to measure the performance of the processor consists of a sysbench test which calculates prime numbers. It calculates the prime numbers up to a set maximum and does this a given number of times. The number of threads that will calculate the prime numbers can also be modified prior to running the test. In the executed tests, the maximum number for the primes was 150000, and all prime numbers up to 150000 were calculated 10000 times spread over 10 threads. The measured unit of the test was the duration in seconds.
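The corresponding sysbench invocation looks roughly as follows (sysbench 0.4 syntax; the parameters mirror the description above and are not copied from the original test scripts):

# CPU test: 10 threads issue 10000 requests in total, each request
# calculating all prime numbers up to 150000; the reported total
# execution time is the measured duration.
sysbench --test=cpu --num-threads=10 --max-requests=10000 --cpu-max-prime=150000 run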

Figure 6.1 shows the first results of the performance test for the processor. The left bar is the result on the computer system without virtualization and the other bars are the results of the tests in L1 guests. The figure shows a serious gap between the native performance and the performance in a virtual machine. The reason for this big gap in performance is the use of only one core inside the virtual machine, while the host operating system can use four cores. The tests were executed in virtual machines with only one core so that the comparison between the different virtualization software would be fair.

In order to get an indication of the real performance degradation, the same test was executed in a VMware guest that can use four cores and in a “VMware (HV) - VMware (DBT)” nested guest that can use four cores. The results of these tests are given in figure 6.2. The figure shows that the performance degradation between a virtual machine and a nested virtual machine is less than the performance degradation between a native platform and a virtual machine.

4All performance tests were executed on an Intel® Core™ i7-860 processor that provides second generation hardware support for x86 virtualization.



Figure 6.1: CPU performance for native with four cores and L1 guest with one core (lower is better).

By adding an extra level of virtualization, one expects a certain overhead, but this shows that the performance degradation for the extra level is promising. The performance overhead is linear and does not increase exponentially, which is promising because the latencies of VMEntry and VMExit instructions (see section 3.3) do not have to be improved dramatically in order to get acceptable performance in the nested guest.

The results of the tests on virtual machines and nested virtual machines are shown in figure 6.3. The performance between L1 guests with “HV” is about the same since the L1 hypervisors use hardware support for virtualization. The L1 guest that is virtualized using dynamic binary translation, “VMware (DBT)”, was able to perform equally well. The results of the L2 guests vary heavily between the different setups and are higher than the results of the L1 guests. However, the performance degradation is not problematic, except for one outlier which uses dynamic binary translation for the L1 hypervisor. With a duration of 496.83 seconds, the “VMware (DBT) - Xen (PV)” setup performs much worse than the other nested setups.

6.2 Memory performance

In this section, the performance degradation of the memory management unit is evaluated. In section 5.3 we explained that the hardware supported L1 hypervisors use the hardware supported MMU of the processor and the L2 hypervisors use a software technique for maintaining the page tables of their guests.



Figure 6.2: CPU performance for native, L1 and L2 guest with four cores (lower is better).

In the “VMware (DBT) - Xen (PV)” setup, the L1 hypervisor maintains shadow tables and the L2 hypervisor provides paravirtual interfaces to its guests.

The performed memory tests evaluate the read and write throughput. The tests read or write data with a total size of 2 GB from or to memory in block sizes of 256 bytes. The tests were done in two variants: one reads or writes in sequential order and the other reads or writes in random order. Figure 6.4 presents the results of the memory tests for the native platform, L1 guests and L2 guests. Several observations for nested virtualization can be made from the results.
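With sysbench this corresponds to four runs of the memory test, varying the operation and the access pattern (again sysbench 0.4 syntax; the parameters follow the description above and are not taken from the original test scripts):

# Sequential and random reads and writes of 2 GB in 256-byte blocks
for oper in read write; do
  for mode in seq rnd; do
    sysbench --test=memory --memory-block-size=256 --memory-total-size=2G \
             --memory-oper=$oper --memory-access-mode=$mode run
  done
done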

A first observation is that the duration of the tests increases greatly when using virtualization. The L1 guests needed approximately 10 seconds to read or write 2 GB, while the test on the native platform took about 1.5 seconds. Most L2 guests took more than 128 seconds to pass the test. For nested virtualization the performance degradation of the memory is more significant than the performance degradation of the processor.

A second observation shows that nesting Xen makes it possible to avoid the performance degradation for the memory, except for the setup with dynamic binary translation as the L1 hypervisor. While other nested setups took more than 128 seconds, the nested setups with Xen as the L2 hypervisor took 10 seconds, which is the same as in the L1 guests.



Figure 6.3: CPU performance for L1 and L2 guests with one core (lower is better).

Paravirtualization appears to be a promising technique for the L2 hypervisor to minimize the performance overhead of the memory. The reason why “Xen (PV)” does not minimize the performance overhead of the memory when compared to native is unclear.

The figure also shows that the “VMware (DBT)” setup performs poorly compared to the other L1 setups and that “VMware (DBT) - Xen (PV)” did not take advantage of the paravirtualization in the L2 hypervisor. The nested setup “VMware (DBT) - Xen (PV)” is not the worst of all nested setups, but the duration still increases despite the use of paravirtualization as the L2 hypervisor technique. Therefore, a L1 hypervisor that uses second generation hardware support for virtualization on x86 architectures performs better for memory management.

The thread test stresses many divergent paths through the hypervisor, such as system calls, context switching, creation of address spaces and injection of page faults [11]. Figure 6.5 summarizes the results of a thread test. The test created 1000 threads and 4 mutexes. Each thread locks a mutex, yields the CPU and unlocks the mutex afterwards. These actions are performed in a loop so concurrency is placed on each mutex. The results of the test are equal to the results of the memory performance test. This indicates that the thread test depends heavily on memory management.
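The corresponding sysbench invocation, matching the description above, would be the following (illustrative; the exact command line used for the measurements is not reproduced here):

# 1000 threads repeatedly lock one of 4 mutexes, yield the CPU and unlock it
sysbench --test=threads --num-threads=1000 --thread-locks=4 run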

Figure 6.4: Memory performance for L1 and L2 guests (lower is better).

6.3 I/O performance

We evaluate the I/O performance in this section. There are many I/O devices, so we selected two major ones. The first test measured the network throughput and the second test measured the reads and writes from and to the hard disk. The first subsection elaborates on the results of the network test and the second subsection presents the results of the disk I/O.

6.3.1 Network I/O

The network throughput was tested using the iperf benchmark, which measures the TCP throughput over a period of 10 seconds. Figure 6.6 shows that there is little or no performance degradation between native and L1 guests. The bottleneck in these tests was the 100 Mbit/s network card and not the virtualization; the results may differ for a network card with a higher throughput. The performance overhead for L2 guests heavily depends on which setup is used. The lowest performance was measured for VirtualBox on top of Xen.
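As listed in Appendix C.2, the benchmark consists of a server running on a separate machine in the network and a client running on the machine under test:

iperf -s             # on the separate server machine
iperf -c <hostname>  # on the machine under test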

The nine nested setups can clearly be divided into groups of three. The nested setups in the first group perform rather poorly for network I/O, with a throughput

Figure 6.5: Threads performance for native, L1 guests and L2 guests with the sysbench benchmark (lower is better).

of less than 50 Mbit/s. The second group achieved reasonable performance; these are the nested setups with a throughput between 50 Mbit/s and 80 Mbit/s. The last group performs well, with a network throughput of more than 80 Mbit/s nearing native performance. This group shows little performance degradation compared to the L1 guests and native, taking into account that the network card is the bottleneck.

6.3.2 Disk I/O

We measured the disk I/O performance using two tests. The first test is the file I/O test of sysbench. In the preparation stage, it creates a specified number of files with a specified total size. During the test, each thread performs specified I/O operations on this set of files.
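The corresponding sysbench file I/O invocations from Appendix C.1 are shown below; the prepare and cleanup stages create and remove the set of test files around the actual run:

sysbench --num-threads=16 --test=fileio --file-num=256 --file-block-size=16K \
  --file-total-size=512M --file-test-mode=rndrw --file-io-mode=sync prepare
sysbench --num-threads=16 --test=fileio --file-num=256 --file-block-size=16K \
  --file-total-size=512M --file-test-mode=rndrw --file-io-mode=sync run
sysbench --num-threads=16 --test=fileio --file-num=256 --file-block-size=16K \
  --file-total-size=512M --file-test-mode=rndrw --file-io-mode=sync cleanup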

In figure 6.7, we can observe that the setups with virtualization perform much better than the native case. These results are unusual, since we would expect the virtualization layer to add a certain performance overhead. The test in the L1 VMware Workstation guest took 0.5 seconds while the same test on the native platform took 37.7 seconds. This suggests that some optimization provides a speed-up for the disk performance. The optimization is not a feature of the hard disk

Figure 6.6: Network performance for native, L1 guests and L2 guests (higher is better).

because it is not possible for a virtual machine to read from a disk faster than a native machine. The documentation of the iozone benchmark suggests that the processor cache and buffer cache are helping out for smaller files. It advises running the benchmark with a maximum file size that is larger than the available memory of the computer system. The results of these tests are shown in figure 6.8. In the figure we can clearly see that optimizations are obtained by the use of the caches. The theoretical speed of the hard disk is marked on the graphs. The I/O throughput exceeds this theoretical speed, indicating that the measured values are not the real I/O performance.

The real I/O performance can be found when using larger files. In figure 6.8(a), the measured performance for the L1 VMware Workstation guest is lower than in the native test for larger files. The performance in the L2 VMware Workstation guest is higher than in the L1 guest, but the test stopped after 2 GB files since the hard disk of the nested guest was not large enough. The iozone tests showed that the sysbench tests were inaccurate due to caching, and they suggest that real I/O performance can be measured by writing and reading large files. In order to obtain good performance results, these tests should be conducted with larger files than the ones we used.
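The iozone invocations used for these tests are listed in Appendix C.3; the maximum file size passed with -g was chosen per platform relative to the available memory:

native$  iozone -a -g 16G -i 0 -i 1
L1guest$ iozone -a -g 4G -i 0 -i 1
L2guest$ iozone -a -g 2G -i 0 -i 1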

Figure 6.7: File I/O performance for native, L1 guests and L2 guests with the sysbench benchmark (lower is better).

6.4 Conclusion

The performance overhead for nested virtualization is linear for CPU performance and exponential for memory performance, except for the memory performance of nested setups with paravirtualization as the L2 hypervisor technique. Paravirtualization minimizes the performance degradation. For CPU performance in nested virtualization, the setup that uses dynamic binary translation as the L1 hypervisor was the only outlier; the other setups performed adequately. For memory performance, the setups that use paravirtualization as the L2 hypervisor performed as well as the L1 guests. The results for the I/O performance were split into network and disk performance. The network performance could be divided into three groups: the first group had near native performance, the second group performed acceptably and the last group performed rather poorly. The results for disk performance were not accurate enough, since real disk I/O was difficult to measure due to caching. More testing is required for disk I/O performance to reach an accurate conclusion.

(a) write test: speed (in KBytes/sec) versus file size (in KBytes) for native, L1 and L2, with the SATA2 speed (3.0 Gbit/s) marked.

(b) read test: speed (in KBytes/sec) versus file size (in KBytes) for native, L1 and L2, with the SATA2 speed (3.0 Gbit/s) marked.
Figure 6.8: File I/O performance for native, L1 guests and L2 guests with the iozone benchmark.


CHAPTER 7

Conclusions

This chapter concludes the work of this thesis. In the first section, we elaborate on the results and conclusions of the previous chapters. The last section proposes some future work for nested virtualization.

7.1 Nested virtualization and performance results

Nested virtualization on the x86 architecture can be a useful tool for the development of test setups for research purposes, for creating test frameworks for hypervisors, and so on. In chapter 5, we investigated which techniques are the most suitable choice for the L1 and L2 hypervisors. The most suitable L1 hypervisor technique is a hardware based virtualization solution. When comparing the results of the setups that use software solutions for the L1 hypervisor with the results of the setups that use hardware support, we saw that the latter resulted in more working setups. The hardware support of the processor is preferably the second generation hardware support for virtualization. The use of EPT or NPT improves the performance of the memory management and releases the L1 hypervisor from maintaining shadow tables. These shadow tables form a problem for certain nested setups when used in the L1 hypervisor: hypervisors take shortcuts in order to improve the performance of the software based memory management, and this can lead to failures in nesting virtual machines. The second generation hardware support avoids these problems by providing a hardware supported MMU and appears to be the most advisable choice for the L1 hypervisor. Table 5.13 shows an overview of the results of all nested setups.

The best technique for the L2 hypervisor is paravirtualization. The only working setup in section 5.1 used paravirtualization for the L2 hypervisor and, except for one, all nested setups with paravirtualization as the L2 hypervisor worked for an L1


hypervisor based on hardware support. Dynamic binary translation also performed well on top of hardware support when the processor provided the hardware supported MMU. Without the use of EPT or NPT, dynamic binary translation as the L2 hypervisor results in two levels of shadow tables, which does not work very well. The performance results in chapter 6 support the decision that paravirtualization is the most suitable choice for the L2 hypervisor. The processor performance is comparable to other nested setups and the memory performance is comparable to a single layer of virtualization.

Nested hardware support is the great absentee in the whole nesting story. None of the hypervisors provided the hardware extensions for virtualization to their guests. This prevented the installation of an L2 hypervisor based on hardware support within these guests. KVM and Xen are working on nested hardware support. KVM has already released nested SVM support, but the implementation is still in its infancy. The development of nested VMX support takes more time because of the differences between AMD SVM and Intel VT-x. Xen is focusing on nested hardware support for VMX, and a proof of concept has been made that can successfully boot a nested virtual machine to an early stage.

For the performance results, we observed that the processor performance degradation was linear for nested virtualization with hardware support for the L1 hypervisor. The memory performance decreased greatly for nested virtualization. The only exception is the use of paravirtualization for the L2 hypervisor: in those experiments, no memory overhead was introduced in the nested setups. The other nested setups suffered from a significant memory overhead and had memory access times that were on average 128 times slower when compared to native. The I/O performance results were not accurate and need more work to gain an accurate view of the I/O performance degradation for nested virtualization.

7.2 Future work

One area of future work is testing the nested hardware support when KVM and Xen release their updated versions. The release of nested hardware support makes it possible to test other nested setups than those tested in this thesis and might provide new results. An extra task could be to check whether other virtualization software vendors have started to develop nested hardware support.

Throughout this thesis, we focused on software solutions and hardware support as techniques for a hypervisor. The first generation and second generation hardware support were compared to see what impact the hardware supported MMU had. Lately, hardware vendors have been working on directed I/O for virtualization; more specifically, Intel is working on Intel VT-d and AMD on its IOMMU. Another area of future work would be to investigate whether directed I/O can be useful for nested virtualization and what the performance impact of this new generation of hardware support is.


Bibliography

[1] S. Nanda and T. Chiueh, "A survey on virtualization technologies," tech. rep., Stony Brook University, 2005.

[2] VMware, "Virtualization History." http://www.vmware.com/virtualization/history.html. Last accessed on May 19, 2010.

[3] VMware, "VMware: Virtualization overview." http://www.vmware.com/pdf/virtualization.pdf. Last accessed on May 19, 2010.

[4] S. Adabala, V. Chadha, P. Chawla, R. Figueiredo, J. Fortes, I. Krsul, A. Matsunaga, M. Tsugawa, J. Zhang, M. Zhao, L. Zhu, and X. Zhu, "From virtualized resources to virtual computing grids: the In-VIGO system," Future Generation Computer Systems, vol. 21, no. 6, pp. 896–909, 2005.

[5] J. E. Smith and R. Nair, Virtual Machines: Versatile Platforms for Systems and Processes. The Morgan Kaufmann Series in Computer Architecture and Design, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2005.

[6] W. Stallings, Operating Systems: Internals and Design Principles. Prentice Hall, 5th ed., 2004.

[7] J. E. Smith and R. Nair, "The architecture of virtual machines," Computer, vol. 38, pp. 32–38, May 2005.

[8] G. J. Popek and R. P. Goldberg, "Formal requirements for virtualizable third generation architectures," Commun. ACM, vol. 17, no. 7, pp. 412–421, 1974.

[9] Intel Corporation, Intel® 64 and IA-32 Architectures Software Developer's Manual, Dec. 2009. http://www.intel.com/products/processor/manuals/. Last accessed on May 19, 2010.

[10] VMware, "Understanding Full Virtualization, Paravirtualization, and Hardware Assist." http://www.vmware.com/files/pdf/VMware_paravirtualization.pdf, Sept. 2007. Last accessed on May 19, 2010.


[11] K. Adams and O. Agesen, "A comparison of software and hardware techniques for x86 virtualization," in ASPLOS-XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, (New York, NY, USA), pp. 2–13, ACM, Oct. 2006.

[12] E. Witchel and M. Rosenblum, "Embra: fast and flexible machine simulation," in SIGMETRICS '96: Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, (New York, NY, USA), pp. 68–79, ACM, 1996.

[13] J. D. Gelas, "Hardware Virtualization: the Nuts and Bolts." http://it.anandtech.com/printarticle.aspx?i=3263, Mar. 2008. Last accessed on May 19, 2010.

[14] A. Vasudevan, R. Yerraballi, and A. Chawla, "A high performance Kernel-Less Operating System architecture," in ACSC '05: Proceedings of the Twenty-eighth Australasian conference on Computer Science, (Darlinghurst, Australia, Australia), pp. 287–296, Australian Computer Society, Inc., 2005.

[15] K. Onoue, Y. Oyama, and A. Yonezawa, "Control of system calls from outside of virtual machines," in SAC '08: Proceedings of the 2008 ACM symposium on Applied computing, (New York, NY, USA), pp. 2116–2121, ACM, 2008.

[16] J. Sugerman, G. Venkitachalam, and B.-H. Lim, "Virtualizing I/O devices on VMware Workstation's hosted virtual machine monitor," in Proceedings of the General Track: 2002 USENIX Annual Technical Conference, (Berkeley, CA, USA), pp. 1–14, USENIX Association, 2001.

[17] J. Fisher-Ogden, "Hardware Support for Efficient Virtualization."

[18] Y. Dong, J. Dai, Z. Huang, H. Guan, K. Tian, and Y. Jiang, "Towards high-quality I/O virtualization," in SYSTOR '09: Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference, (New York, NY, USA), pp. 1–8, ACM, 2009.

[19] Advanced Micro Devices, Inc., "AMD-V Nested Paging," tech. rep., Advanced Micro Devices, Inc., July 2008. http://developer.amd.com/assets/NPT-WP-1%201-final-TM.pdf. Last accessed on May 19, 2010.

[20] VMware, "Software and Hardware Techniques for x86 Virtualization." http://www.vmware.com/files/pdf/software_hardware_tech_x86_virt.pdf. Last accessed on May 19, 2010.

[21] A. Whitaker, M. Shaw, and S. D. Gribble, "Denali: Lightweight Virtual Machines for Distributed and Networked Applications," in Proceedings of the USENIX Annual Technical Conference, 2002.

[22] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," SIGOPS Oper. Syst. Rev., vol. 37, no. 5, pp. 164–177, 2003.


[23] J. R. Santos, Y. Turner, G. Janakiraman, and I. Pratt, "Bridging the gap between software and hardware techniques for I/O virtualization," in ATC'08: USENIX 2008 Annual Technical Conference, (Berkeley, CA, USA), pp. 29–42, USENIX Association, 2008.

[24] VMware, "A Performance Comparison of Hypervisors." http://www.vmware.com/pdf/hypervisor_performance.pdf, Feb. 2007. Last accessed on May 19, 2010.

[25] Intel Corporation, "Intel® Virtualization Technology for Directed I/O," tech. rep., Intel Corporation, Sept. 2008. http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf. Last accessed on May 19, 2010.

[26] Intel Corporation, "A superior hardware platform for server virtualization," tech. rep., Intel Corporation, 2009. http://download.intel.com/business/resources/briefs/xeon5500/xeon_5500_virtualization.pdf. Last accessed on May 19, 2010.

[27] Intel Corporation, "Intel® Virtualization Technology Specification for the IA-32 Intel® Architecture," tech. rep., Intel Corporation, Apr. 2005. http://dforeman.cs.binghamton.edu/~foreman/552pages/Readings/intel05virtualization.pdf. Last accessed on May 19, 2010.

[28] Advanced Micro Devices, Inc., AMD64 Virtualization Codenamed "Pacifica" Technology: Secure Virtual Machine Architecture Reference Manual, May 2005. http://www.mimuw.edu.pl/~vincent/lecture6/sources/amd-pacifica-specification.pdf. Last accessed on May 19, 2010.

[29] G. Neiger, A. Santoni, F. Leung, D. Rodgers, and R. Uhlig, "Intel® Virtualization Technology: Hardware Support for Efficient Processor Virtualization," Intel® Virtualization Technology, vol. 10, pp. 167–178, Aug. 2006.

[30] G. Gerzon, "Intel® Virtualization Technology: Processor Virtualization Extensions and Intel® Trusted Execution Technology." http://software.intel.com/file/1024. Last accessed on May 19, 2010.

[31] VirtualBox, "VirtualBox." http://www.virtualbox.org. Last accessed on May 19, 2010.

[32] VirtualBox, "VirtualBox architecture." http://www.virtualbox.org/wiki/VirtualBox_architecture. Last accessed on May 19, 2010.

[33] VMware, "VMware." http://www.vmware.com. Last accessed on May 19, 2010.

[34] Xen, "Xen." http://www.xen.org/. Last accessed on May 19, 2010.

[35] Xen, "Dom0 - Xen wiki." http://wiki.xensource.com/xenwiki/Dom0. Last accessed on May 19, 2010.


[36] KVM, "KVM." http://www.linux-kvm.org. Last accessed on May 19, 2010.

[37] A. Shah, "Kernel-based virtualization with KVM," Linux Magazine, vol. 86, pp. 37–39, 2008.

[38] F. Bellard, "QEMU, a fast and portable dynamic translator," in ATEC '05: Proceedings of the annual conference on USENIX Annual Technical Conference, (Berkeley, CA, USA), pp. 41–41, USENIX Association, 2005.

[39] I. Habib, "Virtualization with KVM," Linux J., vol. 2008, no. 166, p. 8, 2008.

[40] H. C. Lauer and D. Wyeth, "A recursive virtual machine architecture," in Proceedings of the workshop on virtual computer systems, (New York, NY, USA), pp. 113–116, ACM, 1973.

[41] G. Belpaire and N.-T. Hsu, "Formal properties of recursive virtual machine architectures," in SOSP '75: Proceedings of the fifth ACM symposium on Operating systems principles, (New York, NY, USA), pp. 89–96, ACM, 1975.

[42] M. Baker and R. Buyya, "Cluster computing at a glance." Cluster Computing, Chapter 1, 1999.

[43] M. Baker, "Cluster computing white paper," Tech. Rep. Version 2.0, IEEE Computer Society Task Force on Cluster Computing (TFCC), December 2000.

[44] M. L. Bote-Lorenzo, Y. A. Dimitriadis, and E. Gomez-Sanchez, "Grid characteristics and uses: a grid definition," in Proceedings of the First European Across Grids Conference (AG'03), vol. 2970 of Lecture Notes in Computer Science, (Heidelberg), pp. 291–298, Springer-Verlag, 2004.

[45] I. Foster, "What is the grid? a three point checklist," 2002.

[46] I. Foster, Y. Zhao, I. Raicu, and S. Lu, "Cloud computing and grid computing 360-degree compared," in Proceedings of the Grid Computing Environments Workshop, 2008. GCE '08, pp. 1–10, Nov. 2008.

[47] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, "Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility," Future Gener. Comput. Syst., vol. 25, no. 6, pp. 599–616, 2009.

[48] A. Kivity, "Avi Kivity's blog." http://avikivity.blogspot.com/. Last accessed on May 19, 2010.

[49] A. Graf and J. Roedel, "Add support for nested SVM (kernel)." http://thread.gmane.org/gmane.comp.emulators.kvm.devel/21119. Last accessed on May 19, 2010.

[50] Xen, "Xen Summit Asia at Intel 2009." http://www.xen.org/xensummit/xensummit_fall_2009.html. Last accessed on May 19, 2010.


Appendices


APPENDIX A

Virtualization software

A.1 VirtualBox

VirtualBox is a hypervisor that performs full virtualization. It started as proprietary software but currently comes under a Personal Use and Evaluation License (PUEL); the software is free of charge for personal and educational use. VirtualBox was initially created by Innotek. This company was later purchased by Sun Microsystems, which in turn was recently purchased by Oracle Corporation.

This section contains information that is almost completely extracted from the VirtualBox website; see [32] for more information. It is presented here to give extra information about the internals of VirtualBox.

The host operating system runs each VirtualBox virtual machine as an application. VirtualBox takes over control of a large part of the computer, executing a complete OS with its own guest processes, drivers, and devices inside this virtual machine process. The host OS does not notice much of this, only that an extra process is started; a virtual machine is just another process in the host operating system. This implementation is thus an example of a hosted hypervisor. Initially, VirtualBox used dynamic binary translation as the implementation approach for its hypervisor. With the release of hardware support for virtualization, however, it also provides Intel VT-x and AMD SVM support.

Upon starting VirtualBox, one extra process gets started: the VirtualBox "service" process VBoxSVC. This service runs in the background to keep track of all the processes involved, i.e. it keeps track of which virtual machines are running and what state they are in. It is automatically started by the first GUI process.

The guts of the VirtualBox implementation are hidden in a shared library,


VBoxVMM.dll, or VBoxVMM.so on Linux. This library contains all the complicated and messy things needed to virtualize the x86 architecture. It can be considered a static "backend", or black box. Around this backend, many frontends can be written without having to mess with the gory details of x86 virtualization. VirtualBox already comes with several frontends: the Qt GUI, a command-line utility VBoxManage, a "plain" GUI based on SDL, and remote interfaces.

The host operating system does not need much tweaking to support virtualization. A ring 0 driver is loaded in the host operating system for VirtualBox to work. This driver does not interfere with the scheduling or process management of the host operating system. The entire guest OS, including its own hundreds of processes, is only scheduled when the host OS gives the VM process a timeslice. The ring 0 driver only performs a few specific tasks: allocating physical memory for the VM, saving and restoring CPU registers and descriptor tables when a host interrupt occurs while a guest's ring 3 code is executing (e.g. when the host OS wants to reschedule), switching from host ring 3 to guest context, and enabling or disabling VT-x etc. support.

When running a virtual machine, the computer can be in one of several states, from the processor's point of view:

1. The CPU can be executing host ring 3 code (e.g. from other host processes), or host ring 0 code, just as it would be if VirtualBox was not running.

2. The CPU can be emulating guest code (within the ring 3 host VM process). Basically, VirtualBox tries to run as much guest code natively as possible, but it can (slowly) emulate guest code as a fallback when it is not sure what the guest system is doing, or when the performance penalty of emulation is not too high. The VirtualBox emulator is based on QEMU and typically steps in when:

• guest code disables interrupts and VirtualBox cannot figure out when they will be switched back on (in these situations, VirtualBox actually analyzes the guest code using its own disassembler);

• for execution of certain single instructions; this typically happens when a nasty guest instruction such as LIDT has caused a trap and needs to be emulated;

• for any real mode code (e.g. BIOS code, a DOS guest, or any operating system startup).

3. The CPU can be running guest ring 3 code natively (within the ring 3 host VM process). With VirtualBox, we call this "raw ring 3". This is, of course, the most efficient way to run the guest, and hopefully we don't leave this mode too often. The more we do, the slower the VM is compared to a native OS, because all context switches are very expensive.

4. The CPU can be running guest ring 0 code natively. Here is where things get tricky: the guest only thinks it's running ring 0 code, but VirtualBox has


fooled the guest OS to instead enter ring 1 (which is normally unused with x86 operating systems).

The guest operating system is thus manipulated to actually execute its ring 0 code in ring 1. This causes a lot of additional instruction faults, as ring 1 is not allowed to execute any privileged instructions. With each of these faults, the hypervisor must step in and emulate the code to achieve the desired behavior. While this normally works perfectly well, the resulting performance would be very poor since CPU faults tend to be very expensive and there will be thousands and thousands of them per second. To make things worse, running ring 0 code in ring 1 causes some nasty occasional compatibility problems. Because of design flaws in the x86 architecture that were never addressed, some system instructions that should cause faults when called in ring 1 unfortunately do not. Instead, they just behave differently. It is therefore imperative that these instructions be found and replaced.

To address these two issues, VirtualBox has come up with a set of unique techniques called the "Patch Manager" (PATM) and the "Code Scanning and Analysis Manager" (CSAM). Before executing ring 0 code, the code is scanned recursively to discover problematic instructions. In-place patching is then performed, i.e. replacing the instruction with a jump to hypervisor memory where an integrated code generator has placed a more suitable implementation. In reality, this is a very complex task as there are lots of odd situations to be discovered and handled correctly. So, with its current complexity, one could argue that PATM is an advanced in-situ recompiler.

In addition, every time a fault occurs, the fault's cause is analyzed to determine whether it is possible to patch the offending code to prevent it from causing more expensive faults in the future. This turns out to work very well, and it can reduce the faults caused by this form of virtualization to a rate that performs much better than a typical recompiler, or even VT-x technology, for that matter.


APPENDIX B

Details of the nested virtualization in practice

This chapter gives detailed information about the error messages and warnings that occurred during the tests in chapter 5. For each setup, information about the operating system and the hypervisor version is given. The setups are grouped into sections so that each section contains the setups that have the same bottom layer hypervisor technique. Note that the setups with a nested hypervisor based on hardware support for virtualization on x86 architectures are left out because the nested hypervisor could not be installed.

B.1 Dynamic binary translation

B.1.1 VirtualBox

VirtualBox within VirtualBox
Host operating system: Ubuntu 9.10 64bit
L1 hypervisor: VirtualBox 3.1.6 r59338
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VirtualBox 3.1.6 r59338
L2 guest: Ubuntu 9.04 32bit

The L1 guest booted and ran correctly using dynamic binary translation. The image of the L2 guest was copied to the L1 guest. The L2 guest tried to start but did not show any sign of activity. The guest showed the following output:

Boot from (hd0,0) ext3 <HDD-ID>
Starting up ...

This screen remained for several hours without aborting or continuing.


VMware Workstation within VirtualBox
Host operating system: Ubuntu 9.10 64bit
L1 hypervisor: VirtualBox 3.1.6 r59338
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VMware Workstation 6.5.3 build-185404
L2 guest: Ubuntu 9.04 32bit

The L1 guest booted and ran correctly using dynamic binary translation. The image of the L2 guest was copied to the L1 guest. The L2 guest attempted to start, but the L1 guest crashed and showed the following error:

A critical error has occurred while running the virtual machine and the machine execution has been stopped.

For help, please see the Community section on http://www.virtualbox.org or your support contract. Please provide the contents of the log file VBox.log and the image file VBox.png, which you can find in the /home/olivier/.VirtualBox/Machines/vbox-vmware/Logs directory, as well as a description of what you were doing when this error happened. Note that you can also access the above files by selecting Show Log from the Machine menu of the main VirtualBox window.

Press OK if you want to power off the machine or press Ignore if you want to leave it as is for debugging. Please note that debugging requires special knowledge and tools, so it is recommended to press OK now.

The log of the L1 guest showed:

PATM: patmR3RefreshPatch: succeeded to refresh patch at c0152610
PATM: patmR3RefreshPatch: succeeded to refresh patch at c013f250
PATM: patmR3RefreshPatch: succeeded to refresh patch at c0502650
PATM: Disabling IDT ef patch handler c01052f0
PATM: patmR3RefreshPatch: succeeded to refresh patch at c01052f0
PIIX3 ATA: Ctl#0: RESET, DevSel=0 AIOIf=0 CmdIf0=0x20 (-1 usec ago) CmdIf1=0x00 (-1 usec ago)
PIIX3 ATA: Ctl#0: finished processing RESET
PATM: patmR3RefreshPatch: succeeded to refresh patch at c015b7a0
PIIX3 ATA: Ctl#1: RESET, DevSel=0 AIOIf=0 CmdIf0=0x00 (-1 usec ago) CmdIf1=0x00 (-1 usec ago)
PIIX3 ATA: Ctl#1: finished processing RESET
PATM: Disable block at c0745963 - write c07459b5-c07459b9
PATM: Disable block at c0745bab - write c0745c0f-c0745c13
PATM: Disable block at c0746d22 - write c0746d8f-c0746d93
PATM: Disable block at c0763a90 - write c0763aed-c0763af1
PATM: patmR3RefreshPatch: succeeded to refresh patch at c0184590
PCNet#0: Init: ss32=1 GCRDRA=0x36596000[32] GCTDRA=0x36597000[16]
fatal error in recompiler cpu: triple fault

Xen within VirtualBox
Host operating system: Ubuntu 9.10 64bit
L1 hypervisor: VirtualBox 3.1.6 r59338
L2 hypervisor: xen-3.0-x86_32p
Domain 0 (L2): openSUSE 11.3 build 0475
L2 guest: openSUSE 11.3 build 0475

The L1 guest booted and crashed almost immediately. The L1 guest showed the following error:


A critical error has occurred while running the virtual machine and the machine execution has been stopped.

For help, please see the Community section on http://www.virtualbox.org or your support contract. Please provide the contents of the log file VBox.log and the image file VBox.png, which you can find in the /home/olivier/.VirtualBox/Machines/vbox-vmware/Logs directory, as well as a description of what you were doing when this error happened. Note that you can also access the above files by selecting Show Log from the Machine menu of the main VirtualBox window.

Press OK if you want to power off the machine or press Ignore if you want to leave it as is for debugging. Please note that debugging requires special knowledge and tools, so it is recommended to press OK now.

The log output was similar to the "VirtualBox within VMware Workstation" output (see subsection B.1.2).

B.1.2 VMware Workstation

VirtualBox within VMware Workstation
Host operating system: Ubuntu 9.10 64bit
L1 hypervisor: VMware Workstation 7.0.1 build-227600
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VirtualBox 3.1.6 r59338
L2 guest: Ubuntu 9.04 32bit

The L1 guest booted and ran correctly using dynamic binary translation. The image of the L2 guest was copied to the L1 guest. When starting the L2 guest, the bootloader showed

Boot from (hd0,0) ext3 <HDD-ID>
Starting up ...

and afterwards VirtualBox aborted the start and showed the following message:

PATM: patmR3RefreshPatch: succeeded to refresh patch at c0500670
PATM: patmR3RefreshPatch: succeeded to refresh patch at c0500560
PATM: Failed to refresh dirty patch at c013f370. Disabling it.
PATM: patmR3RefreshPatch: succeeded to refresh patch at c0154bc0
PATM: patmR3RefreshPatch: succeeded to refresh patch at c0109d00
PATM: patmR3RefreshPatch: succeeded to refresh patch at c0122d50
PATM: patmR3RefreshPatch: succeeded to refresh patch at c05006d0
PATM: patmR3RefreshPatch: succeeded to refresh patch at c0109c80
PATM: patmR3RefreshPatch: succeeded to refresh patch at c0180ac0
PATM: patmR3RefreshPatch: succeeded to refresh patch at c015a040
PATM: patmR3RefreshPatch: succeeded to refresh patch at c013f980
PATM: patmR3RefreshPatch: succeeded to refresh patch at c0154880
PATM: patmR3RefreshPatch: succeeded to refresh patch at c0154a00
PATM: patmR3RefreshPatch: succeeded to refresh patch at c01323c0
PATM: patmR3RefreshPatch: succeeded to refresh patch at c0151600
PATM: patmR3RefreshPatch: succeeded to refresh patch at c017ed00
PATM: Failed to refresh dirty patch at c013f370. Disabling it.
PATM: patmR3RefreshPatch: succeeded to refresh patch at c01051d0
PIT: mode=0 count=0x10000 (65536) - 18.20 Hz (ch=0)

!!Assertion Failed!!
Expression: pOrgInstrGC
Location: /home/vbox/vbox-3.1.6/src/VBox/VMM/PATM/VMMAll/PATMAll.cpp(159) void PATMRawLeave(VM*, CPUMCTXCORE*, int)


VMware Workstation within VMware Workstation
Host operating system: Ubuntu 9.10 64bit
L1 hypervisor: VMware Workstation 7.0.1 build-227600
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VMware Workstation 6.5.3 build-185404
L2 guest: Ubuntu 9.04 32bit

The L1 guest booted and ran correctly using dynamic binary translation. The image of the L2 guest was copied to the L1 guest. The L2 guest tried to start but did not show any activity. The screen stayed black and nothing happened, even after several hours.

Xen within VMware Workstation
Host operating system: Ubuntu 9.10 64bit
L1 hypervisor: VMware Workstation 7.0.1 build-227600
L2 hypervisor: xen-3.0-x86_32p
Domain 0 (L2): openSUSE 11.3 build 0475
L2 guest: openSUSE 11.3 build 0475

The L1 guest booted and ran the Xen hypervisor correctly using dynamic binary translation. The image of the L2 guest was copied to the L1 guest. The paravirtualized L2 guest booted rather slowly, but a user could log in and use the nested guest.

B.2 Paravirtualization

None of the setups with paravirtualization as the bottom layer work. The configurations of the setups are given but, as explained in section 5.1.2, the problem is that nested hypervisors need modifications.

VirtualBox within Xen
L1 hypervisor: xen-3.0-x86_32p
Domain 0 (L1): openSUSE 11.3 build 0475
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VirtualBox 3.1.6 r59338

VMware Workstation within Xen
L1 hypervisor: xen-3.0-x86_32p
Domain 0 (L1): openSUSE 11.3 build 0475
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VMware Workstation 6.5.3 build-185404

Xen within Xen
L1 hypervisor: xen-3.0-x86_32p
Domain 0 (L1): openSUSE 11.3 build 0475
L2 hypervisor: xen-3.0-x86_32p


B.3 First generation hardware support

In this section, the results are given for the setups with an L1 hypervisor based on hardware support for x86 virtualization. All tests were conducted on a processor that only has first generation hardware support, namely the Intel® Core™2 Quad Q9550. The following subsections group the setups that use the same nested hypervisor technique.

B.3.1 Dynamic binary translation

VirtualBox within VirtualBox
Host operating system: Ubuntu 9.10 64bit or Ubuntu 9.04 32bit
L1 hypervisor: VirtualBox 3.1.6 r59338
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VirtualBox 3.1.6 r59338
L2 guest: Ubuntu 9.04 32bit

The L2 guest hung in this setup. It did not crash, did not show information in the log for several hours, and was probably just very slow.

VMware Workstation within VirtualBox
Host operating system: Ubuntu 9.10 64bit or Ubuntu 9.04 32bit
L1 hypervisor: VirtualBox 3.1.6 r59338
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VMware Workstation 6.5.3 build-185404
L2 guest: Ubuntu 9.04 32bit

This setup was rather unstable. Only one configuration allowed booting the nested guest. This working configuration used the graphical user interface for the L1 and L2 guest, while other setups used a text based environment. The setup was tested on Ubuntu 9.10 64bit and Ubuntu 9.04 32bit operating systems, with a graphical user interface and with a text based environment. The test on the Ubuntu 9.04 32bit operating system with the graphical user interface was the only one that worked.

VirtualBox within VMware Workstation
Host operating system: Ubuntu 9.10 64bit
L1 hypervisor: VMware Workstation 6.5.3 build-185404
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VirtualBox 3.1.6 r59338
L2 guest: Ubuntu 9.04 32bit

Both L1 and L2 guest were able to boot and run correctly. However, in order to get the L2 guest started, one must append the line

monitor_control.restrict_backdoor = "TRUE"


to the configuration file (.vmx) of the L1 guest. This makes it possible to run a hypervisor within the L1 guest. Upon starting the L2 guest, the L1 guest displayed the following warning:

The virtual machine's operating system has attempted to enable promiscuous mode on adapter Ethernet0. This is not allowed for security reasons.

Please go to the Web page http://vmware.com/info?id=161 for help enabling the promiscuous mode in the virtual machine.

One can get around this message by starting VirtualBox as root instead of as a normal user.

VMware Workstation within VMware Workstation
Host operating system: Ubuntu 9.10 64bit
L1 hypervisor: VMware Workstation 6.5.3 build-185404
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VMware Workstation 6.5.3 build-185404
L2 guest: Ubuntu 9.04 32bit

Both L1 and L2 guest were able to boot and run correctly. However, in order to get the L2 guest started, one must append the line

monitor_control.restrict_backdoor = "TRUE"

to the configuration file (.vmx) of the L1 guest. This allowed running a hypervisor within the L1 guest. Upon starting the L2 guest, the L1 guest displayed the following warning:

The virtual machine's operating system has attempted to enable promiscuous mode on adapter Ethernet0. This is not allowed for security reasons.

Please go to the Web page http://vmware.com/info?id=161 for help enabling the promiscuous mode in the virtual machine.

One can get around this message by starting the L2 VMware Workstation as root instead of as a normal user.

VirtualBox within Xen
L1 hypervisor: xen-3.0-x86_32p
Domain 0 (L1): openSUSE 11.3 build 0475
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VirtualBox 3.1.6 r59338
L2 guest: Ubuntu 9.04 32bit

The L2 guest hung while booting inside the Xen guest. It was probably just very slow, since it stayed unresponsive for several hours.

VMware Workstation within Xen
L1 hypervisor: xen-3.0-x86_32p
Domain 0 (L1): openSUSE 11.3 build 0475
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VMware Workstation 7.0.1 build-227600
L2 guest: Ubuntu 9.04 32bit


The L2 guest did not start because VMware Workstation checks whether there is an underlying hypervisor. VMware Workstation noticed that there was a Xen hypervisor running and displayed the following message:

You are running VMware Workstation via an incompatible hypervisor. You may not power on a virtual machine until this hypervisor is disabled.

VirtualBox within KVM
Host operating system: Ubuntu 9.10 64bit
L1 hypervisor: kvm-kmod 2.6.32-9
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VirtualBox 3.1.6 r59338
L2 guest: Ubuntu 9.04 32bit

The L1 guest booted and ran correctly. Upon starting the L2 guest, the following error was shown:

Boot from (hd0,0) ext3 <HDD-ID>
Starting up ...
...
[ 3.982884] Kernel panic - not syncing: Attempted to kill init!

VMware Workstation within KVM
Host operating system: Ubuntu 9.10 64bit
L1 hypervisor: kvm-kmod 2.6.32-9
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VMware Workstation 6.5.3 build-185404
L2 guest: Ubuntu 9.04 32bit

The setup worked for newer versions of KVM. In older versions and in the developer version (kvm-88), the L2 guest hung during start-up. With the newer version, the L2 guest booted and ran successfully. Note that newer versions of VMware Workstation check whether there is an underlying hypervisor and, in that case, refuse to boot a virtual machine. In the newest version, VMware Workstation 7.0.1 build-227600, this setup no longer worked due to the check for an underlying hypervisor.

B.3.2 Paravirtualization

All four setups could successfully nest a paravirtualized guest inside the L1 guest. However, the setup where Xen is nested inside VirtualBox was not very stable.

Xen within VirtualBox
Host operating system: Ubuntu 9.10 64bit or Ubuntu 9.04 32bit
L1 hypervisor: VirtualBox 3.1.6 r59338
L2 hypervisor: xen-3.0-x86_32p
Domain 0 (L2): openSUSE 11.3 build 0475
L2 guest: openSUSE 11.3 build 0475


Sometimes several segmentation faults occurred during the start-up of domain 0. Domain 0 was able to boot and run successfully, but the creation of another paravirtualized guest was sometimes impossible. Xen reported that the guest was created; however, it did not show up in the list of virtual machines, indicating that the guest crashed immediately. The setup worked most of the time on the Ubuntu 9.04 32bit host operating system. On the Ubuntu 9.10 64bit operating system, there were always segmentation faults.

Xen within VMware Workstation
Host operating system: Ubuntu 9.10 64bit
L1 hypervisor: VMware Workstation 7.0.1 build-227600
L2 hypervisor: xen-3.0-x86_32p
Domain 0 (L2): openSUSE 11.3 build 0475
L2 guest: openSUSE 11.3 build 0475

Both L1 and L2 guest were able to boot and run correctly.

Xen within Xen
L1 hypervisor: xen-3.0-x86_32p
Domain 0 (L1): openSUSE 11.3 build 0475
L2 hypervisor: xen-3.0-x86_32p
Domain 0 (L2): openSUSE 11.3 build 0475
L2 guest: openSUSE 11.3 build 0475

Both L1 and L2 guest were able to boot and run correctly.

Xen within KVM
Host operating system: Ubuntu 9.10 64bit or Ubuntu 9.04 32bit
L1 hypervisor: kvm-kmod 2.6.32-9
L2 hypervisor: xen-3.0-x86_32p
Domain 0 (L2): openSUSE 11.3 build 0475
L2 guest: openSUSE 11.3 build 0475

Both L1 and L2 guest were able to boot and run correctly.

B.4 Second generation hardware support

This section summarizes the results of the setups with an L1 hypervisor based on hardware support for x86 virtualization. The tests were conducted on a newer processor with second generation hardware support, namely the Intel® Core™ i7-860. Only the setups where the outcome differed from section B.3 are given; these are the setups that worked using the second generation hardware support and did not work using the first generation. All the other setups had the same output.


B.4.1 Dynamic binary translation

VirtualBox within VirtualBox
Host operating system: Ubuntu 9.10 64bit or Ubuntu 9.04 32bit
L1 hypervisor: VirtualBox 3.1.6 r59338
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VirtualBox 3.1.6 r59338
L2 guest: Ubuntu 9.04 32bit

Both L1 and L2 guest were able to boot and run correctly. In the test result of section B.3, the L2 guest hung.

VirtualBox within Xen
L1 hypervisor: xen-3.0-x86_32p
Domain 0 (L1): openSUSE 11.3 build 0475
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VirtualBox 3.1.6 r59338
L2 guest: Ubuntu 9.04 32bit

Both L1 and L2 guest were able to boot and run correctly. With only the use of first generation hardware support (section B.3), the L2 guest hung.

VirtualBox within KVM
Host operating system: Ubuntu 9.10 64bit
L1 hypervisor: kvm-kmod 2.6.32-9
L1 guest: Ubuntu 9.04 32bit
L2 hypervisor: VirtualBox 3.1.6 r59338
L2 guest: Ubuntu 9.04 32bit

Both L1 and L2 guest were able to boot and run correctly. The L2 guest showed a kernel panic message when only using first generation hardware support, as shown in section B.3.

B.4.2 Paravirtualization

Xen within VirtualBox
Host operating system: Ubuntu 9.10 64bit or Ubuntu 9.04 32bit
L1 hypervisor: VirtualBox 3.1.6 r59338
L2 hypervisor: xen-3.0-x86_32p
Domain 0 (L2): openSUSE 11.3 build 0475
L2 guest: openSUSE 11.3 build 0475

Both L1 and L2 guest were able to boot and run correctly. In the test result of section B.3, domain 0 displayed some segmentation faults.

B.5 KVM’s nested SVM support

Host operating system: Ubuntu 9.04 server 64bit


L1 hypervisor: kvm-kmod 2.6.33
L1 guest: Ubuntu 9.04
L2 hypervisor: kvm-kmod 2.6.33
L2 guest: Ubuntu 9.04

After installing the L1 hypervisor, it must be loaded with the argument "nested=1" (see the sketch at the end of this section). The L1 guest booted and ran perfectly. The installation of the L2 hypervisor within the L1 guest was successful; no special actions were required for installing it. When booting the L2 hypervisor, the L1 guest showed the following messages:

[ 16.712047] handle_exit: unexpected exit_ini_info 0x80000008 exit_code 0x60
[ 31.432032] handle_exit: unexpected exit_ini_info 0x80000008 exit_code 0x60
[ 34.468058] handle_exit: unexpected exit_ini_info 0x80000008 exit_code 0x60

Patches fix these messages but were not yet released because they need more testing.1

1 http://www.mail-archive.com/[email protected]/msg31096.html
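For reference, loading the L1 KVM module with nested SVM support enabled amounts to something like the following sketch; it assumes that the AMD-specific module built by kvm-kmod is named kvm-amd, which may differ per installation:

# load the KVM AMD module with nested SVM enabled (assumed module name)
sudo modprobe kvm-amd nested=1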


APPENDIX C

Details of the performance tests

This chapter gives some detailed information about the performance tests that were executed in chapter 6. The benchmarks used for the tests are sysbench, iperf and iozone. Each section lists the tests that were executed for the corresponding benchmark.

C.1 sysbench

#!/bin/bash
#
# The sysbench tests
#
# @author Olivier Berghmans
#

if [ $# -ne 1 ]; then
    echo "Usage: $0 <prefix>"
    echo " with <prefix> the prefix for the output files"
    exit
fi

PREFIX=$1
OUTPUT="###"

date
echo ${OUTPUT} " sysbench CPU Test"
sysbench --num-threads=10 --max-requests=10000 --test=cpu --cpu-max-prime=150000 run > ${PREFIX}_cpu.txt
date
echo ${OUTPUT} " sysbench Memory Test Write"
sysbench --num-threads=10 --test=memory --memory-block-size=256 --memory-total-size=2G --memory-scope=local --memory-hugetlb=off --memory-oper=write --memory-access-mode=seq run > ${PREFIX}_mem_write_seq.txt
sysbench --num-threads=10 --test=memory --memory-block-size=256 --memory-total-size=2G --memory-scope=local --memory-hugetlb=off --memory-oper=write --memory-access-mode=rnd run > ${PREFIX}_mem_write_rnd.txt
date
echo ${OUTPUT} " sysbench Memory Test Read"
sysbench --num-threads=10 --test=memory --memory-block-size=256 --memory-total-size=2G --memory-scope=local --memory-hugetlb=off --memory-oper=read --memory-access-mode=seq run > ${PREFIX}_mem_read_seq.txt
sysbench --num-threads=10 --test=memory --memory-block-size=256 --memory-total-size=2G --memory-scope=local --memory-hugetlb=off --memory-oper=read --memory-access-mode=rnd run > ${PREFIX}_mem_read_rnd.txt
date
echo ${OUTPUT} " sysbench Thread Test"
sysbench --num-threads=1000 --max-requests=10000 --test=threads --thread-yields=10000 --thread-locks=4 run > ${PREFIX}_threads.txt
date
echo ${OUTPUT} " sysbench Mutex Test"
sysbench --num-threads=5000 --max-requests=10000 --test=mutex --mutex-num=4096 --mutex-locks=500000 --mutex-loops=10000 run > ${PREFIX}_mutex.txt
date
echo ${OUTPUT} " sysbench File io Test"
sysbench --num-threads=16 --test=fileio --file-num=256 --file-block-size=16K --file-total-size=512M --file-test-mode=rndrw --file-io-mode=sync prepare
sysbench --num-threads=16 --test=fileio --file-num=256 --file-block-size=16K --file-total-size=512M --file-test-mode=rndrw --file-io-mode=sync run > ${PREFIX}_fileio.txt
sysbench --num-threads=16 --test=fileio --file-num=256 --file-block-size=16K --file-total-size=512M --file-test-mode=rndrw --file-io-mode=sync cleanup
date
echo ${OUTPUT} " sysbench MySQL Test"
sysbench --num-threads=4 --max-requests=10000 --test=oltp --oltp-table-size=1000000 --mysql-table-engine=innodb --mysql-user=root --mysql-password=root --oltp-test-mode=complex prepare
sysbench --num-threads=4 --max-requests=10000 --test=oltp --oltp-table-size=1000000 --mysql-table-engine=innodb --mysql-user=root --mysql-password=root --oltp-test-mode=complex run > ${PREFIX}_oltp.txt
sysbench --num-threads=4 --max-requests=10000 --test=oltp --oltp-table-size=1000000 --mysql-table-engine=innodb --mysql-user=root --mysql-password=root --oltp-test-mode=complex cleanup
date
echo ${OUTPUT} " packing Results"
tar czf ${PREFIX}.tgz ${PREFIX}*.txt
rm -f ${PREFIX}*.txt

C.2 iperf

This benchmark consists of a server and a client. The server runs on a separate computer in the network with the command:

iperf -s

The test machines connect to the server by running the command:

iperf -c <hostname>

C.3 iozone

The iozone benchmark tests the performance of writing and reading a file. The commands used for running the benchmark natively, in an L1 guest and in a nested guest are the following:

native$ iozone -a -g 16G -i 0 -i 1
L1guest$ iozone -a -g 4G -i 0 -i 1
L2guest$ iozone -a -g 2G -i 0 -i 1

The "-g" option specifies the maximum file size used in the tests. The test on the native platform uses 16 GB since the physical memory of the computer system was 8 GB. The physical memory of the L1 guest was 2 GB and the physical memory of the L2 guest was 512 MB.