mirage: extreme specialisation of virtual appliances
DESCRIPTION
Infrastructure-as-a-Service compute clouds provide a flexible hardware platform on which customers host applications as a set of appliances, e.g., web servers or databases. Each appliance is a VM image containing an OS kernel and userspace processes, within which applications access resources via traditional APIs such as POSIX. However, the flexibility provided by the hypervisor comes at a cost: it adds another layer to an already complex software stack, which impacts runtime performance and increases the size of the trusted computing base. Given that modern software is generally written in high-level languages that abstract the underlying OS, we revisit how these appliances are constructed with our Mirage operating system. Mirage supports the progressive specialisation of source code, and gradually replaces traditional OS components with customisable libraries, ultimately resulting in "unikernel" VMs: sealed, fixed-purpose VMs that run directly on the hypervisor. Developers no longer need to become sysadmins, expert in the configuration of all manner of system components, to use cloud resources. At the same time, they can develop their code using their usual tools, only making the final push to the cloud once they are satisfied their code works. As they explicitly link in components that would normally be provided by the host OS, the resulting unikernels are also highly compact: facilities that are not used are simply not included in the resulting unikernel binary. This talk will describe the architecture of Mirage, and show a quick demonstration of how to build a web server that runs as a unikernel on a standard Xen installation.
TRANSCRIPT
@avsm
openmirage.org
Mirage: OCaml Appliances for Xen Clouds
Anil Madhavapeddy, University of Cambridge
with a merry crew:
Haris Rotsos, Balraj Singh, Steven Smith, Jon Crowcroft, Steve Hand (University of Cambridge)
Richard Mortier (University of Nottingham)
Thomas Gazagnaire (OCamlPro)
Dave Scott (Citrix Systems R&D)
Monday, 27 August 12
Modern Stacks are Too Large
• Millions of lines of code & configuration
• Why build for clouds as for desktops?
• We can simplify!
[Figure: a conventional stack with the layers Hardware, Hypervisor, OS Kernel, Processes, Threads, Language and Application, next to two reduced stacks (Hardware, Xen, Language, Application and Hardware, POSIX, Language, Application), labelled Development and Deployment.]
How Large is Large?
[Figure: bar chart of lines of code (×10^3, axis 0 to 600), split into OCaml, C/C++ and ASM, for Linux 3.2.2, glibc 2.15, Bind 9.9.0, httpd 2.4.2, OpenSSH 6.0p1, Open vSwitch 1.4.0, NOX-zaku and Mirage.]
Why Do We Care?
• Critical memory bugs still occur regularly in mature software
• CVE-2012-1182 – Samba
  – RPC code generator overflow
  – Variable containing buffer length checked independently of variable used to allocate memory for buffer
  – Leads to root exploit
• CVE-2012-2110 – OpenSSL
  – Integer conversion bug mixed with realloc wrappers
  – Unsigned cast to signed, but realloc'd buffer not clamped
  – Leads to heap corruption
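Both CVEs involve a length that is validated separately from the buffer it describes. A minimal sketch of how a memory-safe language changes the outcome, using only the OCaml standard library. The function name copy_payload and the sizes are hypothetical and are not taken from Samba or OpenSSL. An out-of-range copy raises an exception instead of corrupting the heap.

```ocaml
(* Hypothetical helper: copy [len] bytes of an untrusted payload into a
   freshly allocated buffer of [alloc] bytes. In C, a mismatch between
   [len] and [alloc] can overflow the heap; in OCaml, Bytes.blit checks
   the range against the real buffer sizes and raises Invalid_argument. *)
let copy_payload ~alloc ~len (payload : bytes) : (bytes, string) result =
  let buf = Bytes.create alloc in
  match Bytes.blit payload 0 buf 0 len with
  | () -> Ok buf
  | exception Invalid_argument _ -> Error "length exceeds buffer"

let () =
  let payload = Bytes.of_string "0123456789abcdef" in
  (* Consistent length and allocation: the copy succeeds. *)
  (match copy_payload ~alloc:16 ~len:16 payload with
   | Ok buf -> print_endline (Bytes.to_string buf)
   | Error e -> print_endline e);
  (* Attacker-controlled length larger than the allocation: rejected. *)
  (match copy_payload ~alloc:8 ~len:16 payload with
   | Ok _ -> print_endline "copied"
   | Error e -> print_endline e)
```

The check happens inside the library call, so the length and the allocation cannot drift apart unnoticed.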
The Cloud Threat Model
• Attack from without, not within: trust hypervisor, but not I/O traffic or other VMs.
• VMs are no longer multi-user, but primarily single-purpose deployments
• …and they are in a multi-tenant datacenter
• …and they are always network connected
Sealed Appliances
[Figure: the Mirage compiler takes application source code, configuration files and the hardware architecture as inputs and applies whole-system optimisation to produce a specialised unikernel. The conventional stack (Hardware, Hypervisor, OS Kernel, User Processes, Language Runtime, Parallel Threads, Application Binary, Configuration Files) is contrasted with the Mirage stack (Hardware, Hypervisor, Mirage Runtime, Application Code).]
Key Features of Mirage
• Static typing
  – Pragmatic functional language (OCaml)
  – Large set of libraries provided, some used in XCP already
• Cooperative concurrency
  – Wrapped up in Lwt syntax extensions
  – Threads encapsulated and hidden within typed modules
• Library-based operating system
  – Fully re-entrant functional libraries
• No dynamic loading
  – Configuration evaluated at compile-time, and sealed
  – Recompile and redeploy to reconfigure
  (* Load the zone data, then serve DNS on port 53 of each device. *)
  let main () =
    lwt zones = read key "zones" "zone.db" in
    Net.Manager.bind (fun mgr dev ->
      let src = `any_addr, 53 in
      Dns.Server.listen dev src zones)
Progressive Specialization
[Figure: three build targets. Develop: ubuild posix-socket (kernel sockets, bytecode VM, Linux ELF with REPL). Test: ubuild posix-direct (tuntap + safe I/O stack, x86_64 native code, Linux and FreeBSD ELF). Deploy: ubuild xen-direct (safe device drivers, link time optimisation, unikernels (μ) on Xen).]
• Development under UNIX:
  – full debugging environment (bytecode, debugger, gdb)
  – can use kernel sockets or tuntap networking and I/O to isolate bugs
  – interactive REPL for editor integration and other niceties
• Testing:
  – Simulation backend using NS3/MPI
  – Emulate your code at scale before deploying it
• Xen output kernel is specialised:
  – "operating system" is a set of libraries (e.g. Netfront, Blkfront, TCP) linked with a small C runtime
  – Configuration files are evaluated at this stage, and the image must be recompiled to reconfigure
  – Data files can also be linked in directly, if relatively small
  – Dead-code elimination is across the whole OS image
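One way to picture the three build targets is application code written once against a module signature and linked with a different backend per target. The sketch below assumes hypothetical names (NET, Posix_socket, Xen_direct, App) and fake backends that only count bytes. It illustrates the structure and is not Mirage's actual module layout.

```ocaml
(* Hypothetical network signature shared by every backend. *)
module type NET = sig
  val name : string
  val send : string -> int   (* returns the number of bytes "sent" *)
end

(* Development backend: stands in for kernel sockets. *)
module Posix_socket : NET = struct
  let name = "posix-socket"
  let send msg = String.length msg
end

(* Deployment backend: stands in for a direct Xen network driver. *)
module Xen_direct : NET = struct
  let name = "xen-direct"
  let send msg = String.length msg
end

(* Application logic is written once against the signature. *)
module App (N : NET) = struct
  let run () =
    let n = N.send "hello" in
    Printf.sprintf "%s: sent %d bytes" N.name n
end

(* Choosing a backend is a link-time decision, not a run-time one. *)
module Dev = App (Posix_socket)
module Prod = App (Xen_direct)

let () =
  print_endline (Dev.run ());
  print_endline (Prod.run ())
```

Because the backend is fixed when the image is built, code for the backends that are not selected need not be linked into the deployed image.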
Simplified Memory Management
[Figure: layout of the 64-bit virtual address space, with regions for text and data, foreign grants, reserved by Xen, the OCaml minor heap and the OCaml major heap, and marks at 120TB and 128TB. Insets show 4kB I/O pages: a transmit page (IP header, TCP header, tx data), a receive page (IP header, TCP header, rx data) and a block page of 8×512 sectors.]
Optional VM Seal Hypercall
• Single address-space and no dynamic loading
  – W^X address space
  – Address offsets are randomized at compile-time
• Dropping page table privileges:
  – Added freeze hypercall called just before app starts
  – Subsequent page table updates are rejected by Xen.
  – Exception for I/O mappings if they are non-exec and do not modify any existing mappings.
• Very easy in unikernels due to focus on compile-time specialisation instead of run-time complexity.
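The sealing policy can be read as a small state machine. The following toy model is a sketch under stated assumptions. The types and function names (mapping, page_table, seal, update) are hypothetical, and it models only the rule described on the slide, not the Xen hypercall interface.

```ocaml
(* Toy model of the page-table policy described above. All names are
   hypothetical; this is not the Xen hypercall interface. *)
type mapping = { vaddr : int; executable : bool; io : bool }

type page_table = {
  mutable sealed : bool;
  mutable mappings : mapping list;
}

let create () = { sealed = false; mappings = [] }

(* Models the freeze call issued just before the application starts. *)
let seal pt = pt.sealed <- true

let is_mapped pt vaddr = List.exists (fun m -> m.vaddr = vaddr) pt.mappings

(* Before sealing, any update is accepted. After sealing, only new,
   non-executable I/O mappings that leave existing entries alone pass. *)
let update pt m =
  let allowed =
    (not pt.sealed)
    || (m.io && (not m.executable) && not (is_mapped pt m.vaddr))
  in
  if allowed then begin
    pt.mappings <- m :: List.filter (fun x -> x.vaddr <> m.vaddr) pt.mappings;
    Ok ()
  end else Error "rejected: page tables are sealed"

let () =
  let pt = create () in
  let show label r =
    Printf.printf "%s: %s\n" label
      (match r with Ok () -> "accepted" | Error e -> e) in
  show "code page before seal"
    (update pt { vaddr = 0x1000; executable = true; io = false });
  seal pt;
  show "code page after seal"
    (update pt { vaddr = 0x2000; executable = true; io = false });
  show "new non-exec I/O page"
    (update pt { vaddr = 0x3000; executable = false; io = true });
  show "I/O remap of existing page"
    (update pt { vaddr = 0x1000; executable = false; io = true })
```

Only the third update after the seal is accepted in this model, which matches the exception for non-executable I/O mappings that do not touch existing entries.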
Event Driven co-threads
[Figure: CDF of thread wakeup jitter. x-axis: Jitter (ms), 0 to 0.2; y-axis: Cumulative frequency (%), 0 to 100; series: xen-direct, linux-native, linux-pv.]
The unikernel is event-driven, with no pre-emption at all. The graph shows the CDF of thread wakeup latency for a Mirage VM running directly on Xen, versus native or PV Linux userspace.
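A minimal sketch of what "event-driven, with no pre-emption" means in practice, using only the OCaml standard library. The names spawn, yield and run are hypothetical, and Mirage itself uses the Lwt library rather than this hand-rolled queue. Each co-thread runs until it explicitly hands control back to the scheduler.

```ocaml
(* A co-thread is just a heap-allocated closure waiting in a queue.
   [spawn], [yield] and [run] are hypothetical names for this sketch. *)
let run_queue : (unit -> unit) Queue.t = Queue.create ()

let spawn f = Queue.add f run_queue

(* Yielding is explicit: the thread hands over its continuation and
   returns to the scheduler. Nothing can interrupt it before that. *)
let yield k = Queue.add k run_queue

(* The scheduler runs queued closures in FIFO order until none remain. *)
let run () =
  while not (Queue.is_empty run_queue) do
    (Queue.take run_queue) ()
  done

let log = Buffer.create 16

(* Each worker records a step, yields, then records a second step. *)
let worker name =
  spawn (fun () ->
    Buffer.add_string log (name ^ "1 ");
    yield (fun () -> Buffer.add_string log (name ^ "2 ")))

let () =
  worker "a";
  worker "b";
  run ();
  print_endline (Buffer.contents log)
```

The two workers interleave only at their yield points, so the log reads a1 b1 a2 b2. Swapping the FIFO queue for another policy is one way an application could override scheduling.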
Single-instance Thread Scaling
[Figure: execution time (s), 0 to 4, against number of threads (millions), 0 to 20; series: linux-pv, linux-native, xen-direct (malloc), xen-direct (extent).]
Threads are heap-allocated values, so they benefit from the faster garbage collection cycle in the Mirage Xen version, and the scheduler can be overridden to suit application-specific needs.
Appliance Image Size
Appliance                  Standard Build   Dead Code Elimination
DNS                        0.449 MB         0.184 MB
Web Server                 0.674 MB         0.172 MB
Openflow learning switch   0.393 MB         0.164 MB
Openflow controller        0.392 MB         0.168 MB
All configuration and data compiled into the image by the toolchain, so no separate VBD required beyond the PV kernel.
Live migration is easy and fun :-)
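As a worked reading of the table, the relative saving from dead-code elimination can be computed directly from the two columns. The sizes below are copied from the table, and the helper name reduction is only illustrative.

```ocaml
(* Sizes in MB, copied from the table above: (name, standard, with DCE). *)
let images = [
  "DNS", 0.449, 0.184;
  "Web Server", 0.674, 0.172;
  "Openflow learning switch", 0.393, 0.164;
  "Openflow controller", 0.392, 0.168;
]

(* Percentage of the standard build removed by dead-code elimination. *)
let reduction standard dce = 100.0 *. (standard -. dce) /. standard

let () =
  List.iter (fun (name, standard, dce) ->
    Printf.printf "%-26s %.0f%% smaller\n" name (reduction standard dce))
    images
```

The savings range from roughly 57% (Openflow controller) to roughly 74% (Web Server).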
Microbenchmarks: Boot Time
[Figure: boot time (s), 0 to 3, against memory size (MiB): 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 3072; series: linux-pv, minimal-linux, xen-direct.]
• Minimal Linux is a custom initrd which directly calls ioctls to send a UDP packet. The Linux-PV configuration is more realistic.
• Xen-Direct is a standard Mirage VM, and is limited by dom0 toolstack latency (XCP in this case).
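Zero-Copy IO Buffer Management
[Figure: life cycle of pages drawn from the IO page pool through the operations get_page, make_writebuf, channel, output_buf and reset_buf. Pages are shared through the grant table, passed on the ring as request and response, and then recycled. Legend: free, uninitialised, fixed data, mutable data, reference.]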
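The slide names an IO page pool with get_page and recycle steps but gives no code, so the following is a loose sketch under stated assumptions. The pool and view types and the write_string helper are hypothetical, and plain Bytes values stand in for granted pages. It shows the two ideas the figure suggests: pages are reused rather than reallocated, and data is referred to by a view of the page rather than by a copy.

```ocaml
(* Toy I/O page pool: pages are fixed-size buffers that are handed out,
   filled in place, and recycled. Names follow the figure's labels
   loosely and are otherwise hypothetical. *)
let page_size = 4096

type pool = { mutable free : bytes list; mutable allocated : int }

let create_pool () = { free = []; allocated = 0 }

(* Reuse a recycled page when one exists; allocate only when none is free. *)
let get_page pool =
  match pool.free with
  | page :: rest -> pool.free <- rest; page
  | [] -> pool.allocated <- pool.allocated + 1; Bytes.create page_size

let recycle pool page = pool.free <- page :: pool.free

(* A view names a region of a page without copying its contents. *)
type view = { page : bytes; off : int; len : int }

(* Write the payload into the page once and return a view of that region. *)
let write_string page off s =
  Bytes.blit_string s 0 page off (String.length s);
  { page; off; len = String.length s }

let () =
  let pool = create_pool () in
  let page = get_page pool in
  let v = write_string page 0 "GET / HTTP/1.1" in
  Printf.printf "view of %d bytes at offset %d\n" v.len v.off;
  recycle pool page;
  let again = get_page pool in
  Printf.printf "same page reused: %b, pages allocated: %d\n"
    (again == page) pool.allocated
```

The view shares storage with the page, so passing it between layers moves a reference rather than the payload.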
Microbenchmarks: TCP
[Figure: throughput (reqs/s × 10^3), 0 to 1000, for 1 single flow and 10 parallel flows; series: linux-pv, tx & rx; linux-pv tx, xen-direct rx; xen-direct tx, linux-pv rx.]
• Simple throughput test with all offload turned off to stress CPU
• Performance bug in TX path
  – …being fixed (ACKs being sent out of order)
Microbenchmarks: Block Storage
[Figure: throughput (MiB/s), 0 to 1600, against block size (KiB), 1 to 4096, on a fast PCI-E SSD; series: xen-direct; linux-pv, direct I/O; linux-pv, buffered I/O.]
Microbenchmarks: Block Storage
• Same Ring API as Net
  – Unlike POSIX (libaio and so on)
  – Caching is controlled via libraries, not API
• No reordering in the front-end
  – How do we know what the backend storage is?
[Figure: throughput (MiB/s), 0 to 70, against block size (KiB), 1 to 4096, on a single-spindle SATA disk; series: Mirage; Linux domU (direct/buffered).]
Microbenchmarks: Summary
• Mirage performance is comparable to or exceeds that of Linux PV
  – I.e., the functional programming language overheads are offset via single-address space specialization and careful buffer management.
• But what about some more "realistic" scenarios?
  – DNS server appliance
  – HTTP server scaling across 6 vCPUs
  – OpenFlow Controller and Switch implementations
DNS Server Performance
[Figure: throughput (reqs/s × 10^3), 0 to 80, against zone size (entries), 100 to 10000; series: Bind9, Linux; NSD, Linux; NSD, MiniOS -O; NSD, MiniOS -O3; Mirage (no memo); Mirage (memo).]
DNS Server Performance
  (* Load the zone data, then serve DNS on port 53 of each device. *)
  let main () =
    lwt zones = read key "zones" "zone.db" in
    Net.Manager.bind (fun mgr dev ->
      let src = `any_addr, 53 in
      Dns.Server.listen dev src zones)
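The DNS benchmark distinguishes "Mirage (no memo)" from "Mirage (memo)". Assuming "memo" refers to memoising responses to repeated queries, which the slides do not spell out, the technique can be sketched with a hash table. The functions build_response and memoise are hypothetical stand-ins and are not the Mirage DNS library API.

```ocaml
(* Counts how often the expensive path runs, for illustration only. *)
let lookups = ref 0

(* Hypothetical stand-in for building a DNS response from the zone data. *)
let build_response (name : string) : string =
  incr lookups;
  "response for " ^ name

(* Generic memoisation: cache results in a hash table keyed by the query. *)
let memoise (f : string -> string) : string -> string =
  let cache : (string, string) Hashtbl.t = Hashtbl.create 64 in
  fun query ->
    match Hashtbl.find_opt cache query with
    | Some answer -> answer
    | None ->
      let answer = f query in
      Hashtbl.add cache query answer;
      answer

let answer = memoise build_response

let () =
  ignore (answer "www.example.org");
  ignore (answer "www.example.org");
  Printf.printf "queries: 2, responses built: %d\n" !lookups
```

The second query is served from the cache, so the response is built only once. A real server would also need to bound the cache and invalidate entries when the zone changes.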
Scaling via Multiple Instances
[Figure: throughput (conns/s), 0 to 2500; series: linux-pv (1 host, 6 vcpus), linux-pv (2 hosts, 3 vcpus), linux-pv (6 hosts, 1 vcpu), xen-direct (6 unikernels).]
• Apache/Linux vs. Mirage appliance
• Serving single static page
Openflow Controller performance
[Figure: requests/s (× 10^3), 0 to 180, in batch and single modes; series: maestro, nox-fast, xen-direct.]
• The Openflow controller is competitive with Nox (C++), but much more high-level. Applications can link directly against the switch to route their data.
Design Discussion
• Service domains should be built this way
  – Debugging and moving from dom0/domU is a lot easier.
  – Dependencies are explicitly managed.
  – Resource control can be fine-grained.
  – Very useful for unit testing Xen features!
• Which language?
  – Several viable alternatives: HalVM (Haskell), GuestVM (Java), ErlangOnXen. More code sharing worthwhile?
  – Protocol implementations take time to mature.
  – Working on Mirage/XCP integration via "project Windsor"
• Stub domains?
  – Less practical for stub domains, due to hardware devices. Turn Linux into a library OS?
Mirage Roadmap
• Developer Release for Q3 2012
  – Monolithic repository at http://github.com/avsm/mirage
  – Adding pkg mgmt: http://github.com/OCamlPro/opam
  – First release will have pure-OCaml implementations of:
    • Device drivers (netfront/blkfront/xenstore)
    • TCP/IPv4 and DHCPv4
    • HTTP
    • DNS(SEC)
    • SSH
    • OpenFlow (controller/switch)
    • vchan IPC
    • 9P :-)
    • NFS
    • FAT32
    • Distributed k/v store: arakoon.org
• New architectures ongoing:
  – FreeBSD kernel: http://github.com/pgj/mirage-kfreebsd
  – Javascript: http://ocsigen.org/js_of_ocaml
  – ARM/MIPs64/rPI: via FreeBSD kmod
  – Simulation: NS3/OpenMPI functional simulation
  $ opam init git://github.com/mirage/opam-repository
  $ opam remote -add dev git://github.com/mirage/opam-repo-dev
  $ opam switch mirage-3.12.1-xen
  $ opam install mirage-www
Mirage Roadmap
• Online resources:
  – http://www.openmirage.org
  – http://tutorial.openmirage.org
  – https://lists.cam.ac.uk/mailman/listinfo/cl-mirage
  – (draft in Q4 2012) http://realworldocaml.org
• Offline resources:
  – Find me at the hotel pool