The dream is alive! Running Linux containers on an illumos kernel
DESCRIPTION
Presentation for #illumos day at #surgecon, 2014. Video can be found at https://www.youtube.com/watch?v=TrfD3pC0VSs and the source code is at https://github.com/joyent/illumos-joyent
TRANSCRIPT
![Page 1: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/1.jpg)
The dream is alive! Running Linux containers on an illumos kernel
Bryan Cantrill, CTO
@bcantrill
![Page 2: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/2.jpg)
OS emulation: An old idea
• Operating systems have long employed system call emulation to allow binaries from one operating system to run on another on the same instruction set architecture
• Combines the binary footprint of the emulated system with the operational advantages of the emulating system
• Sun first did this with SunOS 4.x binaries on Solaris 2.x
• With Solaris x86, it became possible to run binaries targeted for Linux via SCO’s (open source) “lxrun”
• Packaging innovation in Linux in early 2000s + deeply differentiated technologies in Solaris 10 (e.g. ZFS, DTrace, zones) made Linux emulation more attractive
![Page 3: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/3.jpg)
Rise of zones
• While Linux emulation became more important, it also became harder: software had grown from single-process binaries into multi-process systems
• Clear that “lxrun” would only work for applications, not systems — needed a deeper solution
• Fortunately, coincided with the rise of operating system virtualization embodied by zones
• Idea: introduce notion of a branded zone whereby an entire foreign system (a brand) could be emulated within the confines of a zone
![Page 4: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/4.jpg)
BrandZ: LX-branded zones
• In 2006, team at Sun that included Nils Nieuwejaar and Russ Blaine integrated BrandZ, a Linux branded zone (PSARC 2005/471)
• Support was a user/kernel hybrid: lx system calls bounced back to a user-level emulation library that depended on some in-kernel emulation (e.g. futexes)
• Support was for RHEL 3 (!): glibc 2.3.2 + Linux 2.4
• Remarkable amount of work was done to handle device pathing, signal handling, /proc — and arcana like TTY ioctls, ptrace, etc.
• Worked for a surprising number of binaries!
![Page 5: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/5.jpg)
What was missing?
• Support was only for 2.4 kernels
• Support for 2.6 required adding new, Linux-only mechanisms that had native analogues (e.g., epoll)
• Only 32-bit was supported
• XVM (the Xen-on-Solaris effort inside of Sun) had much more managerial support and was thought to be a “more supportable” solution
![Page 6: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/6.jpg)
The decline of the lx brand
After cresting in 2007, contributions to lx dwindled:
[Chart: pushes to usr/src/lib/brand/lx per year, 2006 to 2010; y-axis 0 to 30]
![Page 7: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/7.jpg)
Clinically dead
The lx brand was removed on June 11, 2010...
[Chart: pushes to usr/src/lib/brand/lx per year, 2006 to 2013; y-axis 0 to 30]
![Page 8: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/8.jpg)
The organ donation years
• Joyent customers asked for SmartOS to support htop, a colorful Linux program for system process monitoring
• htop is very, very specific to Linux /proc — and porting it to use illumos /proc seemed arduous and pointless…
• ...but a relatively complete Linux /proc had integrated with the LX brand!
• In April 2012, the /proc portion of the LX brand was extracted, cleaned up, and separately integrated
• Mounted at /system/lxproc in SmartOS zones; htop modified to look for this path on illumos
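The last bullet amounts to a path lookup at startup. A minimal sketch of that idea in Python follows; htop itself is written in C and this is not its actual patch. The function name `linux_proc_root` and the fallback order are assumptions made for illustration.

```python
import os

# Candidate mount points for a Linux-style /proc, in preference order.
# /system/lxproc is the path named on the slide; /proc is the usual Linux path.
# Note: on illumos without lxproc, /proc exists but is the native (non-Linux) format.
DEFAULT_CANDIDATES = ("/system/lxproc", "/proc")


def linux_proc_root(candidates=DEFAULT_CANDIDATES):
    """Return the first candidate that exists as a directory, or None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


if __name__ == "__main__":
    print("Using proc filesystem at:", linux_proc_root())
```

Checking the lxproc path first means the same program would pick up the Linux-format files in a SmartOS zone and fall through to plain /proc on Linux.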
![Page 9: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/9.jpg)
Exhumed!
• In January 2014, David Mackay, an illumos community member, announced that he was able to resurrect the lx brand, and that it appeared to work!
Linked below is a webrev which restores LX branded zones support to Illumos:
http://cr.illumos.org/~webrev/DavidJX8P/lx-zones-restoration/
I have been running OpenIndiana, using it daily on my workstation for over a month with the above webrev applied to the illumos-gate and built by myself.
It would definitely raise interest in Illumos. Indeed, I have seen many people who are extremely interested in LX zones.
The LX zones code is minimally invasive on Illumos itself, and is mostly segregated out.
I hope you find this of interest.
![Page 10: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/10.jpg)
Could it be revived?
• David’s work inspired us to rethink LX-branded zones...
• It seemed that the reasons for the discontinuation of LX brand support might not still be valid...
• ...and it seemed that the engineering challenges might not be as structurally daunting
![Page 11: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/11.jpg)
Has Linux made it easier?
• Linux is moving much more slowly: pace of development of new user-visible kernel abstractions has slowed
• Torvalds discovered religion on ABI compatibility
• The need to run on older kernels has dissuaded software from using the more obscure Linux-isms
• The glibc/kernel disconnect means that glibc (and apps!) must reasonably be able to process ENOSYS
• Easier support model: the rise of the cloud has replaced shrink-wrapped software with open source + SaaS
• Server focus: Mac OS X gave us Unix — and relegated “Linux on the desktop” to “Duke Nukem Forever” status
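The ENOSYS point above is what makes partial emulation tolerable: software that already copes with an older kernel will cope with an emulated kernel that lacks the same call. A minimal sketch of that pattern in Python, assuming a Linux-like host; the helper names are hypothetical, and `getrandom(2)` is chosen only as an example of a call that older kernels do not provide.

```python
import errno
import os


def call_with_enosys_fallback(preferred, fallback):
    """Try the newer call; if the kernel reports ENOSYS, use the older path."""
    try:
        return preferred()
    except OSError as exc:
        if exc.errno != errno.ENOSYS:
            raise  # a genuine failure, not a missing system call
        return fallback()


def random_bytes(n):
    """Return n random bytes, preferring getrandom(2) when the kernel has it."""

    def via_syscall():
        # os.getrandom wraps getrandom(2), which appeared in Linux 3.17.
        return os.getrandom(n)

    def via_device():
        # Older path that works on kernels without getrandom(2).
        with open("/dev/urandom", "rb") as dev:
            return dev.read(n)

    return call_with_enosys_fallback(via_syscall, via_device)
```

Only ENOSYS triggers the fallback; any other error is re-raised, so a real failure is not silently masked.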
![Page 12: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/12.jpg)
Have motivations changed?
• Originally, LX branded zones were about bringing Linux applications into established Solaris environments for purposes of hardware consolidation
• Port of KVM to illumos circa 2011 solved this problem
• ...but KVM has unresolvable performance and resource limitations, and Linux on KVM only gets indirect benefit from ZFS, DTrace and zones
• At the same time, enthusiasm for containers and OS-based virtualization has blossomed (ht: Docker)
• There seems to be desire for a best-of-all-worlds system that combines Linux strengths (binary footprint) with illumos technical differentiators (ZFS, zones, DTrace)
![Page 13: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/13.jpg)
Reviving LX-branded zones
• Encouraged that the body might not have decomposed, Joyent engineer Jerry Jelinek exhumed the LX brand and reintegrated it into SmartOS on March 20, 2014
• Guiding principles:
  • Do it all in the open
  • Do it all on SmartOS master (illumos-joyent)
  • Add base illumos facilities wherever possible
  • Aim to upstream to illumos when we’re done
• Thanks to Jerry grinding out many, many LX bug fixes, got Ubuntu 10.04 booting in April, Ubuntu 12.04 booting in May and Ubuntu 14.04 booting in July
![Page 14: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/14.jpg)
IT’S ALIVE!
Contributions to the lx brand since March:
[Chart: pushes to usr/src/lib/brand/lx per year, 2006 to 2014; y-axis 0 to 100]
![Page 15: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/15.jpg)
So what have we done?
• Fixed a ton of bugs (ht: LTP)
• Added native epoll(5) — though not in terms of event ports but rather in terms of poll(7D)
• Added exclusive IP stacks for LX-branded zones
• Added support for netlink (RFC 3549) — but restricted that support to the lx brand
• Added support for thunk-less native binaries within an LX branded zone
• Added native inotify(5)
• Added initial 64-bit support
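For readers unfamiliar with the interface being emulated, this is roughly what a Linux program expects from epoll. The sketch uses Python's `select.epoll` binding, which stock Python provides only on Linux. It shows the consumer side only and says nothing about how the illumos implementation works internally; the helper name `readable_fds` is an assumption for illustration.

```python
import os
import select


def readable_fds(fds, timeout=1.0):
    """Return the sorted subset of fds that epoll reports as readable."""
    ep = select.epoll()
    try:
        for fd in fds:
            ep.register(fd, select.EPOLLIN)  # ask for "data available" events
        ready = ep.poll(timeout)             # list of (fd, event mask) pairs
        return sorted(fd for fd, events in ready if events & select.EPOLLIN)
    finally:
        ep.close()


if __name__ == "__main__":
    r, w = os.pipe()
    print(readable_fds([r], timeout=0))  # nothing has been written yet
    os.write(w, b"x")
    print(readable_fds([r]))             # the read end now has data
    os.close(r)
    os.close(w)
```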
![Page 16: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/16.jpg)
What is left to do?
• vsyscall support (needed for 64-bit)
• Anything else for 64-bit
• Stack switching (needed for Go)
• Multi-threaded ptrace support
• Lots of using it and figuring out what breaks!
![Page 17: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/17.jpg)
How can you get involved?
• SmartOS contains latest-and-greatest bits; first step is to get SmartOS running
• We have a 32-bit Ubuntu 14.04 image that can be used to create a zone via vmadm:
b7493690-f019-4612-958b-bab5f844283e
• Will need to configure a VM with “kernel-version” set to 3.13.0 and “brand” to “lx” in the vmadm JSON payload
• If you find that something is broken, create an issue on the illumos-joyent GitHub repo
• Once 64-bit is working, we will be very actively seeking community engagement; stay tuned!
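The vmadm step above can be sketched as a small payload generator. This is an illustration under assumptions, not a complete configuration: the alias is hypothetical, and a real payload would typically also carry memory, quota, and network settings. The slide spells the kernel property “kernel-version”; the sketch defaults to `kernel_version`, the spelling used in vmadm's documentation as far as I know, and leaves the key name adjustable in case your platform image differs.

```python
import json

# Image UUID quoted on the slide (32-bit Ubuntu 14.04).
LX_IMAGE_UUID = "b7493690-f019-4612-958b-bab5f844283e"


def lx_payload(alias, kernel_version="3.13.0", kernel_key="kernel_version"):
    """Build a minimal vmadm payload dict for an LX-branded zone.

    alias      -- hypothetical human-readable name for the zone
    kernel_key -- property name for the emulated kernel version (an assumption;
                  the slide writes it as "kernel-version")
    """
    return {
        "alias": alias,
        "brand": "lx",
        kernel_key: kernel_version,
        "image_uuid": LX_IMAGE_UUID,
    }


if __name__ == "__main__":
    print(json.dumps(lx_payload("lx-ubuntu-1404"), indent=2))
```

The printed JSON could be saved to a file and handed to `vmadm create -f`, after fetching the image with `imgadm import`.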
![Page 18: The dream is alive! Running Linux containers on an illumos kernel](https://reader035.vdocument.in/reader035/viewer/2022081807/547e7c185806b5d15e8b469f/html5/thumbnails/18.jpg)
Thanks!
• The original BrandZ team at Sun for a remarkable amount of work: Nils Nieuwejaar and Russ Blaine
• The illumos community — especially David Mackay! — for inspiring the revival
• Jerry Jelinek for leading the charge — and doing the vast majority of the work!
• @rmustacc for thunk-less native binary support
• @jmclulow for stack switching
• @djhoffma for his work on ptrace
• @joshwilsdon for vmadm support for LX brands