Slide 1

Slide 1 text

The dream is alive! Running Linux containers on an illumos kernel CTO [email protected] Bryan Cantrill @bcantrill

Slide 2

Slide 2 text

OS emulation: An old idea • Operating systems have long employed system call emulation to allow binaries from one operating system run on another on the same instruction set architecture • Combines the binary footprint of the emulated system with the operational advantages of the emulating system • Sun first did this with SunOS 4.x binaries on Solaris 2.x • With Solaris x86, it became possible to run binaries targeted for Linux via SCO’s (open source) “lxrun” • Packaging innovation in Linux in early 2000s + deeply differentiated technologies in Solaris 10 (e.g. ZFS, DTrace, zones) made Linux emulation more attractive

Slide 3

Slide 3 text

Rise of zones • While more important, the problem also became more complicated: programs became more complicated than single-process binaries • Clear that “lxrun” would only work for applications, not systems — needed a deeper solution • Fortunately, coincided with the rise of operating system virtualization embodied by zones • Idea: introduce notion of a branded zone whereby an entire foreign system (a brand) could be emulated within the confines of a zone

Slide 4

Slide 4 text

BrandZ: LX-branded zones • In 2006, team at Sun that included Nils Nieuwejaar and Russ Blaine integrated BrandZ, a Linux branded zone (PSARC 2005/471) • Support was a user/kernel hybrid: lx system calls bounced back to a user-level emulation library that depended on some in-kernel emulation (e.g. futexes) • Support was for RHEL 3 (!): glibc 2.3.2 + Linux 2.4 • Remarkable amount of work was done to handle device pathing, signal handling, /proc — and arcana like TTY ioctls, ptrace, etc. • Worked for a surprising number of binaries!

Slide 5

Slide 5 text

What was missing? • Support was only for 2.4 kernels • Support for 2.6 required adding new, Linux-only mechanisms that had native analogues (e.g., epoll) • Only 32-bit was supported • XVM (the Xen-on-Solaris effort inside of Sun) had much more managerial support and was thought to be a “more supportable” solution

Slide 6

Slide 6 text

The decline of the lx brand After cresting in 2007, contributions to lx dwindled: 0 10 20 30 2006 2007 2008 2009 2010 Pushes to usr/src/lib/brand/lx

Slide 7

Slide 7 text

Clinically dead The lx brand was removed on June 11, 2010... 0 10 20 30 2006 2007 2008 2009 2010 2011 2012 2013 Pushes to usr/src/lib/brand/lx

Slide 8

Slide 8 text

The organ donation years • Joyent customers asked for SmartOS to support htop, a colorful Linux program for system process monitoring • htop is very, very specific to Linux /proc — and porting it to use illumos /proc seemed arduous and pointless… • ...but a relatively complete Linux /proc had integrated with the LX brand! • In April 2012, the /proc portion of the LX brand was extracted, cleaned up, and separately integrated • Mounted at /system/lxproc in SmartOS zones; htop modified to look for this path on illumos

Slide 9

Slide 9 text

Exhumed! • In January 2014, David Mackay, an illumos community member, announced that he was able to resurrect the lx brand —and that it appeared to work! Linked below is a webrev which restores LX branded zones support to Illumos: http://cr.illumos.org/~webrev/DavidJX8P/lx-zones-restoration/ I have been running OpenIndiana, using it daily on my workstation for over a month with the above webrev applied to the illumos-gate and built by myself. It would definitely raise interest in Illumos. Indeed, I have seen many people who are extremely interested in LX zones. The LX zones code is minimally invasive on Illumos itself, and is mostly segregated out. I hope you find this of interest.

Slide 10

Slide 10 text

Could it be revived? • David’s work inspired us to rethink LX-branded zones... • It seemed that the reasons for the discontinuation of LX brand support might not still be valid... • ...and it seemed that the engineering challenges might not be as structurally daunting

Slide 11

Slide 11 text

Has Linux made it easier? • Linux is moving much more slowly: pace of development of new user-visible kernel abstraction has slowed • Torvalds discovered religion on ABI compatibility • The need to run on older kernels has dissuaded software from using the more obscure Linux-isms • The glibc/kernel disconnect means that glibc (and apps!) must reasonably be able to process ENOSYS • Easier support model: the rise of the cloud has replaced shrink-wrapped software with open source + SaaS • Server focus: Mac OS X gave us Unix — and relegated “Linux on the desktop” to “Duke Nukem Forever” status

Slide 12

Slide 12 text

Have motivations changed? • Originally, LX branded zones were about bringing Linux applications into established Solaris environments for purposes of hardware consolidation • Port of KVM to illumos circa 2011 solved this problem • ...but KVM has unresolvable performance and resource limitations, and Linux on KVM only gets indirect benefit from ZFS, DTrace and zones • At the same time, enthusiasm for containers and OS- based virtualization have blossomed (ht: Docker) • There seems to be desire for a best-of-all worlds system that combines Linux strengths (binary footprint) with illumos technical differentiators (ZFS, zones, DTrace)

Slide 13

Slide 13 text

Reviving LX-branded zones • Encouraged that the body might not have decomposed, Joyent engineer Jerry Jelinek exhumed the LX brand and reintegrated it into SmartOS on March 20, 2014 • Guiding principles: • Do it all in the open • Do it all on SmartOS master (illumos-joyent) • Add base illumos facilities wherever possible • Aim to upstream to illumos when we’re done • Thanks to Jerry grinding out many, many LX bug fixes, got Ubuntu 10.04 booting in April, Ubuntu 12.04 booting in May and Ubuntu 14.04 booting in July

Slide 14

Slide 14 text

IT’S ALIVE! Contributions to the lx brand since March: 0 25 50 75 100 2006 2007 2008 2009 2010 2011 2012 2013 2014 Pushes to usr/src/lib/brand/lx

Slide 15

Slide 15 text

So what have we done? • Fixed a ton of bugs (ht: LTP) • Added native epoll(5) — though not in terms of event ports but rather in terms of poll(7D) • Added exclusive IP stacks for LX-branded zones • Added support for netlink (RFC 3549) — but restricted that support to the lx brand • Added support for thunk-less native binaries within an LX branded zone • Added native inotify(5) • Added initial 64-bit support

Slide 16

Slide 16 text

What is left to do? • vsyscall support (needed for 64-bit) • Anything else for 64-bit • Stack switching (needed for Go) • Multi-threaded ptrace support • Lots of using it and figuring out what breaks!

Slide 17

Slide 17 text

How can you get involved? • SmartOS contains latest-and-greatest bits; first step is to get SmartOS running • We have a 32-bit Ubuntu 14.04 image that can be used to create a zone via vmadm: b7493690-f019-4612-958b-bab5f844283e • Will need to configure a VM with “kernel-version” set to 3.13.0 and “brand” to “lx” in the vmadm JSON payload • If you find that something is boken, create an issue on the illumos-joyent github repo • Once 64-bit is working, we will be very actively seeking community engagement; stay tuned!

Slide 18

Slide 18 text

Thanks! • The original BrandZ team at Sun for a remarkable amount of work: Nils Nieuwejaar and Russ Blaine • The illumos community — especially David Mackay! — for inspiring the revival • Jerry Jelinek for leading the charge — and doing the vast majority of the work! • @rmustacc for thunk-less native binary support • @jmclulow for stack switching • @djhoffma for his work on ptrace • @joshwilsdon for vmadm support for LX brands