Bryan Cantrill
September 27, 2014

The dream is alive! Running Linux containers on an illumos kernel

Presentation for #illumos day at #surgecon, 2014. Video: https://www.youtube.com/watch?v=TrfD3pC0VSs

Transcript

  1. OS emulation: An old idea

    • Operating systems have long employed system call emulation to allow binaries from one operating system to run on another on the same instruction set architecture
    • Combines the binary footprint of the emulated system with the operational advantages of the emulating system
    • Sun first did this with SunOS 4.x binaries on Solaris 2.x
    • With Solaris x86, it became possible to run binaries targeted for Linux via SCO’s (open source) “lxrun”
    • Packaging innovation in Linux in the early 2000s + deeply differentiated technologies in Solaris 10 (e.g. ZFS, DTrace, zones) made Linux emulation more attractive
  2. Rise of zones

    • While Linux emulation had become more important, the problem had also become more complicated: programs had grown beyond single-process binaries
    • Clear that “lxrun” would only work for applications, not systems: a deeper solution was needed
    • Fortunately, this coincided with the rise of operating system virtualization embodied by zones
    • Idea: introduce the notion of a branded zone whereby an entire foreign system (a brand) could be emulated within the confines of a zone
  3. BrandZ: LX-branded zones

    • In 2006, a team at Sun that included Nils Nieuwejaar and Russ Blaine integrated BrandZ, a Linux branded zone (PSARC 2005/471)
    • Support was a user/kernel hybrid: lx system calls bounced back to a user-level emulation library that depended on some in-kernel emulation (e.g. futexes)
    • Support was for RHEL 3 (!): glibc 2.3.2 + Linux 2.4
    • A remarkable amount of work was done to handle device pathing, signal handling, /proc, and arcana like TTY ioctls, ptrace, etc.
    • Worked for a surprising number of binaries!
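
The user/kernel split described on this slide can be pictured with a toy model. This is only an illustrative sketch, not the BrandZ code: the handler tables, the function names, and the placeholder return values are all hypothetical, and real emulation happens on system call numbers in C rather than on names in Python.

```python
import errno

# Hypothetical handlers; names and return values are illustrative only.
def emulate_getpid(args):
    return 4242  # placeholder process ID

def emulate_futex(args):
    return 0  # pretend the futex operation succeeded

# Assumption: a few calls are handled in the kernel, most in a user-level library.
IN_KERNEL_HANDLERS = {"futex": emulate_futex}
USER_LEVEL_HANDLERS = {"getpid": emulate_getpid}

def dispatch(name, args=()):
    """Route one emulated call and report where it was handled.

    Returns (where, result); an unhandled call yields a negative ENOSYS,
    mirroring the Linux convention for unimplemented system calls.
    """
    if name in IN_KERNEL_HANDLERS:
        return ("kernel", IN_KERNEL_HANDLERS[name](args))
    if name in USER_LEVEL_HANDLERS:
        return ("user", USER_LEVEL_HANDLERS[name](args))
    return ("none", -errno.ENOSYS)
```

In this model, `dispatch("futex")` is served by the in-kernel table, `dispatch("getpid")` by the user-level table, and anything else falls through to ENOSYS.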
  4. What was missing?

    • Support was only for 2.4 kernels
    • Support for 2.6 required adding new, Linux-only mechanisms that had native analogues (e.g., epoll)
    • Only 32-bit was supported
    • XVM (the Xen-on-Solaris effort inside of Sun) had much more managerial support and was thought to be a “more supportable” solution
  5. The decline of the lx brand

    After cresting in 2007, contributions to lx dwindled:

    [Chart: pushes to usr/src/lib/brand/lx over time; x-axis 2006 to 2010, y-axis 0 to 30]
  6. Clinically dead

    The lx brand was removed on June 11, 2010...

    [Chart: pushes to usr/src/lib/brand/lx over time; x-axis 2006 to 2013, y-axis 0 to 30]
  7. The organ donation years

    • Joyent customers asked for SmartOS to support htop, a colorful Linux program for system process monitoring
    • htop is very, very specific to Linux /proc, and porting it to use illumos /proc seemed arduous and pointless…
    • ...but a relatively complete Linux /proc had been integrated with the LX brand!
    • In April 2012, the /proc portion of the LX brand was extracted, cleaned up, and separately integrated
    • Mounted at /system/lxproc in SmartOS zones; htop modified to look for this path on illumos
  8. Exhumed!

    • In January 2014, David Mackay, an illumos community member, announced that he was able to resurrect the lx brand, and that it appeared to work!

    “Linked below is a webrev which restores LX branded zones support to Illumos: http://cr.illumos.org/~webrev/DavidJX8P/lx-zones-restoration/ I have been running OpenIndiana, using it daily on my workstation for over a month with the above webrev applied to the illumos-gate and built by myself. It would definitely raise interest in Illumos. Indeed, I have seen many people who are extremely interested in LX zones. The LX zones code is minimally invasive on Illumos itself, and is mostly segregated out. I hope you find this of interest.”
  9. Could it be revived?

    • David’s work inspired us to rethink LX-branded zones...
    • It seemed that the reasons for the discontinuation of LX brand support might not still be valid...
    • ...and it seemed that the engineering challenges might not be as structurally daunting
  10. Has Linux made it easier?

    • Linux is moving much more slowly: the pace of development of new user-visible kernel abstractions has slowed
    • Torvalds discovered religion on ABI compatibility
    • The need to run on older kernels has dissuaded software from using the more obscure Linux-isms
    • The glibc/kernel disconnect means that glibc (and apps!) must reasonably be able to process ENOSYS
    • Easier support model: the rise of the cloud has replaced shrink-wrapped software with open source + SaaS
    • Server focus: Mac OS X gave us Unix, and relegated “Linux on the desktop” to “Duke Nukem Forever” status
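
The ENOSYS point is what lets an emulation layer leave obscure calls unimplemented: well-behaved callers already fall back when the kernel says a call does not exist. A minimal sketch of that caller-side pattern follows; `call_with_fallback` and `unsupported` are hypothetical names for illustration, not glibc internals.

```python
import errno
import os

def call_with_fallback(preferred, fallback):
    """Try preferred(); if it fails with ENOSYS, use fallback() instead.

    Any other error is re-raised, since only ENOSYS means
    "this system call is not implemented here".
    """
    try:
        return preferred()
    except OSError as exc:
        if exc.errno != errno.ENOSYS:
            raise
        return fallback()

def unsupported():
    # Stand-in for a newer system call that the running kernel lacks.
    raise OSError(errno.ENOSYS, os.strerror(errno.ENOSYS))

result = call_with_fallback(unsupported, lambda: "used older interface")
```

Here `result` ends up holding the fallback's return value, because the stand-in call reports ENOSYS.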
  11. Have motivations changed?

    • Originally, LX branded zones were about bringing Linux applications into established Solaris environments for purposes of hardware consolidation
    • Port of KVM to illumos circa 2011 solved this problem
    • ...but KVM has unresolvable performance and resource limitations, and Linux on KVM only gets indirect benefit from ZFS, DTrace and zones
    • At the same time, enthusiasm for containers and OS-based virtualization has blossomed (ht: Docker)
    • There seems to be desire for a best-of-all-worlds system that combines Linux strengths (binary footprint) with illumos technical differentiators (ZFS, zones, DTrace)
  12. Reviving LX-branded zones

    • Encouraged that the body might not have decomposed, Joyent engineer Jerry Jelinek exhumed the LX brand and reintegrated it into SmartOS on March 20, 2014
    • Guiding principles:
      • Do it all in the open
      • Do it all on SmartOS master (illumos-joyent)
      • Add base illumos facilities wherever possible
      • Aim to upstream to illumos when we’re done
    • Thanks to Jerry grinding out many, many LX bug fixes, got Ubuntu 10.04 booting in April, Ubuntu 12.04 booting in May and Ubuntu 14.04 booting in July
  13. IT’S ALIVE!

    Contributions to the lx brand since March:

    [Chart: pushes to usr/src/lib/brand/lx over time; x-axis 2006 to 2014, y-axis 0 to 100]
  14. So what have we done?

    • Fixed a ton of bugs (ht: LTP)
    • Added native epoll(5), though not in terms of event ports but rather in terms of poll(7D)
    • Added exclusive IP stacks for LX-branded zones
    • Added support for netlink (RFC 3549), but restricted that support to the lx brand
    • Added support for thunk-less native binaries within an LX branded zone
    • Added native inotify(5)
    • Added initial 64-bit support
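
For readers who have not used it, epoll is the kind of Linux-only interface that binaries expect to find. The sketch below shows the consumer side only, using Python's standard `select.epoll` wrapper on a pipe; it assumes a system that provides the epoll interface and says nothing about how the native implementation is built. The helper name `wait_for_readable` is made up for this example.

```python
import os
import select

def wait_for_readable(fd, timeout_s=1.0):
    """Return the (fd, event mask) pairs epoll reports for fd within the timeout."""
    ep = select.epoll()
    try:
        ep.register(fd, select.EPOLLIN)  # ask to be told when fd has data
        return list(ep.poll(timeout_s))
    finally:
        ep.close()

# Make one end of a pipe readable, then ask epoll about it.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"x")
events = wait_for_readable(read_fd)
os.close(read_fd)
os.close(write_fd)
```

After this runs, `events` holds one entry for the read end of the pipe with the `EPOLLIN` bit set.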
  15. What is left to do?

    • vsyscall support (needed for 64-bit)
    • Anything else for 64-bit
    • Stack switching (needed for Go)
    • Multi-threaded ptrace support
    • Lots of using it and figuring out what breaks!
  16. How can you get involved?

    • SmartOS contains latest-and-greatest bits; first step is to get SmartOS running
    • We have a 32-bit Ubuntu 14.04 image that can be used to create a zone via vmadm: b7493690-f019-4612-958b-bab5f844283e
    • Will need to configure a VM with “kernel-version” set to 3.13.0 and “brand” to “lx” in the vmadm JSON payload
    • If you find that something is broken, create an issue on the illumos-joyent GitHub repo
    • Once 64-bit is working, we will be very actively seeking community engagement; stay tuned!
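
A payload along the lines described above might be generated as follows. This is a hedged sketch: the image UUID, the “brand” value and the kernel version come from the slide, but the `alias` and `image_uuid` keys and the helper name `lx_payload` are assumptions. The kernel version key is spelled as on the slide; check vmadm(1M) for the exact property names on your platform image, since the spelling may differ (for example, an underscore instead of a hyphen).

```python
import json

# Image UUID as given on the slide.
IMAGE_UUID = "b7493690-f019-4612-958b-bab5f844283e"

def lx_payload(alias, kernel_version="3.13.0"):
    """Build a minimal LX-zone payload as a dict (field names partly assumed)."""
    return {
        "alias": alias,                    # assumption: human-readable VM name
        "brand": "lx",                     # from the slide
        "kernel-version": kernel_version,  # spelled as on the slide; verify in vmadm(1M)
        "image_uuid": IMAGE_UUID,          # assumption: key used to select the image
    }

# Print JSON that could be saved to a file and handed to vmadm.
print(json.dumps(lx_payload("ubuntu-1404"), indent=2))
```

A real payload will also need memory, storage and network settings; those are omitted here because the slide does not specify them.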
  17. Thanks!

    • The original BrandZ team at Sun for a remarkable amount of work: Nils Nieuwejaar and Russ Blaine
    • The illumos community (especially David Mackay!) for inspiring the revival
    • Jerry Jelinek for leading the charge, and doing the vast majority of the work!
    • @rmustacc for thunk-less native binary support
    • @jmclulow for stack switching
    • @djhoffma for his work on ptrace
    • @joshwilsdon for vmadm support for LX brands