The dream is alive! Running Linux containers on an illumos kernel

Bryan Cantrill
September 27, 2014

Presentation for #illumos day at #surgecon, 2014. Video: https://www.youtube.com/watch?v=TrfD3pC0VSs

Transcript

  1. The dream is alive!
    Running Linux containers
    on an illumos kernel
    CTO
    Bryan Cantrill
    @bcantrill

  2. OS emulation: An old idea
    • Operating systems have long employed system call
    emulation to allow binaries from one operating system
    to run on another on the same instruction set architecture
    • Combines the binary footprint of the emulated system
    with the operational advantages of the emulating system
    • Sun first did this with SunOS 4.x binaries on Solaris 2.x
    • With Solaris x86, it became possible to run binaries
    targeted for Linux via SCO’s (open source) “lxrun”
    • Packaging innovation in Linux in early 2000s + deeply
    differentiated technologies in Solaris 10 (e.g. ZFS,
    DTrace, zones) made Linux emulation more attractive
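
As a rough illustration of the system call emulation idea above, the toy model below maps a couple of Linux i386 system call numbers onto native handlers and reports anything it does not know as ENOSYS. The names `HANDLERS` and `emulate_syscall` are hypothetical; this is a sketch of the general technique, not how lxrun or any Sun emulation layer was implemented.

```python
import errno
import os

# Linux i386 system call numbers for the two calls modelled here.
LINUX_SYS_WRITE = 4
LINUX_SYS_GETPID = 20


def _emul_write(fd, data):
    # Forward to the native write; returns the number of bytes written.
    return os.write(fd, data)


def _emul_getpid():
    return os.getpid()


# Hypothetical dispatch table: foreign syscall number -> native handler.
HANDLERS = {
    LINUX_SYS_WRITE: _emul_write,
    LINUX_SYS_GETPID: _emul_getpid,
}


def emulate_syscall(number, *args):
    """Return the native result, or a negative errno in the Linux style."""
    handler = HANDLERS.get(number)
    if handler is None:
        # Unimplemented calls are reported rather than crashing the caller.
        return -errno.ENOSYS
    try:
        return handler(*args)
    except OSError as exc:
        return -exc.errno
```

A caller would invoke `emulate_syscall(LINUX_SYS_GETPID)` and treat any negative return value as an error code.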

  3. Rise of zones
    • While Linux emulation became more important, the
    problem also became harder: software had grown beyond
    single-process binaries
    • Clear that “lxrun” would only work for applications, not
    systems — needed a deeper solution
    • Fortunately, coincided with the rise of operating system
    virtualization embodied by zones
    • Idea: introduce notion of a branded zone whereby an
    entire foreign system (a brand) could be emulated within
    the confines of a zone

  4. BrandZ: LX-branded zones
    • In 2006, team at Sun that included Nils Nieuwejaar and
    Russ Blaine integrated BrandZ, a Linux branded zone
    (PSARC 2005/471)
    • Support was a user/kernel hybrid: lx system calls
    bounced back to a user-level emulation library that
    depended on some in-kernel emulation (e.g. futexes)
    • Support was for RHEL 3 (!): glibc 2.3.2 + Linux 2.4
    • Remarkable amount of work was done to handle device
    pathing, signal handling, /proc — and arcana like TTY
    ioctls, ptrace, etc.
    • Worked for a surprising number of binaries!

  5. What was missing?
    • Support was only for 2.4 kernels
    • Support for 2.6 required adding new, Linux-only
    mechanisms that had native analogues (e.g., epoll)
    • Only 32-bit was supported
    • XVM (the Xen-on-Solaris effort inside of Sun) had much
    more managerial support and was thought to be a “more
    supportable” solution

  6. The decline of the lx brand
    After cresting in 2007, contributions to lx dwindled:
    [Chart: pushes to usr/src/lib/brand/lx over time, 2006-2010]

  7. Clinically dead
    The lx brand was removed on June 11, 2010...
    [Chart: pushes to usr/src/lib/brand/lx over time, 2006-2013]

  8. The organ donation years
    • Joyent customers asked for SmartOS to support htop, a
    colorful Linux program for system process monitoring
    • htop is very, very specific to Linux /proc — and porting it
    to use illumos /proc seemed arduous and pointless…
    • ...but a relatively complete Linux /proc had integrated
    with the LX brand!
    • In April 2012, the /proc portion of the LX brand was
    extracted, cleaned up, and separately integrated
    • Mounted at /system/lxproc in SmartOS zones; htop
    modified to look for this path on illumos

  9. Exhumed!
    • In January 2014, David Mackay, an illumos community
    member, announced that he was able to resurrect the lx
    brand, and that it appeared to work!
    Linked below is a webrev which restores LX branded zones
    support to Illumos:
    http://cr.illumos.org/~webrev/DavidJX8P/lx-zones-restoration/
    I have been running OpenIndiana, using it daily on my
    workstation for over a month with the above webrev applied to
    the illumos-gate and built by myself.
    It would definitely raise interest in Illumos. Indeed, I have
    seen many people who are extremely interested in LX zones.
    The LX zones code is minimally invasive on Illumos itself, and
    is mostly segregated out.
    I hope you find this of interest.

  10. Could it be revived?
    • David’s work inspired us to rethink LX-branded zones...
    • It seemed that the reasons for the discontinuation of LX
    brand support might not still be valid...
    • ...and it seemed that the engineering challenges might
    not be as structurally daunting

  11. Has Linux made it easier?
    • Linux is moving much more slowly: the pace of development
    of new user-visible kernel abstractions has slowed
    • Torvalds discovered religion on ABI compatibility
    • The need to run on older kernels has dissuaded
    software from using the more obscure Linux-isms
    • The glibc/kernel disconnect means that glibc (and apps!)
    must reasonably be able to process ENOSYS
    • Easier support model: the rise of the cloud has replaced
    shrink-wrapped software with open source + SaaS
    • Server focus: Mac OS X gave us Unix — and relegated
    “Linux on the desktop” to “Duke Nukem Forever” status
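
The ENOSYS point above is what makes an incomplete emulation tolerable: software that already copes with a missing system call will take its older code path instead of failing. A minimal sketch of that pattern follows, with the hypothetical helper `with_enosys_fallback` standing in for the kind of logic glibc and applications carry; it is not taken from glibc.

```python
import errno
import os


def with_enosys_fallback(preferred, fallback):
    """Wrap two callables so the fallback runs only on ENOSYS.

    Any other error from the preferred call is re-raised unchanged, since
    it signals a real failure rather than a missing system call.
    """
    def call(*args, **kwargs):
        try:
            return preferred(*args, **kwargs)
        except OSError as exc:
            if exc.errno != errno.ENOSYS:
                raise
            return fallback(*args, **kwargs)
    return call


def newer_call_not_implemented():
    # Stand-in for a system call the running kernel (or emulation) lacks.
    raise OSError(errno.ENOSYS, os.strerror(errno.ENOSYS))


def older_call():
    return "older code path"


portable_call = with_enosys_fallback(newer_call_not_implemented, older_call)
```

Here `portable_call()` ends up on the older path, which is the behaviour an emulation layer relies on when it has not yet implemented a newer interface.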

  12. Have motivations changed?
    • Originally, LX branded zones were about bringing Linux
    applications into established Solaris environments for
    purposes of hardware consolidation
    • Port of KVM to illumos circa 2011 solved this problem
    • ...but KVM has unresolvable performance and resource
    limitations, and Linux on KVM only gets indirect benefit
    from ZFS, DTrace and zones
    • At the same time, enthusiasm for containers and OS-
    based virtualization has blossomed (ht: Docker)
    • There seems to be desire for a best-of-all worlds system
    that combines Linux strengths (binary footprint) with
    illumos technical differentiators (ZFS, zones, DTrace)

  13. Reviving LX-branded zones
    • Encouraged that the body might not have decomposed,
    Joyent engineer Jerry Jelinek exhumed the LX brand
    and reintegrated it into SmartOS on March 20, 2014
    • Guiding principles:
    • Do it all in the open
    • Do it all on SmartOS master (illumos-joyent)
    • Add base illumos facilities wherever possible
    • Aim to upstream to illumos when we’re done
    • Thanks to Jerry grinding out many, many LX bug fixes,
    got Ubuntu 10.04 booting in April, Ubuntu 12.04 booting
    in May and Ubuntu 14.04 booting in July

  14. IT’S ALIVE!
    Contributions to the lx brand since March:
    [Chart: pushes to usr/src/lib/brand/lx over time, 2006-2014]

  15. So what have we done?
    • Fixed a ton of bugs (ht: LTP)
    • Added native epoll(5) — though not in terms of event
    ports but rather in terms of poll(7D)
    • Added exclusive IP stacks for LX-branded zones
    • Added support for netlink (RFC 3549) — but restricted
    that support to the lx brand
    • Added support for thunk-less native binaries within an
    LX branded zone
    • Added native inotify(5)
    • Added initial 64-bit support
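
To give a feel for what "epoll in terms of poll(7D)" means, the user-level sketch below layers an epoll-shaped interface (add, modify, delete, wait) over a poll-style interest set. It uses Python's `select.devpoll` where the platform provides it and falls back to `select.poll` elsewhere. The class `EpollOverPoll` is hypothetical, and the real work was done inside the illumos kernel, so treat this only as an analogy for the layering.

```python
import errno
import os
import select

# Operation codes as defined by Linux epoll_ctl(2).
EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD = 1, 2, 3


class EpollOverPoll:
    """Epoll-like interface backed by a /dev/poll or poll(2) interest set."""

    def __init__(self):
        # select.devpoll exists only on Solaris-derived systems.
        make_poller = getattr(select, "devpoll", select.poll)
        self._poller = make_poller()
        self._interest = {}   # fd -> event mask, used to validate operations

    def ctl(self, op, fd, events=0):
        if op == EPOLL_CTL_ADD:
            if fd in self._interest:
                raise OSError(errno.EEXIST, os.strerror(errno.EEXIST))
            self._poller.register(fd, events)
            self._interest[fd] = events
        elif op == EPOLL_CTL_MOD:
            if fd not in self._interest:
                raise OSError(errno.ENOENT, os.strerror(errno.ENOENT))
            self._poller.modify(fd, events)
            self._interest[fd] = events
        elif op == EPOLL_CTL_DEL:
            if fd not in self._interest:
                raise OSError(errno.ENOENT, os.strerror(errno.ENOENT))
            self._poller.unregister(fd)
            del self._interest[fd]
        else:
            raise ValueError("unknown epoll_ctl operation: %r" % (op,))

    def wait(self, timeout_ms=-1):
        """Return a list of (fd, event mask) pairs; -1 blocks indefinitely."""
        return self._poller.poll(timeout_ms)
```

A caller registers a descriptor with `ctl(EPOLL_CTL_ADD, fd, select.POLLIN)` and then calls `wait()` to collect ready descriptors.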

  16. What is left to do?
    • vsyscall support (needed for 64-bit)
    • Anything else for 64-bit
    • Stack switching (needed for Go)
    • Multi-threaded ptrace support
    • Lots of using it and figuring out what breaks!

  17. How can you get involved?
    • SmartOS contains latest-and-greatest bits; first step is to
    get SmartOS running
    • We have a 32-bit Ubuntu 14.04 image that can be used
    to create a zone via vmadm:
    b7493690-f019-4612-958b-bab5f844283e
    • Will need to configure a VM with “kernel-version” set to
    3.13.0 and “brand” to “lx” in the vmadm JSON payload
    • If you find that something is broken, create an issue on
    the illumos-joyent GitHub repo
    • Once 64-bit is working, we will be very actively seeking
    community engagement; stay tuned!
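
The payload described above might be generated along these lines. The brand, kernel version and image UUID come from the slide; the key names `image_uuid` and `kernel_version` are assumptions (the slide writes the latter as "kernel-version"), so check vmadm(1M) on your build. A real payload would also need settings such as networking and memory, which are omitted here.

```python
import json


def lx_zone_payload(image_uuid, kernel_version="3.13.0"):
    """Build a minimal, illustrative payload dictionary for an LX zone."""
    return {
        "brand": "lx",
        # Assumed key spelling; the slide writes "kernel-version".
        "kernel_version": kernel_version,
        # Assumed key name for the image to create the zone from.
        "image_uuid": image_uuid,
    }


if __name__ == "__main__":
    payload = lx_zone_payload("b7493690-f019-4612-958b-bab5f844283e")
    print(json.dumps(payload, indent=2))
```

The printed JSON could be saved to a file and passed to vmadm once the remaining required fields have been filled in.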

  18. Thanks!
    • The original BrandZ team at Sun for a remarkable
    amount of work: Nils Nieuwejaar and Russ Blaine
    • The illumos community — especially David Mackay! —
    for inspiring the revival
    • Jerry Jelinek for leading the charge — and doing the
    vast majority of the work!
    • @rmustacc for thunk-less native binary support
    • @jmclulow for stack switching
    • @djhoffma for his work on ptrace
    • @joshwilsdon for vmadm support for LX brands
