The dream is alive! Running Linux containers on an illumos kernel

Bryan Cantrill
September 27, 2014

Presentation for #illumos day at #surgecon, 2014. Video: https://www.youtube.com/watch?v=TrfD3pC0VSs


Transcript

  1. The dream is alive!
    Running Linux containers
    on an illumos kernel
    CTO
    Bryan Cantrill
    @bcantrill

  2. OS emulation: An old idea
    • Operating systems have long employed system call
    emulation to allow binaries from one operating system to
    run on another on the same instruction set architecture
    • Combines the binary footprint of the emulated system
    with the operational advantages of the emulating system
    • Sun first did this with SunOS 4.x binaries on Solaris 2.x
    • With Solaris x86, it became possible to run binaries
    targeted for Linux via SCO’s (open source) “lxrun”
    • Packaging innovation in Linux in the early 2000s + deeply
    differentiated technologies in Solaris 10 (e.g. ZFS,
    DTrace, zones) made Linux emulation more attractive
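
At its simplest, system call emulation is a table that maps the foreign system's call numbers to handlers built from native facilities. The Python below is a toy sketch of that idea, not the BrandZ design: the Linux i386 numbers for write (4) and getpid (20) and Linux's ENOSYS value (38) are real, but the dispatch table and the `emulate_syscall` helper are hypothetical. Note that the emulated ABI must report errors in Linux's errno numbering, which differs from the host's (illumos defines ENOSYS as 89).

```python
import os
from typing import Callable, Dict

# Real Linux i386 system call numbers for the two calls this toy handles.
LINUX_I386_WRITE = 4
LINUX_I386_GETPID = 20

# Linux's numeric value for ENOSYS. The host's errno.ENOSYS can differ
# (illumos uses 89), so results handed back to a Linux binary must use
# Linux's numbering, not the host's.
LINUX_ENOSYS = 38

# Hypothetical dispatch table: Linux syscall number -> native handler.
SYSCALL_TABLE: Dict[int, Callable[..., int]] = {
    LINUX_I386_WRITE: lambda fd, data: os.write(fd, data),
    LINUX_I386_GETPID: lambda: os.getpid(),
}


def emulate_syscall(number: int, *args) -> int:
    """Dispatch a Linux syscall number to a native handler.

    Follows the Linux kernel convention of returning a negative errno
    on failure. Translating host errors raised by the handlers into
    Linux errno values is omitted from this sketch.
    """
    handler = SYSCALL_TABLE.get(number)
    if handler is None:
        return -LINUX_ENOSYS
    return handler(*args)


if __name__ == "__main__":
    print(emulate_syscall(LINUX_I386_GETPID))  # this process's pid
    print(emulate_syscall(9999))  # -38: ENOSYS in Linux numbering
```

A real emulation layer also has to translate argument structures, signal numbers and error codes in both directions; the table is only the entry point.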

  3. Rise of zones
    • While emulation became more important, the problem also
    became harder: applications were no longer simple
    single-process binaries
    • Clear that “lxrun” would only work for applications, not
    systems — needed a deeper solution
    • Fortunately, coincided with the rise of operating system
    virtualization embodied by zones
    • Idea: introduce notion of a branded zone whereby an
    entire foreign system (a brand) could be emulated within
    the confines of a zone

  4. BrandZ: LX-branded zones
    • In 2006, a team at Sun that included Nils Nieuwejaar and
    Russ Blaine integrated BrandZ, a Linux branded zone
    (PSARC 2005/471)
    • Support was a user/kernel hybrid: lx system calls
    bounced back to a user-level emulation library that
    depended on some in-kernel emulation (e.g. futexes)
    • Support was for RHEL 3 (!): glibc 2.3.2 + Linux 2.4
    • Remarkable amount of work was done to handle device
    pathing, signal handling, /proc — and arcana like TTY
    ioctls, ptrace, etc.
    • Worked for a surprising number of binaries!

  5. What was missing?
    • Support was only for 2.4 kernels
    • Support for 2.6 required adding new, Linux-only
    mechanisms that had native analogues (e.g., epoll)
    • Only 32-bit was supported
    • XVM (the Xen-on-Solaris effort inside of Sun) had much
    more managerial support and was thought to be a “more
    supportable” solution

  6. The decline of the lx brand
    After cresting in 2007, contributions to lx dwindled:
    [Chart: pushes to usr/src/lib/brand/lx per year, 2006 to 2010]

  7. Clinically dead
    The lx brand was removed on June 11, 2010...
    [Chart: pushes to usr/src/lib/brand/lx per year, 2006 to 2013]

  8. The organ donation years
    • Joyent customers asked for SmartOS to support htop, a
    colorful Linux program for system process monitoring
    • htop is very, very specific to Linux /proc — and porting it
    to use illumos /proc seemed arduous and pointless…
    • ...but a relatively complete Linux /proc had been integrated
    with the LX brand!
    • In April 2012, the /proc portion of the LX brand was
    extracted, cleaned up, and separately integrated
    • Mounted at /system/lxproc in SmartOS zones; htop
    modified to look for this path on illumos
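
The htop change boils down to preferring the lxproc mount point when it exists and falling back to /proc otherwise. A minimal Python sketch of that lookup, assuming only that /system/lxproc is the SmartOS mount point given above; `linux_proc_root` is a hypothetical helper and not htop's actual (C) code.

```python
import os
from typing import Callable

# Most specific first: on SmartOS the Linux-style /proc lives at
# /system/lxproc, while the native /proc has illumos semantics.
# On Linux, /system/lxproc does not exist and /proc is the right choice.
LINUX_PROC_CANDIDATES = ("/system/lxproc", "/proc")


def linux_proc_root(exists: Callable[[str], bool] = os.path.isdir) -> str:
    """Return the first candidate directory expected to hold a Linux /proc."""
    for path in LINUX_PROC_CANDIDATES:
        if exists(path):
            return path
    raise FileNotFoundError("no Linux-style /proc mount found")


if __name__ == "__main__":
    root = linux_proc_root()
    # Per-process data would then be read relative to root,
    # e.g. os.path.join(root, str(os.getpid()), "stat").
    print(root)
```

The `exists` parameter is there only so the lookup can be tested without either mount present. On an illumos system without lxproc mounted, this sketch would fall through to the native /proc, so a real port would need to detect the platform as well.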

  9. Exhumed!
    • In January 2014, David Mackay, an illumos community
    member, announced that he was able to resurrect the lx
    brand, and that it appeared to work!
    Linked below is a webrev which restores LX branded zones
    support to Illumos:
    http://cr.illumos.org/~webrev/DavidJX8P/lx-zones-restoration/
    I have been running OpenIndiana, using it daily on my
    workstation for over a month with the above webrev applied to
    the illumos-gate and built by myself.
    It would definitely raise interest in Illumos. Indeed, I have
    seen many people who are extremely interested in LX zones.
    The LX zones code is minimally invasive on Illumos itself, and
    is mostly segregated out.
    I hope you find this of interest.

  10. Could it be revived?
    • David’s work inspired us to rethink LX-branded zones...
    • It seemed that the reasons for the discontinuation of LX
    brand support might not still be valid...
    • ...and it seemed that the engineering challenges might
    not be as structurally daunting

  11. Has Linux made it easier?
    • Linux is moving much more slowly: the pace of development
    of new user-visible kernel abstractions has slowed
    • Torvalds discovered religion on ABI compatibility
    • The need to run on older kernels has dissuaded
    software from using the more obscure Linux-isms
    • The glibc/kernel disconnect means that glibc (and apps!)
    must reasonably be able to process ENOSYS
    • Easier support model: the rise of the cloud has replaced
    shrink-wrapped software with open source + SaaS
    • Server focus: Mac OS X gave us Unix — and relegated
    “Linux on the desktop” to “Duke Nukem Forever” status
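
Because glibc can be newer than the kernel beneath it (and, in an LX zone, newer than the emulation), well-behaved callers try a newer interface and fall back when the kernel answers ENOSYS. The Python below sketches that pattern; `with_enosys_fallback` and `make_cloexec_pipe` are hypothetical helpers, and the pipe2 example assumes a platform where Python exposes `os.pipe2`.

```python
import errno
import os
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def with_enosys_fallback(preferred: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Try a newer interface; use the older one only if the kernel lacks it.

    Any error other than ENOSYS is a genuine failure and is re-raised.
    """
    try:
        return preferred()
    except OSError as e:
        if e.errno != errno.ENOSYS:
            raise
        return fallback()


def make_cloexec_pipe() -> Tuple[int, int]:
    """Create a pipe, preferring pipe2(O_CLOEXEC) when the kernel has it."""
    if not hasattr(os, "pipe2"):
        # No pipe2 binding at all; os.pipe() descriptors are
        # non-inheritable by default in Python 3.4+ (PEP 446).
        return os.pipe()
    return with_enosys_fallback(lambda: os.pipe2(os.O_CLOEXEC), os.pipe)
```

The key design choice is that only ENOSYS triggers the fallback: an emulation that returns some other error for an unimplemented call would cause this code to fail outright instead of degrading gracefully.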

  12. Have motivations changed?
    • Originally, LX branded zones were about bringing Linux
    applications into established Solaris environments for
    purposes of hardware consolidation
    • Port of KVM to illumos circa 2011 solved this problem
    • ...but KVM has unresolvable performance and resource
    limitations, and Linux on KVM only gets indirect benefit
    from ZFS, DTrace and zones
    • At the same time, enthusiasm for containers and OS-based
    virtualization has blossomed (ht: Docker)
    • There seems to be a desire for a best-of-all-worlds system
    that combines Linux strengths (binary footprint) with
    illumos technical differentiators (ZFS, zones, DTrace)

  13. Reviving LX-branded zones
    • Encouraged that the body might not have decomposed,
    Joyent engineer Jerry Jelinek exhumed the LX brand
    and reintegrated it into SmartOS on March 20, 2014
    • Guiding principles:
    • Do it all in the open
    • Do it all on SmartOS master (illumos-joyent)
    • Add base illumos facilities wherever possible
    • Aim to upstream to illumos when we’re done
    • Thanks to Jerry grinding out many, many LX bug fixes, we
    got Ubuntu 10.04 booting in April, Ubuntu 12.04 booting
    in May and Ubuntu 14.04 booting in July

  14. IT’S ALIVE!
    Contributions to the lx brand since March:
    [Chart: pushes to usr/src/lib/brand/lx per year, 2006 to 2014]

  15. So what have we done?
    • Fixed a ton of bugs (ht: LTP)
    • Added native epoll(5) — though not in terms of event
    ports but rather in terms of poll(7D)
    • Added exclusive IP stacks for LX-branded zones
    • Added support for netlink (RFC 3549) — but restricted
    that support to the lx brand
    • Added support for thunk-less native binaries within an
    LX branded zone
    • Added native inotify(5)
    • Added initial 64-bit support
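
The epoll bullet describes building an epoll-style interface on a poll-set mechanism (/dev/poll, i.e. poll(7D)) rather than on event ports. The Python below illustrates the shape of that layering at user level only: an epoll-like object whose backend is `select.devpoll` (Python's /dev/poll binding on Solaris-derived systems) when available, otherwise `select.poll`. `EmulatedEpoll` is a hypothetical, level-triggered toy, not the lx brand's in-kernel implementation; edge-triggered and one-shot semantics are not modeled.

```python
import os
import select


class EmulatedEpoll:
    """Toy epoll-like object layered on a poll-set backend.

    Uses select.devpoll (the /dev/poll driver) where Python provides it,
    otherwise select.poll. Level-triggered only: edge-triggered and
    one-shot behavior are not modeled.
    """

    def __init__(self) -> None:
        if hasattr(select, "devpoll"):
            self._backend = select.devpoll()
        else:
            self._backend = select.poll()

    def register(self, fd: int, eventmask: int = select.POLLIN) -> None:
        self._backend.register(fd, eventmask)

    def modify(self, fd: int, eventmask: int) -> None:
        self._backend.modify(fd, eventmask)

    def unregister(self, fd: int) -> None:
        self._backend.unregister(fd)

    def poll(self, timeout: float = -1) -> list:
        """Return (fd, events) pairs for ready descriptors.

        Like epoll.poll, timeout is in seconds and negative means block;
        the backends expect milliseconds, with None meaning block.
        """
        ms = None if timeout < 0 else int(timeout * 1000)
        return self._backend.poll(ms)


if __name__ == "__main__":
    r, w = os.pipe()
    ep = EmulatedEpoll()
    ep.register(r, select.POLLIN)
    os.write(w, b"x")
    print(ep.poll(0))  # the pipe's read end is reported as readable
```

Mapping epoll onto a poll set works naturally because both keep a persistent registration of descriptors in the kernel, unlike poll(2), which passes the full descriptor list on every call.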

  16. What is left to do?
    • vsyscall support (needed for 64-bit)
    • Anything else for 64-bit
    • Stack switching (needed for Go)
    • Multi-threaded ptrace support
    • Lots of using it and figuring out what breaks!

  17. How can you get involved?
    • SmartOS contains latest-and-greatest bits; first step is to
    get SmartOS running
    • We have a 32-bit Ubuntu 14.04 image that can be used
    to create a zone via vmadm:
    b7493690-f019-4612-958b-bab5f844283e
    • Will need to configure a VM with “kernel-version” set to
    3.13.0 and “brand” to “lx” in the vmadm JSON payload
    • If you find that something is broken, create an issue on
    the illumos-joyent GitHub repo
    • Once 64-bit is working, we will be very actively seeking
    community engagement; stay tuned!
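
The Python below sketches the vmadm payload described above, built around the Ubuntu 14.04 image UUID from the slide. Assumptions: the keys follow vmadm's documented underscore spelling (`brand`, `image_uuid`, `alias`, and `kernel_version`, which the slide writes as “kernel-version”). The alias is a made-up example, and the networking, memory and other settings a real zone needs are omitted.

```python
import json

# 32-bit Ubuntu 14.04 image UUID, as given on the slide.
UBUNTU_1404_IMAGE_UUID = "b7493690-f019-4612-958b-bab5f844283e"


def lx_zone_payload(alias: str,
                    image_uuid: str = UBUNTU_1404_IMAGE_UUID,
                    kernel_version: str = "3.13.0") -> dict:
    """Return a minimal vmadm create payload for an LX-branded zone.

    Assumption: keys use vmadm's underscore spelling; a real payload
    would also need networking and resource settings.
    """
    return {
        "brand": "lx",                     # run the zone under the lx brand
        "kernel_version": kernel_version,  # Linux kernel version to emulate
        "image_uuid": image_uuid,
        "alias": alias,
    }


if __name__ == "__main__":
    # Redirect this output to a file and pass it to `vmadm create -f <file>`.
    print(json.dumps(lx_zone_payload("lx-ubuntu-test"), indent=2))
```

For example, run the script with its output redirected to lx.json, then run `vmadm create -f lx.json` in the SmartOS global zone.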

  18. Thanks!
    • The original BrandZ team at Sun for a remarkable
    amount of work: Nils Nieuwejaar and Russ Blaine
    • The illumos community — especially David Mackay! —
    for inspiring the revival
    • Jerry Jelinek for leading the charge — and doing the
    vast majority of the work!
    • @rmustacc for thunk-less native binary support
    • @jmclulow for stack switching
    • @djhoffma for his work on ptrace
    • @joshwilsdon for vmadm support for LX brands