The functional innards of Docker for Mac and Windows

Most developers use a Mac or Windows host to develop Docker Linux containers. This normally requires installing a Linux virtual machine and a complicated setup with three parts: a local networked filesystem for sharing data between the host and the Linux container, which must handle UID mapping and case sensitivity and often delivers inotify events unreliably into the container; a local replica of the Linux networking configuration that reflects the structure of the deployed microservices; and a separate Linux virtual machine and hypervisor such as VirtualBox, which leads to heavyweight resource usage on a developer laptop.

I describe the architecture of Docker for Mac and Windows, which ships a lightweight hypervisor and user-level networking and filesystem functionality to greatly improve the developer experience with Docker on popular platforms.

Anil Madhavapeddy

June 16, 2016

Transcript

  1. The functional innards of Docker for Mac and Windows
    Anil Madhavapeddy, @avsm
    Docker Inc, @docker
    Jane Street London, Functional Meetup, June 2016

    with thanks to the Docker for Mac and Windows
    teams for extensive contributions.

  2. Transforming the Development Landscape
    ~2000: Monolithic, Big Iron, Change Slowly
    Today: Loosely Coupled, Many Small Servers/Devices, Rapidly Updated

  4. • All the Linux tools collected in one installer:
    • Bundle includes a full VirtualBox installation
    • Boot2Docker Virtual Machine
    • The Kitematic UI controlled these pieces.
    • A relatively loose collection of components:
    • Installation and lack of integrated updates caused numerous
    user issues.
    • Performance not ideal due to the layering, especially for file
    sharing.
    • Yet most Docker users use a Mac or Windows host as their
    development environment.
    Docker Toolbox

  5. • Easy drag and drop installation, and
    autoupdates to get latest Docker.
    • Secure, sandboxed virtualisation
    architecture without elevated privileges.
    • Native networking support, with VPN and
    network sharing compatibility.
    • File sharing between container and host:
    uid mapping, inotify events, etc.
    Docker for Mac
    Aiming for a native OSX experience
    that works with existing developer
    workflows.
    Sign up at
    beta.docker.com

  7. Virtualisation

  8. • Uses the new HyperKit framework, which is in turn
    based on xHyve and FreeBSD's bHyve.
    • Sandbox friendly: processes largely run as non-root,
    with privileges of the local user.
    Virtualisation

  9. Virtualisation (diagram)
    OSX Kernel: Hypervisor.framework
    Hardware virt: VMX, nested paging

  10. Virtualisation (diagram)
    OSX Kernel: Hypervisor.framework
    Userspace: User Process, Thread/vCPU
    Traps on I/O pages; manages ACPI, PCI devices
    Hardware virt: VMX, nested paging

  11. Virtualisation (diagram)
    OSX Kernel: Hypervisor.framework
    Hardware virt: VMX, nested paging
    Userspace: User Process
    Process: Linux Kernel, VirtIO IPC, VirtIO Block, VirtIO Net
    Alpine Linux Userspace: latest Docker preconfigured
    QCow2, VPNKit
    Logs redirected to OSX host

  12. • Embeds Linux: includes an embedded
    lightweight Alpine Linux distribution optimised for
    fast boot and stateless operation for containers.
    Virtualisation
    $ docker info
    Containers: 358
    Running: 13
    Paused: 0
    Stopped: 345
    Images: 485
    Server Version: 1.11.1
    Storage Driver: aufs
    Root Dir: /var/lib/docker/aufs
    Backing Filesystem: extfs
    Dirperm1 Supported: true
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins:
    Volume: local
    Network: bridge null host
    Kernel Version: 4.4.9-moby
    Operating System: Alpine Linux v3.3
    OSType: linux
    Architecture: x86_64
    CPUs: 2
    Total Memory: 3.858 GiB

  13. • Drag 'n drop installation: Docker.app is self-contained,
    installs symlinks from app bundle into /usr/local,
    and autoupdates.
    Virtualisation

  14. • Performance: The CPU performance of a Linux container is
    largely the same as when running the same compute on the
    Mac, since we use the hardware CPU virtualisation extensions.
    • Battery life: Some battery life hit due to running containers
    instead of MacOS X native processes, but not adverse for
    normal use.
    • Disk usage: The app manages disk usage via a qcow2 file in
    its data directory. This is a sparse file that is allocated on
    demand, up to a (current) maximum of 64GB of disk space.
    Can be excluded from Time Machine backups.
    Virtualisation
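
The on-demand allocation described in the disk usage bullet can be illustrated with an ordinary sparse file. This is only a sketch of the general idea, not of the qcow2 format or of how the app manages its image: the function names are hypothetical, and `st_blocks` being counted in 512-byte units is a POSIX convention that holds on macOS and Linux but not on Windows.

```python
import os
import tempfile


def make_sparse_file(path, apparent_size):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as f:
        # truncate() extends the file; filesystems that support sparse
        # files do not allocate blocks for the unwritten range.
        f.truncate(apparent_size)


def sizes(path):
    """Return (apparent size, bytes allocated on disk) for path."""
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units (POSIX convention).
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        image = os.path.join(d, "disk.img")  # hypothetical file name
        make_sparse_file(image, 64 * 1024 ** 2)
        apparent, allocated = sizes(image)
        print("apparent:", apparent, "allocated:", allocated)
```

On a filesystem with sparse-file support the allocated figure stays far below the apparent size until data is written, which is the behaviour the slide relies on for the 64GB ceiling.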

  15. Networking
    Notworking

  16. • Want to hide the gory details of virtualisation from
    the user. The Linux VM should be "invisible".
    • Not solving this leads to many user complaints:
    • VPN software and corporate installations do not
    like bridged virtual machines or custom routing.

    Result: container traffic cannot connect to Internet.
    • Services cannot be exposed on localhost or
    the external interface and are instead on the Linux
    VM IP address.

    Result: breaks common web OAuth workflows.
    Networking

  17. • Challenge: Deal with custom VPN software on the
    host that makes it difficult to bridge.
    • Solution: VPNKit efficiently reconstructs container
    traffic into separate TCP/IP flows and translates
    them into native OSX/Windows sockets.
    OSX Host Linux Host Container
    RUN <...>
    com.docker.hyperkit-net
    Reconstruct traffic
    TCP flows
    Translate to OSX
    socket calls
    Ethernet bridge
    DHCPv4
    NTP
    Networking
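
The "reconstruct traffic into TCP flows" step can be sketched as demultiplexing raw Ethernet frames by their TCP 4-tuple. This is a minimal illustration under stated assumptions, not VPNKit's implementation: it handles only untagged IPv4/TCP frames, ignores checksums and TCP state, and every function name here is hypothetical. `build_frame` exists only to produce sample input.

```python
import socket
import struct
from collections import defaultdict

ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6


def flow_key(frame):
    """Return (src_ip, src_port, dst_ip, dst_port) for an IPv4/TCP frame, else None."""
    if len(frame) < 14 + 20:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return None
    ip = frame[14:]
    header_len = (ip[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if ip[0] >> 4 != 4 or header_len < 20 or len(ip) < header_len + 4:
        return None
    if ip[9] != IPPROTO_TCP:
        return None
    src_ip = socket.inet_ntoa(ip[12:16])
    dst_ip = socket.inet_ntoa(ip[16:20])
    src_port, dst_port = struct.unpack("!HH", ip[header_len:header_len + 4])
    return (src_ip, src_port, dst_ip, dst_port)


def group_by_flow(frames):
    """Group frames by flow key, dropping anything that is not IPv4/TCP."""
    flows = defaultdict(list)
    for frame in frames:
        key = flow_key(frame)
        if key is not None:
            flows[key].append(frame)
    return dict(flows)


def build_frame(src_ip, src_port, dst_ip, dst_port, payload=b""):
    """Sample-input helper: a frame with zeroed MACs, checksums and sequence numbers."""
    eth = b"\x00" * 12 + struct.pack("!H", ETHERTYPE_IPV4)
    tcp = struct.pack("!HHIIBBHHH", src_port, dst_port, 0, 0, 5 << 4, 0, 0, 0, 0) + payload
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 0, 0, 64,
                     IPPROTO_TCP, 0, socket.inet_aton(src_ip), socket.inet_aton(dst_ip))
    return eth + ip + tcp
```

Once frames are grouped this way, each flow can be served by one ordinary host socket, which is why the resulting traffic looks like any other application's traffic to a VPN or firewall.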

  18. • Benefits:
    • All network traffic is generated from normal socket
    calls (e.g. gethostbyaddr) on the Mac, so
    interacts well with firewalls, VPNs, and any local
    security policies.
    Networking

  19. OSX Host Linux Host
    Privileged Port
    Service
    Container
    EXPOSE
    Port Service
    VSock Binder
    RUN <...>
    VSock Listener
    Userland Proxy
    • Challenge: Services publishing ports should be
    exposed on localhost without needing VM info.
    • Solution: VPNKit forwards container port requests
    to an OSX service which binds them natively on its
    external interface.
    Networking
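
The host-side half of this arrangement, a process that binds a port natively and relays each connection onward, can be sketched as a small userland proxy. This is an illustration under stated assumptions rather than the shipped code: a plain TCP connection stands in for the VSock channel into the VM, and all names are hypothetical.

```python
import socket
import threading


def _pipe(src, dst):
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def _handle(client, backend_addr):
    """Relay one client connection to the backend in both directions."""
    try:
        backend = socket.create_connection(backend_addr)
    except OSError:
        client.close()
        return
    upstream = threading.Thread(target=_pipe, args=(client, backend), daemon=True)
    upstream.start()
    _pipe(backend, client)
    upstream.join()
    client.close()
    backend.close()


def start_proxy(backend_addr, listen_host="127.0.0.1", listen_port=0):
    """Bind a host port and relay connections to backend_addr.

    Returns the (host, port) actually bound; port 0 lets the OS choose.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((listen_host, listen_port))
    listener.listen()

    def accept_loop():
        while True:
            try:
                client, _ = listener.accept()
            except OSError:
                return
            threading.Thread(target=_handle, args=(client, backend_addr),
                             daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener.getsockname()
```

Because the listening socket belongs to a normal host process, a client on the Mac connects to localhost and never needs the VM's address.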

  20. • Benefits:
    • docker run -P on the Mac now works without
    requiring any knowledge of the VM innards.
    • External OAuth workflows operate with web apps.
    Networking

  21. Filesystems

  22. • Challenge: Share arbitrary OSX directory tree into
    Linux container without requiring extensive
    modification of either side.
    • Solution: Use a FUSE forwarding layer and
    translate Linux filesystem calls to OSX equivalents.
    OSX Host Linux Host Container
    VOLUME
    com.docker.osxfs
    Track extra
    metadata
    Translate to OSX
    filesystem calls
    FUSE
    Filesystem Sharing

  23. • Challenge: Need filesystem activation so events on
    the Mac wake up container servers and vice-versa.
    • Solution: osxfs uses FSEvents API and injects
    inotify activation events into container.
    OSX Host Linux Host Container
    VOLUME
    com.docker.osxfs
    FSEvents watches
    open files
    Events from Linux
    cause OSX apps
    to wake up
    FUSE
    Filesystem Sharing
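
One way to picture the FSEvents-to-inotify translation is as a mapping between the two sets of flag bits. The constants below are the values defined in Apple's FSEvents.h and Linux's sys/inotify.h; the mapping itself, the function name, and the `path_exists` parameter are assumptions made for illustration and are not taken from osxfs.

```python
# FSEvents per-item flags (kFSEventStreamEventFlagItem*).
FSE_ITEM_CREATED = 0x00000100
FSE_ITEM_REMOVED = 0x00000200
FSE_ITEM_INODE_META_MOD = 0x00000400
FSE_ITEM_RENAMED = 0x00000800
FSE_ITEM_MODIFIED = 0x00001000

# inotify event mask bits.
IN_MODIFY = 0x002
IN_ATTRIB = 0x004
IN_MOVED_FROM = 0x040
IN_MOVED_TO = 0x080
IN_CREATE = 0x100
IN_DELETE = 0x200

# Hypothetical one-to-one mapping for flags with an obvious counterpart.
_DIRECT = (
    (FSE_ITEM_CREATED, IN_CREATE),
    (FSE_ITEM_REMOVED, IN_DELETE),
    (FSE_ITEM_MODIFIED, IN_MODIFY),
    (FSE_ITEM_INODE_META_MOD, IN_ATTRIB),
)


def fsevents_to_inotify(flags, path_exists=True):
    """Translate FSEvents item flags into an inotify mask (illustrative only)."""
    mask = 0
    for fse_bit, in_bit in _DIRECT:
        if flags & fse_bit:
            mask |= in_bit
    if flags & FSE_ITEM_RENAMED:
        # FSEvents reports a rename on both the old and the new path with the
        # same flag, so this sketch uses the path's existence to pick a side.
        mask |= IN_MOVED_TO if path_exists else IN_MOVED_FROM
    return mask
```

For example, `fsevents_to_inotify(FSE_ITEM_CREATED | FSE_ITEM_MODIFIED)` yields `IN_CREATE | IN_MODIFY`, which a watcher inside the container would see as a new file being written.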

  24. • New osxfs engine that bind mounts OSX filesystem trees into
    Docker containers.
    • Daemon that listens bidirectionally on shared volumes and
    translates between OSX and Linux. Includes notifications, via
    FSEvents on Mac and inotify on Linux.
    • Runs as user and so cannot access system files on OSX host.
    Planning to further restrict host access in future.
    • Mount points for /Users, /Volumes, /private and /tmp from
    the Mac exist.
    • All requesting processes are treated as owners and group
    members on all bind mounted resources. User/group changes
    are persisted but not discriminated on.
    Filesystem Sharing
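
The mount-point and ownership rules above can be modelled in a few lines. This is a sketch of the stated semantics only, with hypothetical names throughout: paths are compared lexically without resolving symlinks, and the default owner returned for a path that was never chowned is an assumption.

```python
import posixpath

# The four shared roots named on the slide.
SHARED_ROOTS = ("/Users", "/Volumes", "/private", "/tmp")


def is_shareable(host_path):
    """True if host_path is a shared root or lies beneath one (lexical check)."""
    norm = posixpath.normpath(host_path)
    return any(norm == root or norm.startswith(root + "/") for root in SHARED_ROOTS)


class OwnershipTable:
    """Records chown requests as extra metadata without enforcing them."""

    def __init__(self):
        self._owners = {}

    def chown(self, path, uid, gid):
        # The change is persisted so that later lookups report it.
        self._owners[posixpath.normpath(path)] = (uid, gid)

    def owner(self, path, default=(0, 0)):
        # Assumption: the default for an untouched path is chosen by the caller.
        return self._owners.get(posixpath.normpath(path), default)

    def may_access(self, path, uid, gid):
        # Every requester is treated as owner and group member, so the
        # recorded uid/gid is never used to deny access.
        return True
```

A container process running as any uid can therefore chown a shared file and read the new owner back, while access decisions stay independent of that recorded value.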

  25. $ docker run resin/armv7hf-debian uname -a
    Linux 7ed2fca7a3f0 4.1.12 #1 SMP Tue Jan 12 10:51:00
    UTC 2016 armv7l GNU/Linux
    $ docker run justincormack/ppc64le-debian uname -a
    Linux edd13885f316 4.1.12 #1 SMP Tue Jan 12 10:51:00
    UTC 2016 ppc64le GNU/Linux
    Multi-CPU architectures

  26. Questions?
    Sign up at
    beta.docker.com
    Twitter: @avsm
    https://github.com/docker/hyperkit
    https://github.com/docker/vpnkit
    https://github.com/docker/datakit

  27. Cambridge hackathons coming up:
    MirageOS in mid-July
    for 2 days
    OCaml Compiler
    Hacking in August
    Contact Gemma Gordon

    Beginners welcome!
