Deep Dive into firecracker-containerd (re:Invent 2019, CON408)

Samuel Karp
December 02, 2019

Last year, we released the Firecracker virtual machine monitor (VMM), built on top of the Linux KVM subsystem and optimized for lightweight, container-like “microVMs.” In this session, we dive deep into the architecture of the firecracker-containerd project, which aims to make standard OCI container images and the wider container ecosystem usable with Firecracker microVMs. Topics covered include the standard containerd architecture with the reference OCI runtime (runc), the challenges of adapting containers to microVMs, and the firecracker-containerd suite.

Transcript

  2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Deep dive into firecracker-containerd
    Samuel Karp
    CON408-R
    Senior Software Development Engineer
    Amazon Web Services

  3. Related breakouts
    CON409 Security and monitoring in a serverless world on AWS Fargate
    CON423 AWS Fargate under the hood

  4. Agenda
    Linux containers, virtual machines, and isolation
    What is a container runtime? What is containerd?
    The Firecracker virtual machine monitor (VMM)
    Adapting containerd to Firecracker
    Current status and roadmap
    Q&A

  5. A really brief overview of containers
    • A mechanism for running software
    • With some isolation
    • With some repeatability
    • With a standard format for distribution
    • With common tooling

  6. Linux container primitives
    • Namespaces – Visibility restrictions
    • Control groups (cgroups) – Resource limits
    • Capabilities – Permission restrictions
    • Seccomp – Syscall allow/deny lists
    • Linux Security Modules – Resource access control
    • Union filesystems – Image layers

  7. What don’t containers give you?
    • Independent kernel behavior (kernel tuning)
    • Security isolation from other containers

  8. Containers and VMs
    Containers
    • Use Linux primitives to separate
    processes
    • Share a Linux kernel
    • Fast starts, minimal overhead
    • Flexible configuration
    Virtual Machines
    • Virtualize or emulate hardware
    components
    • Completely separate kernels
    (maybe not Linux)
    • Slower starts: must boot kernel
    and set up hardware
    [Diagram: containers run on a single Linux kernel (namespaces, cgroups, ...) directly on the hardware; each VM guest runs on its own virtual hardware, provided through KVM in the host Linux kernel.]

  9. Why use VMs?
    • Independent Linux kernel in each VM
    • Virtual machine monitor (VMM) is an additional isolation boundary
      • Interface between VM and VMM (hypercalls) is defined by the hypervisor
      • Hardware interfaces are standardized
    • Good for defining trust and resource boundaries
      • Isolating multi-tenant workloads
      • Isolating untrusted workloads

  10. What do we mean by isolation?
    • Prevent customers from affecting each other
    • Prevent customers from affecting the infrastructure
    • Defense in depth
      • Container security
        • seccomp
        • Linux security modules
        • Capabilities
      • Hypervisor
        • Emulation, virtualization, or pass-through

  11. Common container tooling
    • Docker UX (docker build, docker run)
    • Images and registries for software distribution (docker push, docker pull)
    • Container orchestrators
      • Amazon Elastic Container Service (Amazon ECS)
      • Kubernetes
      • Mesos
    • Open Container Initiative (OCI)
      • Image standard
      • Runtime standard
      • Distribution standard

  13. How you run containers
    Part of stack          Example components
    Cluster orchestrator   Amazon ECS, Kubernetes, Mesos
    Local management       Docker or containerd
    Container runtime      runc or Firecracker

  15. Container runtimes
    • Mechanism for starting and managing container workloads
    • (Linux containers) Set up cgroups, namespaces, filesystems, capabilities, seccomp, etc.
    • OCI runtime specification
      • Command-line interface for setting up a container
      • On-disk “bundle”
        • Root filesystem
        • JSON file describing configuration
    • runc
      • Reference implementation
      • Split out from Docker
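A minimal sketch in Go of the on-disk bundle described above: a rootfs directory plus a config.json. The structs are a hypothetical, hand-written subset of the runtime specification's fields (ociVersion, process.args, process.cwd, root.path, root.readonly); a usable config needs much more (mounts, namespaces, capabilities, user), so in practice you would start from `runc spec` or the Go types published alongside the specification.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Hypothetical subset of the OCI runtime config; JSON keys follow config.json.
type Process struct {
	Args []string `json:"args"`
	Cwd  string   `json:"cwd"`
}

type Root struct {
	Path     string `json:"path"`
	Readonly bool   `json:"readonly"`
}

type Spec struct {
	OCIVersion string  `json:"ociVersion"`
	Process    Process `json:"process"`
	Root       Root    `json:"root"`
}

// newSpec describes a container that runs args with "rootfs" (relative to the
// bundle directory) as its read-only root filesystem.
func newSpec(args []string) Spec {
	return Spec{
		OCIVersion: "1.0.1",
		Process:    Process{Args: args, Cwd: "/"},
		Root:       Root{Path: "rootfs", Readonly: true},
	}
}

// writeBundle lays out dir/rootfs and dir/config.json. It leaves rootfs empty;
// unpacking an image into it is the job of other tooling.
func writeBundle(dir string, spec Spec) error {
	if err := os.MkdirAll(filepath.Join(dir, "rootfs"), 0o755); err != nil {
		return err
	}
	data, err := json.MarshalIndent(spec, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, "config.json"), data, 0o644)
}

func main() {
	dir, err := os.MkdirTemp("", "bundle")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	if err := writeBundle(dir, newSpec([]string{"/bin/sh", "-c", "echo hello"})); err != nil {
		panic(err)
	}
	data, err := os.ReadFile(filepath.Join(dir, "config.json"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```

With a populated rootfs and a complete config, a runtime such as runc starts the container from the bundle directory, for example `runc run --bundle <dir> <container-id>`.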

  17. containerd
    • Daemon for managing containers
    • Modular framework for container
    lifecycle workflows
    • Integrates with OCI runtimes and
    containerd v2 runtimes

  18. The containerd stack
    • gRPC API and services
    • Storage services
    • Content store
    • Snapshotters
    • Runtime (OCI/runc, v2)
    [Diagram: the containerd stack, showing the gRPC API and metrics; storage (content, snapshot, diff); metadata (images, containers, tasks, events); and runtimes.]

  23. Firecracker virtual machine monitor (VMM)
    • KVM-based VMM in Rust
    • Open source
    • Targeted at serverless workloads
    • Not a general-purpose VMM

  24. Firecracker design goals
    Security
    • Very limited device model
    • Very limited feature set
    • Eliminate guest interactions
    with host kernel
    • Sandbox/jail the VMM
    • Memory-safe programming
    language
    • Single VM per Firecracker
    process
    Efficiency
    • Fast boot time
    • Low memory and CPU overhead
    • API driven
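Because Firecracker is API driven, a microVM is configured and started through HTTP requests on a Unix domain socket. A sketch in Go of that sequence, assuming the endpoint and field names from Firecracker's getting-started guide (`/boot-source`, `/drives/{id}`, `/machine-config`, `/actions`); required fields have changed between Firecracker releases, so check the API specification for the version you run. The kernel, rootfs, and socket paths are placeholders.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
)

// apiCall is one PUT request to the Firecracker API.
type apiCall struct {
	Path string
	Body map[string]any
}

// bootSequence returns, in order, the calls that configure and start a microVM.
func bootSequence(kernel, rootfs string, vcpus, memMiB int) []apiCall {
	return []apiCall{
		{"/boot-source", map[string]any{
			"kernel_image_path": kernel,
			"boot_args":         "console=ttyS0 reboot=k panic=1 pci=off",
		}},
		{"/drives/rootfs", map[string]any{
			"drive_id":       "rootfs",
			"path_on_host":   rootfs,
			"is_root_device": true,
			"is_read_only":   false,
		}},
		{"/machine-config", map[string]any{"vcpu_count": vcpus, "mem_size_mib": memMiB}},
		{"/actions", map[string]any{"action_type": "InstanceStart"}},
	}
}

// unixClient returns an HTTP client whose connections all go to one Unix socket.
func unixClient(socket string) *http.Client {
	return &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", socket)
		},
	}}
}

// send issues each call as a PUT with a JSON body and stops at the first failure.
func send(client *http.Client, calls []apiCall) error {
	for _, c := range calls {
		body, err := json.Marshal(c.Body)
		if err != nil {
			return err
		}
		// The transport ignores the URL's host and always dials the socket.
		req, err := http.NewRequest(http.MethodPut, "http://localhost"+c.Path, bytes.NewReader(body))
		if err != nil {
			return err
		}
		req.Header.Set("Content-Type", "application/json")
		resp, err := client.Do(req)
		if err != nil {
			return err
		}
		resp.Body.Close()
		if resp.StatusCode < 200 || resp.StatusCode > 299 {
			return fmt.Errorf("PUT %s: %s", c.Path, resp.Status)
		}
	}
	return nil
}

func main() {
	calls := bootSequence("/path/to/vmlinux", "/path/to/rootfs.ext4", 1, 128)
	// Expects Firecracker to be listening, e.g. started with
	// `firecracker --api-sock /tmp/firecracker.socket`.
	if err := send(unixClient("/tmp/firecracker.socket"), calls); err != nil {
		fmt.Println("error:", err)
	}
}
```

The ordering matters: everything is configured first, and only the final `InstanceStart` action boots the guest.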

  25. firecracker-containerd goals
    Containers
    • Compatible images
    • Familiar tooling
    • Support existing workflows
    • Allow composition of containers
    • Integrate with orchestrators
    • Minimal additional overhead
    Security
    • Hypervisor-based isolation
    • Limited access to the host

  26. Adapting containers to Firecracker
    Firecracker VMM considerations
    • No filesystem sharing
    • No dynamic device attachments
    • Limited networking options (tap, not veth)
    • Cross-boundary communication with vsock
    Adapting to containerd
    • Block-device snapshotter
    • API to manage VMM lifecycle
    • Split the “shim” into two parts
      • “Runtime” on the host
      • “Agent” inside the VM
        • …that runs containers via runc
    • Network
      • tap device
      • Usable with Container Network Interface (CNI) plugins

  27. firecracker-containerd architecture
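    [Diagram: firecracker-containerd architecture. Components shown: containerd, the FC control plugin, the content store, the block-device snapshotter, disk, the FC runtime, the Firecracker VMM, and a microVM containing a kernel, the FC agent, runc, and a container with its root filesystem.]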

  29. What is a “block device” snapshotter?
    • Store the container image layers
    • Manage writable space for each container
    • Compose container image layers and writable space (snapshot)
    • Treat each snapshot as a device to attach to the VM
    • Inside the VM, mount the device to expose its filesystem
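A toy, in-memory model of the snapshot bookkeeping described above, in Go. The `Prepare`/`Commit` names mirror containerd's snapshotter terminology, but the types, signatures, and `/dev/hypothetical-blkN` device names are illustrative assumptions: this is not containerd's `Snapshotter` interface and it stores no data. It shows committed read-only layers, a writable snapshot per container composed on top of them, and one block device per writable snapshot.

```go
package main

import (
	"errors"
	"fmt"
)

// Snapshot is either a committed, read-only layer or an active, writable one.
type Snapshot struct {
	Key    string
	Parent string // empty for the bottom layer
	Active bool
	Device string // block device that would back an active snapshot (hypothetical)
}

// BlockSnapshotter tracks snapshots by key; it does not store any data.
type BlockSnapshotter struct {
	snaps   map[string]*Snapshot
	nextDev int
}

func NewBlockSnapshotter() *BlockSnapshotter {
	return &BlockSnapshotter{snaps: map[string]*Snapshot{}}
}

// Prepare creates a writable snapshot on top of a committed parent and gives
// it its own device, to be attached to the VM.
func (s *BlockSnapshotter) Prepare(key, parent string) (*Snapshot, error) {
	if _, exists := s.snaps[key]; exists {
		return nil, fmt.Errorf("snapshot %q already exists", key)
	}
	if parent != "" {
		if p, ok := s.snaps[parent]; !ok || p.Active {
			return nil, fmt.Errorf("parent %q is not a committed snapshot", parent)
		}
	}
	snap := &Snapshot{
		Key:    key,
		Parent: parent,
		Active: true,
		Device: fmt.Sprintf("/dev/hypothetical-blk%d", s.nextDev),
	}
	s.nextDev++
	s.snaps[key] = snap
	return snap, nil
}

// Commit freezes an active snapshot into a read-only layer called name.
func (s *BlockSnapshotter) Commit(name, key string) error {
	snap, ok := s.snaps[key]
	if !ok || !snap.Active {
		return errors.New("no active snapshot " + key)
	}
	if _, exists := s.snaps[name]; exists {
		return fmt.Errorf("snapshot %q already exists", name)
	}
	delete(s.snaps, key)
	snap.Key, snap.Active, snap.Device = name, false, ""
	s.snaps[name] = snap
	return nil
}

// Layers lists the snapshots composed into key's view, bottom layer first.
func (s *BlockSnapshotter) Layers(key string) []string {
	var chain []string
	for k := key; k != ""; {
		snap, ok := s.snaps[k]
		if !ok {
			break
		}
		chain = append([]string{k}, chain...)
		k = snap.Parent
	}
	return chain
}

func main() {
	s := NewBlockSnapshotter()
	// Unpack two image layers: prepare, (extract content), commit.
	if _, err := s.Prepare("extract-1", ""); err != nil {
		panic(err)
	}
	if err := s.Commit("layer-1", "extract-1"); err != nil {
		panic(err)
	}
	if _, err := s.Prepare("extract-2", "layer-1"); err != nil {
		panic(err)
	}
	if err := s.Commit("layer-2", "extract-2"); err != nil {
		panic(err)
	}
	// The container's writable snapshot composes the image layers.
	c1, err := s.Prepare("c1-rootfs", "layer-2")
	if err != nil {
		panic(err)
	}
	fmt.Println(c1.Device, s.Layers("c1-rootfs"))
	// prints: /dev/hypothetical-blk2 [layer-1 layer-2 c1-rootfs]
}
```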

  31. What does the “firecracker-control” plugin do?
    • First-class VM construct and API
    for Firecracker
    • Specify VM-related parameters
    like the kernel and VM root
    filesystem
    • Allocate and manage VM
    resources: block devices,
    network interfaces, etc.
    • Manage the VM lifecycle
    • Compiled-in plugin
    • gRPC API over the same socket as containerd
    • Specific to Firecracker for now
    • Looking for a better generic
    solution
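A sketch in Go of the kind of bookkeeping a VM-management API takes on: VM parameters, lifecycle state, and per-VM resources. Everything here (`VMManager`, `VMConfig`, slot counts, `tapN` names) is hypothetical and is not firecracker-containerd's actual API. One assumption worth calling out: since Firecracker does not support dynamic device attachment, the sketch reserves a fixed number of placeholder drive slots when the VM is created and later assigns container snapshots to free slots.

```go
package main

import (
	"errors"
	"fmt"
)

// VMConfig holds VM-level parameters such as the kernel and the VM's own root filesystem.
type VMConfig struct {
	KernelImagePath string
	RootDrivePath   string
	VCPUs           int
	MemMiB          int
}

type VM struct {
	ID      string
	Config  VMConfig
	Running bool
	TapDev  string
	Slots   []string // container snapshot per drive slot; "" means free
}

// VMManager owns VM lifecycle and per-VM resources (drive slots, tap devices).
type VMManager struct {
	vms     map[string]*VM
	nextTap int
}

func NewVMManager() *VMManager {
	return &VMManager{vms: map[string]*VM{}}
}

// CreateVM validates cfg and reserves driveSlots placeholder drives up front,
// since devices cannot be attached after the VM boots. No VMM is started here.
func (m *VMManager) CreateVM(id string, cfg VMConfig, driveSlots int) (*VM, error) {
	if _, exists := m.vms[id]; exists {
		return nil, fmt.Errorf("VM %q already exists", id)
	}
	if cfg.KernelImagePath == "" || cfg.RootDrivePath == "" || driveSlots < 0 {
		return nil, errors.New("kernel image, root drive, and a non-negative slot count are required")
	}
	vm := &VM{
		ID:      id,
		Config:  cfg,
		Running: true, // assumes the boot succeeded
		TapDev:  fmt.Sprintf("tap%d", m.nextTap),
		Slots:   make([]string, driveSlots),
	}
	m.nextTap++
	m.vms[id] = vm
	return vm, nil
}

// AssignDrive places a container snapshot in the first free slot of a running VM.
func (m *VMManager) AssignDrive(id, snapshot string) (int, error) {
	vm, ok := m.vms[id]
	if !ok || !vm.Running {
		return -1, fmt.Errorf("VM %q is not running", id)
	}
	for i, s := range vm.Slots {
		if s == "" {
			vm.Slots[i] = snapshot
			return i, nil
		}
	}
	return -1, fmt.Errorf("VM %q has no free drive slots", id)
}

// StopVM marks the VM stopped and releases its drive slots.
func (m *VMManager) StopVM(id string) error {
	vm, ok := m.vms[id]
	if !ok || !vm.Running {
		return fmt.Errorf("VM %q is not running", id)
	}
	vm.Running = false
	for i := range vm.Slots {
		vm.Slots[i] = ""
	}
	return nil
}

func main() {
	m := NewVMManager()
	cfg := VMConfig{KernelImagePath: "/path/to/vmlinux", RootDrivePath: "/path/to/root.img", VCPUs: 1, MemMiB: 256}
	if _, err := m.CreateVM("vm1", cfg, 2); err != nil {
		panic(err)
	}
	for _, snap := range []string{"snap1", "snap2", "snap3"} {
		slot, err := m.AssignDrive("vm1", snap)
		fmt.Println(snap, slot, err) // the third assignment fails: only two slots
	}
	if err := m.StopVM("vm1"); err != nil {
		panic(err)
	}
}
```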

  33. What does the “runtime” do?
    • Proxies container management
    commands to agent inside VM
    • Proxies I/O streams
    • Proxies events and metrics from
    inside the VM back out to
    containerd
    • Built on containerd’s V2 API
    • ttrpc API for communication
    • Many containers per VM, one
    runtime
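A sketch in Go of the runtime's proxy role: one host-side runtime per VM forwards container commands for many containers to the agent and relays the resulting events toward containerd. The `Agent` interface, `Runtime` type, and in-process fake agent are hypothetical; the real communication is a ttrpc API over vsock, which is not modeled here. The event topics are containerd's task event topic names.

```go
package main

import (
	"fmt"
	"sync"
)

// Agent is the hypothetical subset of calls the runtime forwards into the VM.
type Agent interface {
	Create(containerID, bundle string) error
	Start(containerID string) error
}

// Event is relayed from inside the VM back out to containerd.
type Event struct {
	ContainerID string
	Topic       string
}

// Runtime serves every container in one VM ("many containers per VM, one runtime").
type Runtime struct {
	mu         sync.Mutex
	vmID       string
	agent      Agent
	containers map[string]bool
	Events     chan Event // buffered; sends block once the buffer is full
}

func NewRuntime(vmID string, agent Agent) *Runtime {
	return &Runtime{vmID: vmID, agent: agent, containers: map[string]bool{}, Events: make(chan Event, 64)}
}

// Run forwards create and start for one container and publishes an event after each step.
func (r *Runtime) Run(id, bundle string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.containers[id] {
		return fmt.Errorf("container %q already exists in VM %s", id, r.vmID)
	}
	if err := r.agent.Create(id, bundle); err != nil {
		return fmt.Errorf("create %s: %w", id, err)
	}
	r.containers[id] = true
	r.Events <- Event{ContainerID: id, Topic: "/tasks/create"}
	if err := r.agent.Start(id); err != nil {
		return fmt.Errorf("start %s: %w", id, err)
	}
	r.Events <- Event{ContainerID: id, Topic: "/tasks/start"}
	return nil
}

// fakeAgent records calls instead of running containers.
type fakeAgent struct{ calls []string }

func (a *fakeAgent) Create(id, bundle string) error {
	a.calls = append(a.calls, "create "+id)
	return nil
}

func (a *fakeAgent) Start(id string) error {
	a.calls = append(a.calls, "start "+id)
	return nil
}

func main() {
	agent := &fakeAgent{}
	rt := NewRuntime("vm1", agent)
	for _, id := range []string{"c1", "c2"} {
		if err := rt.Run(id, "/bundles/"+id); err != nil {
			panic(err)
		}
	}
	close(rt.Events)
	for ev := range rt.Events {
		fmt.Println(ev.Topic, ev.ContainerID)
	}
	fmt.Println(agent.calls)
}
```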

  35. What does the “agent” do?
    • Manages the lifecycle of the
    containers inside the VM
    • Communicates over vsock
    • Receives container management
    commands from runtime
    • Proxies I/O streams
    • Proxies events and metrics from
    inside the VM back out to the
    runtime
    • Associates each container with
    the appropriate block device
    • Mounts container filesystems
    • Uses runc to set up cgroups,
    namespaces, etc.
    • Looks like a standard Linux container to
    the workload inside
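A sketch in Go of the agent-side steps for one container: find the block device for its snapshot, mount it as the bundle's rootfs, then hand off to runc. It only builds the commands rather than running them, since mounting needs root inside the guest. Assumptions: drives appear in the guest in attachment order as virtio-blk `/dev/vdX` names, with `/dev/vda` used by the VM's own root filesystem, and `/run/hypothetical-agent` is an invented bundle location.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// bundleRoot is a hypothetical directory where the agent builds bundles.
const bundleRoot = "/run/hypothetical-agent"

// guestDevice maps a container drive index to a guest device name, assuming
// virtio-blk disks are named in order and /dev/vda is the VM's root drive.
func guestDevice(index int) (string, error) {
	if index < 0 || index > 24 {
		return "", fmt.Errorf("drive index %d out of range", index)
	}
	return fmt.Sprintf("/dev/vd%c", 'b'+index), nil
}

// planContainer returns the commands the agent would run, in order.
func planContainer(id string, driveIndex int) ([][]string, error) {
	dev, err := guestDevice(driveIndex)
	if err != nil {
		return nil, err
	}
	bundle := filepath.Join(bundleRoot, id)
	rootfs := filepath.Join(bundle, "rootfs")
	return [][]string{
		{"mkdir", "-p", rootfs},
		{"mount", dev, rootfs},
		// The OCI config.json received from the runtime is written to bundle here.
		{"runc", "create", "--bundle", bundle, id},
		{"runc", "start", id},
	}, nil
}

func main() {
	plan, err := planContainer("c1", 0)
	if err != nil {
		panic(err)
	}
	for _, cmd := range plan {
		fmt.Println(cmd)
	}
}
```

Splitting `runc create` from `runc start` leaves room to report the container as created before its process starts, which fits the agent's role of proxying lifecycle events.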

  36. How we run containers with firecracker-containerd
    [Sequence diagram built up over slides 36 to 52. Participants: orchestrator, containerd, snapshotter, control plugin, Firecracker, runtime, agent, and the container process; snapshot and container steps are marked “for each container.”]
    • prepare snapshot (orchestrator to containerd, containerd to snapshotter)
    • snapshot $snap1 (returned)
    • create VM; start; start & boot
    • VM $vm1 (returned)
    • run container $c1 from $snap1
    • attach $snap1
    • mount $snap1
    • run container $c1; start process
    • container $c1 running
    • subscribe events for $c1
    • After some time, the container process exits
    • container $c1 exited (reported back up the chain)
    • stop VM $vm1; stop VM
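The sequence above, for one VM, sketched in Go as an ordered list of steps. The slides walk through a single container ($c1); the per-container loops here are an assumption generalizing the diagram's “for each container” markers, and the step wording is paraphrased from the diagram labels, not a real API.

```go
package main

import "fmt"

// lifecycle lists the steps for one VM hosting the given containers, in the
// order the sequence diagram shows them.
func lifecycle(vm string, containers []string) []string {
	var steps []string
	add := func(format string, args ...any) {
		steps = append(steps, fmt.Sprintf(format, args...))
	}
	for _, c := range containers {
		add("prepare snapshot %s-snap", c)
	}
	add("create VM %s, then start and boot it", vm)
	for _, c := range containers {
		add("run container %s from %s-snap", c, c)
		add("attach %s-snap to %s", c, vm)
		add("mount %s-snap inside %s", c, vm)
		add("start process for %s", c)
		add("subscribe events for %s", c)
	}
	// Exits can arrive in any order; they are listed in container order here.
	for _, c := range containers {
		add("container %s exited", c)
	}
	add("stop VM %s", vm)
	return steps
}

func main() {
	for i, step := range lifecycle("vm1", []string{"c1"}) {
		fmt.Printf("%d. %s\n", i+1, step)
	}
}
```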

  54. Current status
    • Working toward production readiness
    • Features
      • Multiple containers per VM & a VM-management API
      • Networking with CNI plugins
      • Compatibility with existing container images
    • Active work
      • “Jailing” the Firecracker VMM
    • Still to come
      • Integration with Kubernetes Container Runtime Interface (CRI)
      • Removing or replacing the custom containerd build
      • Polishing (documentation, bug bash, releases)

  55. Kubernetes CRI
    • Need to automatically model VM lifecycle and container grouping
      • Most straightforward approach is to infer them from the “pause” container
    • Need to determine the size of the VM
      • CRI doesn’t convey total pod resources at creation time
    • Need to determine a path for volumes
      • Kubernetes expects to mount volumes, such as secrets, into containers
    • Need to determine if we can reuse cri-containerd
      • containerd’s CRI implementation can’t currently talk to our VM API
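A sketch in Go of inferring VM lifecycle and container grouping from the pause container. It assumes the container-type and sandbox-ID annotations that containerd's CRI plugin places on each container's OCI spec (`io.kubernetes.cri.container-type` and `io.kubernetes.cri.sandbox-id`); verify the keys for your containerd version. Because CRI does not convey total pod resources up front, the sketch sizes every pod VM with a fixed, hypothetical default.

```go
package main

import "fmt"

const (
	containerTypeKey = "io.kubernetes.cri.container-type"
	sandboxIDKey     = "io.kubernetes.cri.sandbox-id"
)

// VMSize is the size chosen when a pod's VM is created.
type VMSize struct{ VCPUs, MemMiB int }

// defaultPodVM is a hypothetical fixed size: when the pause container arrives,
// the VM must be sized before the pod's other containers are known.
var defaultPodVM = VMSize{VCPUs: 1, MemMiB: 512}

// PodVMs maps each pod sandbox ID to the VM that hosts the pod.
type PodVMs struct{ vms map[string]VMSize }

func NewPodVMs() *PodVMs { return &PodVMs{vms: map[string]VMSize{}} }

// Place returns the sandbox (and so the VM) for a container, creating the VM
// when the container is the pod's pause container.
func (p *PodVMs) Place(containerID string, annotations map[string]string) (string, error) {
	sandbox := annotations[sandboxIDKey]
	if sandbox == "" {
		return "", fmt.Errorf("container %s has no sandbox ID", containerID)
	}
	_, exists := p.vms[sandbox]
	switch annotations[containerTypeKey] {
	case "sandbox":
		if exists {
			return "", fmt.Errorf("VM for sandbox %s already exists", sandbox)
		}
		p.vms[sandbox] = defaultPodVM
	case "container":
		if !exists {
			return "", fmt.Errorf("no VM for sandbox %s yet", sandbox)
		}
	default:
		return "", fmt.Errorf("container %s has an unknown container type", containerID)
	}
	return sandbox, nil
}

func main() {
	p := NewPodVMs()
	pause := map[string]string{containerTypeKey: "sandbox", sandboxIDKey: "pod1"}
	app := map[string]string{containerTypeKey: "container", sandboxIDKey: "pod1"}
	for _, c := range []struct {
		id  string
		ann map[string]string
	}{{"pod1", pause}, {"app1", app}} {
		vm, err := p.Place(c.id, c.ann)
		fmt.Println(c.id, vm, err)
	}
}
```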

  56. firecracker-control plugin
    • firecracker-control plugin (VM API) needs a custom build
      • Needed to expose VM API over containerd’s domain socket
      • Want to eliminate this requirement
    • VM API is specific to Firecracker
      • Requires integrators to know details about Firecracker

  57. How to get involved
    GitHub: https://github.com/firecracker-microvm/firecracker-containerd
    Slack: see https://github.com/firecracker-microvm/firecracker for the invite link
    Or come work with us!

  59. A brief note before we finish
    Session surveys provide valuable information to speakers
    Feedback that is very helpful
    • Topics you were excited to learn about
    • Suggestions for improving understanding and clarity
    Feedback that is extremely unhelpful
    • Comments unrelated to talk content (please refer to the AWS Live Events
    Code of Conduct)
    The “hallway track” is always open!
    Feedback and questions welcome ([email protected], @samuelkarp)
    For support, use the AWS forums or contact AWS Support

  60. Thank you!
    Samuel Karp
    [email protected]
    @samuelkarp
