Slide 2

Slide 2 text

Deep dive into firecracker-containerd (CON408-R)
Samuel Karp, Senior Software Development Engineer, Amazon Web Services
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Slide 3

Slide 3 text

Related breakouts
• CON409: Security and monitoring in a serverless world on AWS Fargate
• CON423: AWS Fargate under the hood

Slide 4

Slide 4 text

Agenda
• Linux containers, virtual machines, and isolation
• What is a container runtime?
• What is containerd?
• The Firecracker virtual machine monitor (VMM)
• Adapting containerd to Firecracker
• Current status and roadmap
• Q&A

Slide 5

Slide 5 text

A really brief overview of containers
• A mechanism for running software
• With some isolation
• With some repeatability
• With a standard format for distribution
• With common tooling

Slide 6

Slide 6 text

Linux container primitives
• Namespaces – Visibility restrictions
• Control groups (cgroups) – Resource limits
• Capabilities – Permission restrictions
• Seccomp – Syscall allow/deny lists
• Linux Security Modules – Resource access control
• Union filesystems – Image layers

Slide 7

Slide 7 text

What don’t containers give you?
• Independent kernel behavior (kernel tuning)
• Security isolation from other containers

Slide 8

Slide 8 text

Containers and VMs

Containers
• Use Linux primitives to separate processes
• Share a Linux kernel
• Fast starts, minimal overhead
• Flexible configuration

Virtual machines
• Virtualize or emulate hardware components
• Completely separate kernels (maybe not Linux)
• Slower starts: must boot kernel and set up hardware

[Diagram: on the container side, containers run on a shared Linux kernel (namespaces, cgroups, ...) on the hardware; on the VM side, VM guests run on virtual hardware provided through KVM in the Linux kernel on the hardware]

Slide 9

Slide 9 text

Why use VMs?
• Independent Linux kernel in each VM
• Virtual machine monitor (VMM) is an additional isolation boundary
• Interface between VM and VMM (hypercalls) is defined by the hypervisor
• Hardware interfaces are standardized
• Good for defining trust and resource boundaries
  • Isolating multi-tenant workloads
  • Isolating non-trusted workloads

Slide 10

Slide 10 text

What do we mean by isolation?
• Prevent customers from affecting each other
• Prevent customers from affecting the infrastructure
• Defense in depth
  • Container security
    • seccomp
    • Linux security modules
    • Capabilities
  • Hypervisor
    • Emulation, virtualization, or pass-through

Slide 11

Slide 11 text

Common container tooling
• Docker UX (docker build, docker run)
• Images and registries for software distribution (docker push, docker pull)
• Container orchestrators
  • Amazon Elastic Container Service (Amazon ECS)
  • Kubernetes
  • Mesos
• Open Containers Initiative (OCI)
  • Image standard
  • Runtime standard
  • Distribution standard

Slide 13

Slide 13 text

How you run containers

Part of stack          Example components
Cluster orchestrator   Amazon ECS, Kubernetes, Mesos
Local management       Docker or containerd
Container runtime      runc or Firecracker

Slide 15

Slide 15 text

Container runtimes
• Mechanism for starting and managing container workloads
• (Linux containers) Set up cgroups, namespaces, filesystems, capabilities, seccomp, etc.
• OCI runtime specification
  • Command-line interface for setting up a container
  • On-disk “bundle”
    • Root filesystem
    • JSON file describing configuration
• runc
  • Reference implementation
  • Split out from Docker
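The on-disk “bundle” described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the OCI runtime specification's `config.json` layout (`ociVersion`, `process`, `root`, `linux.namespaces`); the helper name `write_bundle`, the temporary directory, and the chosen version string are assumptions made for this example, and the resulting config is likely too sparse for a real runtime to start a container from it.

```python
import json
import tempfile
from pathlib import Path


def write_bundle(bundle_dir: Path, args: list) -> Path:
    """Create a minimal OCI-style bundle: a rootfs directory plus config.json."""
    rootfs = bundle_dir / "rootfs"
    rootfs.mkdir(parents=True, exist_ok=True)  # would hold the unpacked image

    config = {
        "ociVersion": "1.0.2",  # assumed spec version for this sketch
        "process": {
            "args": args,  # command to run inside the container
            "cwd": "/",
            "user": {"uid": 0, "gid": 0},
        },
        "root": {"path": "rootfs", "readonly": True},
        "linux": {
            # Each entry asks the runtime to create a new namespace of that type.
            "namespaces": [
                {"type": t} for t in ("pid", "mount", "network", "ipc", "uts")
            ],
        },
    }
    config_path = bundle_dir / "config.json"
    config_path.write_text(json.dumps(config, indent=2))
    return config_path


# Hypothetical usage: build a bundle in a scratch directory and read it back.
bundle = Path(tempfile.mkdtemp(prefix="bundle-"))
config_path = write_bundle(bundle, ["/bin/sh", "-c", "echo hello"])
loaded = json.loads(config_path.read_text())
bundle_entries = sorted(p.name for p in bundle.iterdir())
print(bundle_entries)
```

In practice `runc spec` generates a complete default `config.json`, and a runtime is pointed at the directory with something like `runc run --bundle <dir> <container-id>`.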

Slide 17

Slide 17 text

containerd
• Daemon for managing containers
• Modular framework for container lifecycle workflows
• Integrates with OCI runtimes and containerd v2 runtimes

Slide 18

Slide 18 text

The containerd stack
• gRPC API and services
• Storage services
  • Content store
  • Snapshotters
• Runtime (OCI/runc, v2)

[Diagram labels: gRPC, Metrics, Storage, Content, Snapshot, Diff, Metadata, Images, Containers, Tasks, Events, Runtimes]

Slide 23

Slide 23 text

Firecracker virtual machine monitor (VMM)
• KVM-based VMM in Rust
• Open source
• Targeted at serverless workloads
• Not a general-purpose VMM

Slide 24

Slide 24 text

Firecracker design goals

Security
• Very limited device model
• Very limited feature set
• Eliminate guest interactions with host kernel
• Sandbox/jail the VMM
• Memory-safe programming language
• Single VM per Firecracker process

Efficiency
• Fast boot time
• Low memory and CPU overhead
• API driven
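To make “API driven” concrete, the sketch below builds (but does not send) the sequence of requests a client might issue to a Firecracker process to configure and boot one microVM. The endpoint paths and field names reflect my understanding of Firecracker's HTTP API and should be checked against the API specification for the version in use; the function name `vm_requests`, the file paths, and the boot arguments are placeholders invented for this example.

```python
import json


def vm_requests(kernel: str, rootfs: str, vcpus: int = 1, mem_mib: int = 128) -> list:
    """Return (method, path, body) tuples describing one microVM boot."""
    return [
        # Which kernel to boot and with what command line (assumed values).
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel,
            "boot_args": "console=ttyS0 reboot=k panic=1",
        }),
        # The VM root filesystem, attached as a block device.
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs,
            "is_root_device": True,
            "is_read_only": False,
        }),
        # Size of the VM.
        ("PUT", "/machine-config", {
            "vcpu_count": vcpus,
            "mem_size_mib": mem_mib,
        }),
        # Start the configured instance.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


# Hypothetical paths; nothing is read from disk or sent over a socket here.
requests = vm_requests("/path/to/vmlinux", "/path/to/rootfs.ext4", vcpus=2, mem_mib=256)
for method, path, body in requests:
    print(method, path, json.dumps(body))
```

A real client would send each body as JSON over the Firecracker process's Unix domain socket, one process (and one socket) per VM, which matches the “single VM per Firecracker process” goal above.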

Slide 25

Slide 25 text

firecracker-containerd goals

Containers
• Compatible images
• Familiar tooling
• Support existing workflows
• Allow composition of containers
• Integrate with orchestrators
• Minimal additional overhead

Security
• Hypervisor-based isolation
• Limited access to the host

Slide 26

Slide 26 text

Adapting containers to Firecracker

Firecracker VMM considerations
• No filesystem sharing
• No dynamic device attachments
• Limited networking options (tap, not veth)
• Cross-boundary communication with vsock

Adapting to containerd
• Block-device snapshotter
• API to manage VMM lifecycle
• Split the “shim” into two parts
  • “Runtime” on the host
  • “Agent” inside the VM
    • …that runs containers via runc
• Network
  • tap device
  • Usable with Container Network Interface (CNI) plugins

Slide 27

Slide 27 text

firecracker-containerd architecture
[Diagram labels: containerd, Content store, Block-device snapshotter, FC control plugin, FC runtime, Firecracker VMM, microVM, Kernel, Internal FC agent, runc, Container, Container root fs, Disk]

Slide 29

Slide 29 text

What is a “block device” snapshotter?
• Store the container image layers
• Manage writable space for each container
• Compose container image layers and writable space (snapshot)
• Treat each snapshot as a device to attach to the VM
• Inside the VM, mount the device to expose its filesystem
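The composition of read-only image layers with per-container writable space can be illustrated with a toy model. This is only a sketch: layers are plain dictionaries rather than block devices, and `ToySnapshotter`, `Snapshot`, and the `/dev/toy-snapN` names are invented for the example and are not part of containerd or firecracker-containerd.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Read-only layers plus private writable space, exposed as one 'device'."""
    device: str                 # hypothetical device name handed to the VM
    layers: list                # read-only layers (dict path -> data), lowest first
    writable: dict = field(default_factory=dict)

    def read(self, path: str) -> str:
        # Writable space wins, then the topmost layer that has the path.
        if path in self.writable:
            return self.writable[path]
        for layer in reversed(self.layers):
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path: str, data: str) -> None:
        # Writes never touch the shared image layers.
        self.writable[path] = data


class ToySnapshotter:
    """Stores image layers once and prepares one snapshot per container."""

    def __init__(self) -> None:
        self._count = 0

    def prepare(self, layers: list) -> Snapshot:
        self._count += 1
        return Snapshot(device=f"/dev/toy-snap{self._count}", layers=layers)


image = [{"/etc/os-release": "base"}, {"/app/run.sh": "echo hi"}]
snapshotter = ToySnapshotter()
snap1 = snapshotter.prepare(image)
snap2 = snapshotter.prepare(image)
snap1.write("/app/run.sh", "echo changed")
print(snap1.device, snap1.read("/app/run.sh"))
print(snap2.device, snap2.read("/app/run.sh"))
```

The point of the model is that two containers started from the same image share the stored layers but see independent writes, and each snapshot has its own device identity that can be attached to a VM and mounted there.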

Slide 31

Slide 31 text

What does the “firecracker-control” plugin do?
• Provides a first-class VM construct and API for Firecracker
• Lets callers specify VM-related parameters like the kernel and VM root filesystem
• Allocates and manages VM resources: block devices, network interfaces, etc.
• Manages the VM lifecycle
• Compiled-in plugin
  • gRPC API over the same socket
• Specific to Firecracker for now
  • Looking for a better generic solution

Slide 33

Slide 33 text

What does the “runtime” do?
• Proxies container management commands to agent inside VM
• Proxies I/O streams
• Proxies events and metrics from inside the VM back out to containerd
• Built on containerd’s V2 API
• ttrpc API for communication
• Many containers per VM, one runtime

Slide 35

Slide 35 text

What does the “agent” do?
• Manages the lifecycle of the containers inside the VM
• Communicates over vsock
• Receives container management commands from runtime
• Proxies I/O streams
• Proxies events and metrics from inside the VM back out to the runtime
• Associates each container with the appropriate block device
• Mounts container filesystems
• Uses runc to set up cgroups, namespaces, etc.
• Looks like a standard Linux container to the workload inside
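The runtime/agent split can be sketched as two endpoints exchanging messages across a boundary. The example below is a stand-in, not the real protocol: it uses `socket.socketpair()` in place of vsock and newline-delimited JSON in place of ttrpc, and the names `agent_loop`, `RuntimeProxy`, and the `create`/`start` operations are assumptions made for illustration.

```python
import json
import socket
import threading


def agent_loop(conn: socket.socket) -> None:
    """Hypothetical in-VM agent: apply each command and report container state."""
    states = {}
    rfile, wfile = conn.makefile("r"), conn.makefile("w")
    with conn, rfile, wfile:
        for line in rfile:  # ends when the runtime closes its side
            cmd = json.loads(line)
            name = cmd["container"]
            if cmd["op"] == "create":
                states[name] = "created"  # a real agent would mount the rootfs here
            elif cmd["op"] == "start":
                states[name] = "running"  # a real agent would invoke runc here
            reply = {"id": cmd["id"], "container": name,
                     "state": states.get(name, "unknown")}
            wfile.write(json.dumps(reply) + "\n")
            wfile.flush()


class RuntimeProxy:
    """Hypothetical host-side runtime: forwards commands, returns agent replies."""

    def __init__(self, conn: socket.socket) -> None:
        self._conn = conn
        self._rfile = conn.makefile("r")
        self._wfile = conn.makefile("w")
        self._next_id = 0

    def call(self, op: str, container: str) -> dict:
        self._next_id += 1
        request = {"id": self._next_id, "op": op, "container": container}
        self._wfile.write(json.dumps(request) + "\n")
        self._wfile.flush()
        return json.loads(self._rfile.readline())

    def close(self) -> None:
        self._wfile.close()
        self._rfile.close()
        self._conn.close()


host_end, vm_end = socket.socketpair()  # stand-in for the vsock connection
agent = threading.Thread(target=agent_loop, args=(vm_end,), daemon=True)
agent.start()

runtime = RuntimeProxy(host_end)
created = runtime.call("create", "c1")
started = runtime.call("start", "c1")
runtime.close()
agent.join(timeout=5)
print(created, started)
```

One runtime connection can carry commands for several containers, which mirrors the “many containers per VM, one runtime” arrangement on the previous slide.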

Slide 36

Slide 36 text

How we run containers with firecracker-containerd
[Sequence diagram, built up one step at a time across slides 36 to 52]
Participants: orchestrator, containerd, snapshotter, control plugin, Firecracker, runtime, agent, container process (the label “for each container” appears twice, next to the agent and the container process)
Messages, in the order the slides add them:
1. prepare snapshot (shown twice)
2. snapshot $snap1 (shown twice)
3. create VM, then start, then start & boot
4. VM $vm1
5. run container $c1 from $snap1 (shown twice)
6. attach $snap1
7. mount $snap1
8. run container $c1, then start process
9. container $c1 running (shown twice)
10. subscribe events for $c1
11. After some time, container process exits
12. container $c1 exited (shown three times)
13. stop VM $vm1, then stop VM
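The order of operations in the sequence diagram can be captured as a small state machine. This is a toy model that assumes the steps happen strictly one after another for a single VM and a single container; the `Step` names and the `Lifecycle` class are invented for the example and do not correspond to any firecracker-containerd API.

```python
from enum import Enum, auto


class Step(Enum):
    """Steps from the slides, in the order the diagram introduces them."""
    PREPARE_SNAPSHOT = auto()
    CREATE_VM = auto()
    RUN_CONTAINER = auto()
    ATTACH_SNAPSHOT = auto()
    MOUNT_SNAPSHOT = auto()
    START_PROCESS = auto()
    CONTAINER_RUNNING = auto()
    SUBSCRIBE_EVENTS = auto()
    CONTAINER_EXITED = auto()
    STOP_VM = auto()


ORDER = list(Step)  # Enum iteration follows definition order


class Lifecycle:
    """Accepts steps only in the assumed linear order."""

    def __init__(self) -> None:
        self.done = []

    def advance(self, step: Step) -> None:
        if len(self.done) == len(ORDER):
            raise RuntimeError("lifecycle already finished")
        expected = ORDER[len(self.done)]
        if step is not expected:
            raise RuntimeError(f"expected {expected.name}, got {step.name}")
        self.done.append(step)


# Happy path: every step in order.
ok = Lifecycle()
for step in ORDER:
    ok.advance(step)

# Out of order: a container cannot be run before its snapshot is prepared.
bad = Lifecycle()
try:
    bad.advance(Step.RUN_CONTAINER)
    rejected = False
except RuntimeError as err:
    rejected = True
    print(err)
```

A real deployment runs several containers per VM, so the container-level steps would repeat between VM creation and VM stop rather than forming one linear chain.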

Slide 54

Slide 54 text

Current status
• Working toward production readiness
• Features
  • Multiple containers per VM & a VM-management API
  • Networking with CNI plugins
  • Compatibility with existing container images
• Active work
  • “Jailing” the Firecracker VMM
• Still to come
  • Integration with Kubernetes Container Runtime Interface (CRI)
  • Removing or replacing the custom containerd build
  • Polishing (documentation, bug bash, releases)

Slide 55

Slide 55 text

Kubernetes CRI
• Need to automatically model VM lifecycle and container grouping
  • Most straightforward approach is to infer from the “pause” container
• Need to determine the size of the VM
  • CRI doesn’t convey total pod resources at creation time
• Need to determine a path for volumes
  • Kubernetes expects to mount volumes to containers with things like secrets
• Need to determine if we can reuse cri-containerd
  • containerd’s CRI implementation can’t currently talk to our VM API

Slide 56

Slide 56 text

firecracker-control plugin
• firecracker-control plugin (VM API) needs a custom build
  • Needed to expose VM API over containerd’s domain socket
  • Want to eliminate this requirement
• VM API is specific to Firecracker
  • Requires integrators to know details about Firecracker

Slide 57

Slide 57 text

How to get involved
• GitHub: https://github.com/firecracker-microvm/firecracker-containerd
• Slack: See https://github.com/firecracker-microvm/firecracker for link
• Or come work with us!

Slide 59

Slide 59 text

A brief note before we finish
Session surveys provide valuable information to speakers.

Feedback that is very helpful
• Topics you were excited to learn about
• Suggestions for improving understanding and clarity

Feedback that is extremely unhelpful
• Comments unrelated to talk content (please refer to the AWS Live Events Code of Conduct)

The “hallway track” is always open!
Feedback and questions welcome ([email protected], @samuelkarp)
For support, use the AWS forums or contact AWS Support

Slide 60

Slide 60 text

Thank you!
Samuel Karp
[email protected]
@samuelkarp
