The Enemy Within: Running untrusted code with gVisor

Containers are a great way to deploy and isolate application resources, but they can fall short when it comes to security isolation. How do you improve the security of a container while keeping its flexible and dynamic resource usage? There are many options for sandboxed containers, but which one is right for you?

In this talk we will explore the gVisor sandbox runtime in depth. gVisor is a unique open-source sandbox runtime that allows you to run unmodified applications in containers with a higher level of isolation and low overhead. It implements the OCI runtime specification and integrates well with containerd and Kubernetes. I will cover the container security model and use cases for sandboxed pods, then discuss various approaches and their tradeoffs before diving into the architecture of gVisor and how it differs from virtual machine based sandboxes.

Ian Lewis

June 25, 2019

Transcript

  1. 3.

    So you want to run some code...

    • Running untrusted code
    • User uploaded code
    • Third-party code
    • Complex code/Complex user input
    • Code you wrote but you don't trust yourself…
  2. 9.

    Container Sandboxes

    • Prevents attackers from escaping the runtime environment
    • Code running in the sandbox is untrusted
  3. 10.

    Sandbox Isolation

    • Goal of the sandbox is to reduce execution of trusted, privileged code (e.g. kernel code)
    • Achieved through abstraction/virtualization of host.
    • Don't want to expose the system to risk of any single bug
      ◦ Need two layers of isolation
  4. 13.

    Containers & Unikernels are cool but...

    • Containers
      ◦ They aren't good security isolation boundaries
      ◦ Only one layer of isolation
      ◦ Any one bug in the host kernel could lead to a full host compromise
    • Unikernels
      ◦ Can't bring your own container (must be specially crafted)
  5. 17.

    VMs are cool but...

    • We want more container-like properties
      ◦ Flexible resource usage
        ▪ Don't want to assign full sets of memory or CPU to the sandbox
        ▪ Want to be able to reclaim memory if possible
      ◦ Quick X0ms startup time
        ▪ Don't want to have a lot of guest OS boot time.
      ◦ Easier maintenance and integration into container infrastructure
  6. 18.
  7. 23.

    gVisor

    • Two layers of isolation
    • Uses the same principle of virtualization as VMs
      ◦ Virtualization at the OS (Linux syscall) layer
    • Reduces the host attack surface
      ◦ Calls to the host OS are controlled by the Sentry
      ◦ Most syscall logic handled by Sentry
      ◦ No syscalls are "passed through". Applications cannot pass arbitrary arguments to the host kernel.
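The "no pass-through" idea on this slide can be illustrated with a toy model. This is only a sketch in Python, not gVisor's implementation (the real Sentry is written in Go); `ToySentry`, `HOST_ALLOWLIST`, and the handler names are hypothetical and exist only for this example. It shows application syscalls being answered by sandbox-side handlers, while the few host calls that do happen go through one gate with arguments chosen by the sandbox rather than by the application.

```python
import errno

# Hypothetical allowlist of host operations the sandbox process may issue.
HOST_ALLOWLIST = frozenset({"write", "futex"})


class ToySentry:
    """Toy user-space kernel: implements syscalls itself instead of forwarding them."""

    def __init__(self):
        self.host_calls = []  # every call that actually reached the "host"

    def _host(self, name, *args):
        # Single gate for host access; arguments are built by the sandbox code.
        if name not in HOST_ALLOWLIST:
            raise PermissionError(f"host call {name!r} is not allowed")
        self.host_calls.append((name, args))

    def syscall(self, name, *args):
        # Application syscalls are dispatched to handlers, never passed through.
        handler = getattr(self, "sys_" + name, None)
        if handler is None:
            return -errno.ENOSYS  # unimplemented: rejected without touching the host
        return handler(*args)

    def sys_getpid(self):
        return 1  # answered from sandbox state, no host call needed

    def sys_write(self, fd, data):
        # The handler decides what, if anything, is requested from the host.
        self._host("write", fd, len(data))
        return len(data)


sentry = ToySentry()
print(sentry.syscall("getpid"))
print(sentry.syscall("write", 1, b"hi"))
print(sentry.syscall("reboot", 0xDEAD))
print(sentry.host_calls)
```

In this sketch the `reboot` request never reaches the host at all, and the `write` request reaches it only in the form the handler constructed.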
  8. 25.
  9. 26.

    gVisor Architecture

    [Diagram. Labels: Kubernetes, OCI, runsc, Sandbox, Sentry, User Kernel, KVM/ptrace, Gofer, 9P, seccomp + ns, Host Linux Kernel]
  10. 27.

    gVisor Architecture

    [Diagram. Labels: Kubernetes, OCI, runsc, Sandbox, Container, Sentry, User Kernel, KVM/ptrace, Gofer, 9P, seccomp + ns, Host Linux Kernel]
  11. 28.

    gVisor Architecture

    [Diagram. Labels: Kubernetes, OCI, runsc, Sandbox, Containers, Sentry, User Kernel, KVM/ptrace, Gofers, 9P, seccomp + ns, Host Linux Kernel]
  12. 29.

    Design Principles

    • Two security layers
    • Minimal access to host
      ◦ No syscall is passed through to the host
      ◦ Limited host syscalls allowed
      ◦ User mode
    • Pure Go
      ◦ No cgo allowed
    • Unsafe code is carefully reviewed
    • Statically linked, few external dependencies
    • Trust nobody
  13. 30.

    Defense in Depth

    • Sentry is first layer of defense
      ◦ Assume it will be compromised
      ◦ User mode
    • Pod cgroup
    • Namespaces
    • Terminal chroot
    • uid/gid: nobody
      ◦ Drop all capabilities
    • Seccomp
      ◦ # of syscalls is the wrong metric
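One way to check some of these layers on a running process is to read its `/proc/<pid>/status` file on the Linux host. The sketch below assumes the standard `Uid`, `CapEff`, and `Seccomp` fields of that file; the helper names and the sample text are hypothetical and not taken from the talk.

```python
def parse_status(text):
    """Parse 'Key:<tab>value' lines from /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def hardening_report(text):
    """Report three of the layers listed above for one process."""
    fields = parse_status(text)
    return {
        # First Uid column is the real UID; 0 means root.
        "non_root": fields.get("Uid", "0").split()[0] != "0",
        # CapEff is a hex bitmask; 0 means no effective capabilities.
        "no_effective_caps": int(fields.get("CapEff", "f"), 16) == 0,
        # Seccomp mode 2 means a seccomp filter is installed.
        "seccomp_filter": fields.get("Seccomp") == "2",
    }


# Illustrative sample only; real values depend on the host and runtime.
SAMPLE_STATUS = (
    "Name:\texample\n"
    "Uid:\t65534\t65534\t65534\t65534\n"
    "CapEff:\t0000000000000000\n"
    "Seccomp:\t2\n"
)

print(hardening_report(SAMPLE_STATUS))
```

To inspect a real process, pass the contents of `/proc/<pid>/status` instead of `SAMPLE_STATUS`. cgroup, namespace, and chroot membership are not covered by this file and would need separate checks.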
  14. 31.
  15. 32.
  16. 33.
  17. 34.
  18. 35.
  19. 36.

    Defense in Depth: Sandbox

    [Diagram, built up over slides 31 to 36 alongside the same bullets as slide 30. Labels: Sandbox, cgroup, namespace, chroot, user / group / capabilities, seccomp]
  20. 37.

    Defense In Depth: Gofer

    • Isolated from user code
    • Pod cgroup
    • Caller’s user namespace
    • Chroot to rootfs
      ◦ Bind mounts
    • Runs as root
      ◦ Similar to “docker run” as root
      ◦ Drop non-FS capabilities
    • seccomp
  21. 38.
  22. 39.
  23. 40.
  24. 41.
  25. 42.
  26. 43.

    Defense In Depth: Gofer

    [Diagram, built up over slides 38 to 43 alongside the same bullets as slide 37. Labels: Gofer, cgroup, reduced namespaces, chroot, reduced capabilities, seccomp]
  27. 44.

    What's not protected?

    • Be aware of defaults
      ◦ K8s is optimized for ease-of-use, not security
      ◦ CPU/Memory/Disk limits
    • Network/Disk isolation
      ◦ Network access: Use NetworkPolicy
      ◦ Arbitrary packet injection: Sentry provides isolation
      ◦ File writes/permissions: Use read-only filesystems
      ◦ No throttling mechanism: use cgroups
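The mitigations on this slide map onto ordinary Kubernetes settings. As a rough sketch, the Python below builds a NetworkPolicy that denies all egress for selected pods, and a container spec with a read-only root filesystem and resource limits, then prints them as JSON (which `kubectl apply -f` accepts). The object names, labels, image, and limit values are placeholders chosen for illustration, not recommendations from the talk.

```python
import json


def deny_all_egress(name, namespace, pod_labels):
    """NetworkPolicy selecting pods by label and allowing no egress traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            # Listing Egress with no egress rules denies all outbound traffic.
            "policyTypes": ["Egress"],
        },
    }


def locked_down_container(name, image):
    """Container spec with a read-only root filesystem and explicit limits."""
    return {
        "name": name,
        "image": image,
        "securityContext": {"readOnlyRootFilesystem": True},
        "resources": {
            # Placeholder values; size these for the actual workload.
            "limits": {"cpu": "500m", "memory": "256Mi", "ephemeral-storage": "1Gi"},
        },
    }


policy = deny_all_egress("untrusted-deny-egress", "default", {"app": "untrusted"})
container = locked_down_container("untrusted", "example.com/untrusted:latest")
print(json.dumps(policy, indent=2))
print(json.dumps(container, indent=2))
```

A NetworkPolicy only takes effect if the cluster's network plugin enforces it, so that is worth confirming before relying on it.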
  28. 45.

    gVisor &

    • Integrated with RuntimeClass
      ◦ RuntimeClassName: gvisor
    • Minikube
      ◦ minikube addons enable gvisor
      ◦ github.com/kubernetes/minikube/tree/master/deploy/addons/gvisor
    • GKE Sandbox (BETA)
      ◦ cloud.google.com/kubernetes-engine/sandbox
    • gvisor-containerd-shim
      ◦ github.com/google/gvisor-containerd-shim
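The RuntimeClass integration can be sketched as two manifests: a RuntimeClass named `gvisor`, and a Pod that selects it through `spec.runtimeClassName`. The Python below builds both and prints them as JSON. It assumes the `node.k8s.io/v1` API (clusters from around the time of this talk used `node.k8s.io/v1beta1`) and a handler named `runsc`, which must match whatever the node's container runtime is configured with; the pod name and image are placeholders.

```python
import json


def runtime_class(name, handler):
    """RuntimeClass mapping a name to a runtime handler configured on the node."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,
    }


def sandboxed_pod(name, image, runtime_class_name):
    """Pod that asks to be run with the given RuntimeClass."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "runtimeClassName": runtime_class_name,
            "containers": [{"name": name, "image": image}],
        },
    }


# "runsc" as the handler name is an assumption about the node configuration.
rc = runtime_class("gvisor", "runsc")
pod = sandboxed_pod("untrusted", "example.com/untrusted:latest", rc["metadata"]["name"])
print(json.dumps(rc, indent=2))
print(json.dumps(pod, indent=2))
```

If the named RuntimeClass does not exist, or the node has no matching handler, the pod will not start, so a misconfiguration does not silently fall back to the default runtime.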