
The Enemy Within: Running untrusted code with gVisor

Containers are a great way to deploy and isolate application resources, but they can fall short when it comes to security isolation. How do you improve the security of a container while keeping the flexible, dynamic resource usage that makes containers attractive? There are many options for sandboxing containers, but which is right for you?

In this talk we will explore the gVisor sandbox runtime in depth. gVisor is a unique open-source sandbox runtime that lets you run unmodified applications in containers with a higher level of isolation and low overhead. It implements the OCI runtime specification and integrates well with containerd and Kubernetes. I will cover the container security model and use cases for sandboxed pods, discuss various approaches and their tradeoffs, and then dive into the architecture of gVisor and how it differs from virtual-machine-based sandboxes.

Ian Lewis

June 25, 2019

Transcript

  Slide 3 · So you want to run some code...
    • Running untrusted code
    • User-uploaded code
    • Third-party code
    • Complex code / complex user input
    • Code you wrote but don't trust yourself…
  Slide 9 · Container Sandboxes
    • Prevent attackers from escaping the runtime environment
    • Code running in the sandbox is untrusted
  Slide 10 · Sandbox Isolation
    • Goal of the sandbox: limit how much trusted, privileged code (e.g. kernel code) untrusted code can execute
    • Achieved through abstraction/virtualization of the host
    • Don't want any single bug to put the system at risk
      ◦ Need two layers of isolation
  Slide 13 · Containers & Unikernels are cool but...
    • Containers
      ◦ They aren't good security isolation boundaries
      ◦ Only one layer of isolation
      ◦ Any one bug in the host kernel could lead to a full host compromise
    • Unikernels
      ◦ Can't bring your own container (must be specially crafted)
  Slide 17 · VMs are cool but...
    • We want more container-like properties
      ◦ Flexible resource usage
        ▪ Don't want to assign full sets of memory or CPU to the sandbox
        ▪ Want to be able to reclaim memory when possible
      ◦ Quick startup, in the tens of milliseconds ("X0 ms")
        ▪ Don't want to pay for a lot of guest OS boot time
      ◦ Easier maintenance and integration into container infrastructure
  Slide 23 · gVisor
    • Two layers of isolation
    • Uses the same principle of virtualization as VMs
      ◦ Virtualizes at the OS level: the Linux syscall layer
    • Reduces the host attack surface
      ◦ Calls to the host OS are controlled by the Sentry
      ◦ Most syscall logic is handled by the Sentry
      ◦ No syscalls are "passed through": applications cannot pass arbitrary arguments to the host kernel
  Slides 26-28 · gVisor Architecture
    [Diagram: Kubernetes → OCI → runsc. The Sandbox holds the Sentry, a user-space kernel running on a KVM or ptrace platform. A Gofer serves files to the Sentry over 9P. The Sentry and the Gofer each run under seccomp + namespaces on the host Linux kernel. Later builds add more containers, each with its own Gofer.]
  Slide 29 · Design Principles
    • Two security layers
    • Minimal access to the host
      ◦ No syscall is passed through to the host
      ◦ Limited set of host syscalls allowed
      ◦ User mode
    • Pure Go
      ◦ No cgo allowed
    • Unsafe code is carefully reviewed
    • Statically linked, few external dependencies
    • Trust nobody
  Slides 30-36 · Defense in Depth: Sandbox
    • The Sentry is the first layer of defense
      ◦ Assume it will be compromised
      ◦ User mode
    • Pod cgroup
    • Namespaces
    • Terminal chroot
    • uid/gid: nobody
      ◦ Drop all capabilities
    • Seccomp
      ◦ The number of allowed syscalls is the wrong metric
    [Diagram, built up across the slides: the Sandbox wrapped in successive layers of cgroup, namespace, chroot, user / group / capabilities, and seccomp.]
  Slides 37-43 · Defense in Depth: Gofer
    • Isolated from user code
    • Pod cgroup
    • Caller’s user namespace
    • Chroot to rootfs
      ◦ Bind mounts
    • Runs as root
      ◦ Similar to “docker run” as root
      ◦ Drop non-FS capabilities
    • seccomp
    [Diagram, built up across the slides: the Gofer wrapped in successive layers of cgroup, reduced namespaces, chroot, reduced capabilities, and seccomp.]
  Slide 44 · What's not protected?
    • Be aware of defaults
      ◦ K8s is optimized for ease of use, not security
      ◦ CPU/Memory/Disk limits are not set by default
    • Network/Disk isolation
      ◦ Network access: use NetworkPolicy
      ◦ Arbitrary packet injection: the Sentry provides isolation
      ◦ File writes/permissions: use read-only filesystems
      ◦ No throttling mechanism: use cgroups
  Slide 45 · gVisor &
    • Integrated with RuntimeClass
      ◦ RuntimeClassName: gvisor
    • Minikube
      ◦ minikube addons enable gvisor
      ◦ github.com/kubernetes/minikube/tree/master/deploy/addons/gvisor
    • GKE Sandbox (beta)
      ◦ cloud.google.com/kubernetes-engine/sandbox
    • gvisor-containerd-shim
      ◦ github.com/google/gvisor-containerd-shim
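
The gVisor slide says the Sentry handles most syscall logic itself and that no syscall is "passed through" to the host. The toy Python model below sketches that idea under stated assumptions. `ToySentry`, its handler methods, and its single host operation are hypothetical names invented for illustration. The syscalls have simplified signatures, and none of this is gVisor's actual code (which is written in Go). Guest calls are answered from state kept inside the sandbox. The only host operation is a fixed call whose arguments the sandbox chooses.

```python
import errno
import time


class ToySentry:
    """Hypothetical user-space kernel model: the guest's syscalls are served
    from state held inside the sandbox, not forwarded to the host kernel."""

    # The only host operation this toy ever performs (an assumption).
    HOST_OPS = frozenset({"clock_monotonic"})

    def __init__(self):
        self.pid = 1         # virtualized PID: the guest sees itself as PID 1
        self.files = {}      # fd -> bytearray, held in sandbox memory
        self.next_fd = 3     # 0-2 are left for stdio, as on Linux
        self.host_log = []   # record of every host operation actually made

    def _host(self, op):
        # The sandbox picks the host operation and its arguments itself;
        # guest-supplied values never reach this call.
        if op not in self.HOST_OPS:
            raise PermissionError(op)
        self.host_log.append(op)
        return time.monotonic()

    def syscall(self, name, *args):
        handler = getattr(self, "_sys_" + name, None)
        if handler is None:
            # Unimplemented syscalls fail inside the sandbox with ENOSYS.
            return -errno.ENOSYS
        return handler(*args)

    def _sys_getpid(self):
        return self.pid

    def _sys_memfd_create(self):
        # Simplified: the real syscall also takes a name and flags.
        fd = self.next_fd
        self.next_fd += 1
        self.files[fd] = bytearray()
        return fd

    def _sys_write(self, fd, data):
        if fd not in self.files:
            return -errno.EBADF
        self.files[fd].extend(data)
        return len(data)

    def _sys_clock_gettime(self, clock_id):
        # Validate the guest's argument here; the host call itself is fixed.
        # Simplified: returns a float instead of filling in a timespec.
        if clock_id != 1:  # 1 is CLOCK_MONOTONIC on Linux
            return -errno.EINVAL
        return self._host("clock_monotonic")
```

For example, `ToySentry().syscall("reboot")` returns `-errno.ENOSYS` without touching the host. `host_log` stays empty until the guest asks for the monotonic clock.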
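
In the architecture slides, the Sentry reaches the filesystem through a Gofer, which talks to it over 9P. The sketch below models only the mediation step, under these assumptions: `ToyGofer` and `open_read` are hypothetical, 9P itself is not modeled, and a plain function call stands in for the protocol. The Gofer resolves every requested path against its own root and refuses anything that resolves outside it. The requester therefore never names host paths directly.

```python
import os
import tempfile


class ToyGofer:
    """Hypothetical file proxy: serves only files under its root directory."""

    def __init__(self, root):
        # Canonicalize the root once so later comparisons are exact.
        self.root = os.path.realpath(root)

    def _resolve(self, path):
        # Treat every request as relative to the root, then resolve ".."
        # components and symlinks before deciding whether to serve it.
        target = os.path.realpath(os.path.join(self.root, path.lstrip("/")))
        if os.path.commonpath([self.root, target]) != self.root:
            raise PermissionError(f"{path!r} resolves outside the sandbox root")
        return target

    def open_read(self, path):
        # The requester gets back an open file descriptor, not a host path.
        return os.open(self._resolve(path), os.O_RDONLY)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "hello.txt"), "w") as f:
            f.write("hi")
        gofer = ToyGofer(root)
        fd = gofer.open_read("/hello.txt")
        print(os.read(fd, 2))
        os.close(fd)
        try:
            gofer.open_read("../../etc/passwd")
        except PermissionError as exc:
            print("refused:", exc)
```

Checking with `os.path.realpath` and then opening is racy, because a symlink can be swapped between the check and the `open`. A real file server would walk the path one component at a time with `O_NOFOLLOW`-style opens. The model only shows where the check sits.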
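
The Sentry defense-in-depth slides end with the point that the number of allowed syscalls is the wrong metric. The toy model below illustrates why. It assumes a simplified allowlist in the spirit of seccomp-bpf: `ToyFilter`, `any_args`, and `arg_in` are hypothetical helpers, not the libseccomp or gVisor API, and syscalls are named rather than numbered for readability. As with a real seccomp filter, each rule sees only the syscall and its raw integer arguments, never the memory a pointer argument refers to. The ioctl request numbers are the standard Linux values.

```python
from typing import Callable, Dict, Tuple

# Standard Linux ioctl request numbers (x86-64 values).
TCGETS = 0x5401    # read terminal attributes
TIOCSTI = 0x5412   # push a byte into a terminal's input queue
FIONREAD = 0x541B  # bytes available to read

# Address families for socket(2).
AF_UNIX = 1
AF_INET = 2

Predicate = Callable[[Tuple[int, ...]], bool]


def any_args(args: Tuple[int, ...]) -> bool:
    """Rule that accepts the syscall with any arguments."""
    return True


def arg_in(index: int, allowed: set) -> Predicate:
    """Rule that accepts only if argument `index` is one of `allowed`."""
    return lambda args: len(args) > index and args[index] in allowed


class ToyFilter:
    """Hypothetical allowlist in the spirit of seccomp-bpf: a rule sees the
    syscall name and its raw integer arguments, never the memory behind them."""

    def __init__(self, rules: Dict[str, Predicate]):
        self.rules = rules

    def permits(self, name: str, *args: int) -> bool:
        rule = self.rules.get(name)
        return rule is not None and rule(args)


# Three syscalls, but ioctl is allowed with any request number.
loose = ToyFilter({"read": any_args, "write": any_args, "ioctl": any_args})

# Five syscalls, with ioctl and socket pinned to specific arguments.
tight = ToyFilter({
    "read": any_args,
    "write": any_args,
    "futex": any_args,
    "ioctl": arg_in(1, {TCGETS, FIONREAD}),
    "socket": arg_in(0, {AF_UNIX}),
})
```

`loose` allows fewer syscalls than `tight`, yet it permits `ioctl` with `TIOCSTI` (terminal input injection) and `tight` does not. What a filter lets through depends on its argument constraints as much as on the length of its list.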
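
The Gofer slides say it runs as root, similar to "docker run" as root, but drops non-FS capabilities. The sketch below does that arithmetic on capability bitmasks in the form shown on the `CapEff` line of `/proc/<pid>/status`. It relies on three assumptions. First, `FS_CAPS` is an illustrative choice of file-system-related capabilities, not necessarily the exact set runsc keeps. Second, `0xA80425FB` is the effective set commonly reported for a root process started by a default `docker run`. Third, the helper names are hypothetical. The bit numbers come from `<linux/capability.h>`.

```python
# Bit positions from <linux/capability.h> (only the subset used here).
CAP_BITS = {
    "CAP_CHOWN": 0,
    "CAP_DAC_OVERRIDE": 1,
    "CAP_DAC_READ_SEARCH": 2,
    "CAP_FOWNER": 3,
    "CAP_FSETID": 4,
    "CAP_KILL": 5,
    "CAP_SETGID": 6,
    "CAP_SETUID": 7,
    "CAP_SETPCAP": 8,
    "CAP_NET_BIND_SERVICE": 10,
    "CAP_NET_RAW": 13,
    "CAP_SYS_CHROOT": 18,
    "CAP_SYS_ADMIN": 21,
    "CAP_MKNOD": 27,
    "CAP_AUDIT_WRITE": 29,
    "CAP_SETFCAP": 31,
}

# Effective capabilities commonly reported for root under a default "docker run".
DOCKER_DEFAULT_CAPEFF = 0xA80425FB

# Illustrative (assumed) set of file-system capabilities a file proxy might keep.
FS_CAPS = frozenset({
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH",
    "CAP_FOWNER", "CAP_FSETID", "CAP_SYS_CHROOT",
})


def decode(mask):
    """Names of the known capabilities whose bits are set in `mask`."""
    return {name for name, bit in CAP_BITS.items() if (mask >> bit) & 1}


def encode(names):
    """Bitmask with one bit set per capability name."""
    mask = 0
    for name in names:
        mask |= 1 << CAP_BITS[name]
    return mask


def drop_non_fs(mask):
    # Bits missing from CAP_BITS are never decoded, so they are dropped too.
    return encode(decode(mask) & FS_CAPS)


if __name__ == "__main__":
    kept = drop_non_fs(DOCKER_DEFAULT_CAPEFF)
    print(f"{kept:#x}", sorted(decode(kept)))
```

With these inputs the kept mask is `0x4001b`, which covers CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FOWNER, CAP_FSETID, and CAP_SYS_CHROOT. CAP_DAC_READ_SEARCH is listed in `FS_CAPS` but was never granted, so dropping capabilities cannot add it back.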
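
The "What's not protected?" slide lists mitigations that sit outside gVisor: NetworkPolicy for network access, read-only filesystems for file writes, and cgroup-backed limits for CPU, memory, and disk. The sketch below builds such objects as Python dicts and prints them as a JSON `List`, which `kubectl apply -f` accepts. The helper name `hardened_manifests`, the image name, and the limit values are placeholders. A NetworkPolicy only takes effect when the cluster's network plugin enforces it.

```python
import json


def hardened_manifests(name, image):
    """Hypothetical helper: a Pod with resource limits and a read-only root
    filesystem, plus a NetworkPolicy denying all traffic for the namespace's pods."""
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                # Limits are enforced through cgroups; the values are placeholders.
                "resources": {"limits": {
                    "cpu": "500m",
                    "memory": "256Mi",
                    "ephemeral-storage": "1Gi",
                }},
                "securityContext": {
                    "readOnlyRootFilesystem": True,
                    "allowPrivilegeEscalation": False,
                },
            }],
        },
    }
    deny_all = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all"},
        # An empty podSelector selects every pod in the namespace; listing both
        # policy types with no rules denies all ingress and egress (DNS included).
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [pod, deny_all]}


if __name__ == "__main__":
    manifests = hardened_manifests("untrusted-app", "example.com/untrusted-app:1.0")
    print(json.dumps(manifests, indent=2))
```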
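
The last slide notes that gVisor plugs into Kubernetes through RuntimeClass. The sketch below, again using Python dicts, creates a RuntimeClass named `gvisor`. It then opts a Pod into that class by setting `runtimeClassName`, the manifest spelling of the `RuntimeClassName` field on the slide. It relies on three assumptions. First, the helper names are hypothetical. Second, `handler` must match the runtime name configured in containerd on the nodes, and `runsc` is only a common choice. Third, `node.k8s.io/v1` is the current API version, whereas clusters from the time of this talk served RuntimeClass as `node.k8s.io/v1beta1`.

```python
import copy
import json


def runtime_class(name="gvisor", handler="runsc"):
    """RuntimeClass object; `handler` names the runtime configured in containerd (assumed)."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,
    }


def sandboxed(pod, runtime_class_name="gvisor"):
    """Return a copy of a Pod manifest that runs under the given RuntimeClass."""
    out = copy.deepcopy(pod)  # leave the caller's manifest untouched
    out["spec"]["runtimeClassName"] = runtime_class_name
    return out


if __name__ == "__main__":
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "untrusted"},
        "spec": {"containers": [
            {"name": "app", "image": "busybox", "command": ["sleep", "3600"]},
        ]},
    }
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": [runtime_class(), sandboxed(pod)]}, indent=2))
```

Keeping the two names separate is deliberate. Pods reference the RuntimeClass by `metadata.name`, while the node's container runtime only sees `handler`, so the two can differ.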