The Enemy Within: Running untrusted code with gVisor

Containers are a great way to deploy and isolate application resources, but they can fall short when it comes to security isolation. How do you improve the security of a container while keeping its flexible and dynamic resource usage? There are many options for sandboxed containers, but which is right for you?

In this talk we will explore the gVisor sandbox runtime in depth. gVisor is a unique open-source sandbox runtime that allows you to run unmodified applications in containers with a higher level of isolation and low overhead. It implements the OCI runtime specification and integrates well with containerd and Kubernetes. I will dive into the container security model and use cases for sandboxed pods, then discuss various approaches and their tradeoffs before turning to the architecture of gVisor and how it differs from virtual-machine-based sandboxes.

Ian Lewis

June 25, 2019

Transcript

  1. The Enemy Within: Running Untrusted Code with gVisor

    Ian Lewis, Developer Advocate, Google Cloud Platform

  2. 2 gVisor Ian Lewis (@IanMLewis) Developer Advocate, Google

  3. So you want to run some code...

    • Running untrusted code
    • User uploaded code
    • Third-party code
    • Complex code/Complex user input
    • Code you wrote but you don't trust yourself…

  4. Use Cases

    • SaaS/Serverless
    • Video/Image transcoding
    • Machine learning

  5. Too much privileged code [Diagram: Application, Host Kernel]

  6. Too much privileged code [Diagram: Application, Host Kernel, open("/path/to/file", O_RDWR)]

  7. Too much privileged code [Diagram: Application, Host Kernel]

  8. Too much privileged code [Diagram: Application, Host Kernel, file descriptor]

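
The request and reply shown in slides 5-8 can be reproduced from user space. The sketch below is illustrative only and is not from the talk: it uses Python's `os.open`, a thin wrapper over the `open` system call, on a temporary file (the helper name `open_read_write` and the demo file are assumptions) to show that what comes back from the kernel is just a small integer, the file descriptor.

```python
import os
import tempfile


def open_read_write(path):
    """Ask the kernel to open `path` read-write and return the raw file descriptor."""
    # os.open issues the open system call; privileged kernel code resolves
    # the path and checks permissions on the application's behalf.
    return os.open(path, os.O_RDWR)


# Hypothetical demo file; the slides use "/path/to/file" as a placeholder.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

fd = open_read_write(demo_path)
os.write(fd, b"hello")           # later syscalls refer to the file by descriptor
os.lseek(fd, 0, os.SEEK_SET)     # rewind so the bytes can be read back
data = os.read(fd, 5)
os.close(fd)
os.remove(demo_path)
print(type(fd).__name__, data)
```

Every one of these calls runs kernel code with full privileges on input chosen by the application, which is the attack surface the slide title refers to.
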
  9. Container Sandboxes

    • Prevents attackers from escaping the runtime environment
    • Code running in the sandbox is untrusted

  10. Sandbox Isolation

    • Goal of the sandbox is to reduce execution of trusted, privileged code (e.g. kernel code)
    • Achieved through abstraction/virtualization of host.
    • Don't want to expose the system to risk of any single bug
      ◦ Need two layers of isolation

  11. OS-Level Virtualization (containers) [Diagram: Application, Host Kernel, Namespace]

  12. Unikernels [Diagram: Application, Host Kernel, Guest OS, Hypervisor]

  13. Containers & Unikernels are cool but...

    • Containers
      ◦ They aren't good security isolation boundaries
      ◦ Only one layer of isolation
      ◦ Any one bug in the host kernel could lead to a full host compromise
    • Unikernels
      ◦ Can't bring your own container (must be specially crafted)

  14. Virtual Machines [Diagram: Application, Host Kernel, Guest OS, Hypervisor, Hardware]

  15. (Type 2) Virtual Machines [Diagram: Application, Host Kernel, Guest OS, Hypervisor, Hardware]

  16. (Type 1) Virtual Machines [Diagram: Application, Host Kernel, Guest OS, Hypervisor, Hardware]

  17. VMs are cool but...

    • We want more container-like properties
      ◦ Flexible resource usage
        ▪ Don't want to assign full sets of memory or CPU to the sandbox
        ▪ Want to be able to reclaim memory if possible
      ◦ Quick X0ms startup time
        ▪ Don't want to have a lot of guest OS boot time.
      ◦ Easier maintenance and integration into container infrastructure

  18. 18 gVisor

  19. Virtual Machines [Diagram: Application, OS, Virtualized Hardware]

  20. gVisor Virtualization [Diagram: Application, Virtualized OS]

  21-22. gVisor: Two Layers of Isolation [Diagram: Application, Guest OS (Sentry), Host Kernel, Namespace]

  23. gVisor

    • Two layers of isolation
    • Uses the same principle of virtualization as VMs
      ◦ Virtualization at the OS; Linux Syscall layer
    • Reduces the host attack surface
      ◦ Calls to the host OS are controlled by the Sentry
      ◦ Most syscall logic handled by Sentry
      ◦ No syscalls are "passed through". Applications cannot pass arbitrary arguments to the host kernel.

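
To make the "no pass-through" point from slide 23 concrete, here is a toy model in Python. It is not gVisor's code or API: `ToySentry`, its in-memory file table, and the syscall names it accepts are all hypothetical. It only illustrates the idea that the application's requests are handled by the sandbox's own logic, and that anything the sandbox does not implement is rejected rather than forwarded to the host.

```python
import errno


class ToySentry:
    """Hypothetical stand-in for a user-space kernel; not gVisor's implementation."""

    def __init__(self, files):
        self._files = dict(files)   # guest-visible files, held in sandbox memory
        self._fds = {}              # guest fd -> [path, read offset]
        self._next_fd = 3           # 0-2 are conventionally stdin/stdout/stderr

    def syscall(self, name, *args):
        # Dispatch by name to the sandbox's own implementation. There is
        # deliberately no fallback that forwards `name` and `args` to the host.
        handler = getattr(self, "_sys_" + name, None)
        if handler is None:
            return -errno.ENOSYS
        return handler(*args)

    def _sys_open(self, path):
        if path not in self._files:
            return -errno.ENOENT
        fd = self._next_fd
        self._next_fd += 1
        self._fds[fd] = [path, 0]
        return fd

    def _sys_read(self, fd, count):
        if fd not in self._fds:
            return -errno.EBADF
        path, offset = self._fds[fd]
        chunk = self._files[path][offset:offset + count]
        self._fds[fd][1] = offset + len(chunk)
        return chunk

    def _sys_close(self, fd):
        if self._fds.pop(fd, None) is None:
            return -errno.EBADF
        return 0


sentry = ToySentry({"/etc/hostname": b"sandbox\n"})
guest_fd = sentry.syscall("open", "/etc/hostname")
content = sentry.syscall("read", guest_fd, 64)
unknown = sentry.syscall("init_module", b"payload")  # not implemented: rejected
missing = sentry.syscall("open", "/etc/shadow")      # not in the guest's view
```

In this model the guest's file descriptor is an entry in the sandbox's own table, not a host descriptor, so a hostile argument never reaches host kernel code directly. As the slides note, the real Sentry still makes a limited set of host calls of its own.
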
  24. gVisor Architecture [Diagram: Host Linux Kernel, User, Kernel, runsc, OCI, Kubernetes]

  25. gVisor Architecture [Diagram: KVM/ptrace, Gofer, Host Linux Kernel, Sentry, Sandbox, User, Kernel, 9P, runsc, OCI, Kubernetes]

  26. gVisor Architecture [Diagram: KVM/ptrace, Gofer, Host Linux Kernel, Sentry, Sandbox, User, Kernel, 9P, runsc, OCI, Kubernetes, seccomp + ns (twice)]

  27. gVisor Architecture [Diagram: KVM/ptrace, Gofer, Host Linux Kernel, Container, Sentry, Sandbox, User, Kernel, 9P, runsc, OCI, Kubernetes, seccomp + ns (twice)]

  28. gVisor Architecture [Diagram: KVM/ptrace, Gofers, Containers, Host Linux Kernel, Sentry, Sandbox, User, Kernel, 9P, runsc, OCI, Kubernetes, seccomp + ns (twice)]

  29. Design Principles

    • Two security layers
    • Minimal access to host
      ◦ No syscall is passed through to the host
      ◦ Limited host syscalls allowed
      ◦ User mode
    • Pure Go
      ◦ No cgo allowed
    • Unsafe code is carefully reviewed
    • Statically linked, few external dependencies
    • Trust nobody

  30. Defense in Depth

    • Sentry is first layer of defense
      ◦ Assume it will be compromised
      ◦ User mode
    • Pod cgroup
    • Namespaces
    • Terminal chroot
    • uid/gid: nobody
      ◦ Drop all capabilities
    • Seccomp
      ◦ # of syscalls is the wrong metric

  31-36. Defense in Depth: Sandbox (bullets repeated from slide 30) [Diagram built up one layer per slide around the Sandbox: cgroup, namespace, chroot, user / group / capabilities, seccomp]

  37. Defense In Depth: Gofer

    • Isolated from user code
    • Pod cgroup
    • Caller’s user namespace
    • Chroot to rootfs
      ◦ Bind mounts
    • Runs as root
      ◦ Similar to “docker run” as root
      ◦ Drop non-FS capabilities
    • seccomp

  38-43. Defense In Depth: Gofer (bullets repeated from slide 37) [Diagram built up one layer per slide around the Gofer: cgroup, reduced namespaces, chroot, reduced capabilities, seccomp]

  44. What's not protected?

    • Be aware of defaults
      ◦ K8s is optimized for ease-of-use, not security
      ◦ CPU/Memory/Disk limits
    • Network/Disk isolation
      ◦ Network access: Use NetworkPolicy
      ◦ Arbitrary packet injection: Sentry provides isolation
      ◦ File writes/permissions: Use read-only filesystems
      ◦ No throttling mechanism: use cgroups

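
The mitigations listed on slide 44 map onto standard Kubernetes fields. The sketch below builds two manifest fragments as Python dictionaries and prints them as JSON, which `kubectl` accepts alongside YAML. The function names, object names, labels, image, and limit values are all placeholders chosen for illustration, not settings recommended in the talk.

```python
import json


def deny_all_network_policy(namespace, pod_labels):
    """NetworkPolicy that selects the given pods and allows no ingress or egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "sandbox-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            # Listing both policy types with no ingress/egress rules
            # denies all traffic for the selected pods.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def hardened_container(name, image):
    """Container spec with explicit resource limits and a read-only root filesystem."""
    return {
        "name": name,
        "image": image,
        # Placeholder values; Kubernetes sets no limits unless you ask for them.
        "resources": {
            "limits": {"cpu": "500m", "memory": "256Mi", "ephemeral-storage": "1Gi"}
        },
        "securityContext": {"readOnlyRootFilesystem": True},
    }


policy = deny_all_network_policy("untrusted", {"sandbox": "gvisor"})
container = hardened_container("app", "example.com/untrusted-app:latest")
print(json.dumps(policy, indent=2))
print(json.dumps(container, indent=2))
```

A NetworkPolicy only takes effect when the cluster's network plugin enforces it, so a deny-all policy like this one is worth verifying on the target cluster.
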
  45. gVisor &

    • Integrated with RuntimeClass
      ◦ RuntimeClassName: gvisor
    • Minikube
      ◦ minikube addons enable gvisor
      ◦ github.com/kubernetes/minikube/tree/master/deploy/addons/gvisor
    • GKE Sandbox (beta)
      ◦ cloud.google.com/kubernetes-engine/sandbox
    • gvisor-containerd-shim
      ◦ github.com/google/gvisor-containerd-shim

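
As a rough illustration of the RuntimeClass integration on slide 45, the sketch below generates a RuntimeClass and a pod that opts into it. Several details are assumptions rather than content from the talk: the `node.k8s.io/v1` API version assumes a recent cluster (the API was still in beta when this talk was given), the handler name `runsc` must match whatever the node's container runtime is configured with, and the pod name and image are placeholders.

```python
import json

# RuntimeClass mapping the name "gvisor" to a node-level runtime handler.
runtime_class = {
    "apiVersion": "node.k8s.io/v1",   # assumption: recent Kubernetes version
    "kind": "RuntimeClass",
    "metadata": {"name": "gvisor"},
    "handler": "runsc",               # assumption: handler configured on the node
}

# Pod that asks to run under that RuntimeClass.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "untrusted-app"},            # placeholder name
    "spec": {
        "runtimeClassName": runtime_class["metadata"]["name"],
        "containers": [{"name": "app", "image": "nginx"}],  # placeholder image
    },
}

for manifest in (runtime_class, pod):
    print(json.dumps(manifest, indent=2))
```

Pods that omit `runtimeClassName` keep using the cluster's default runtime, so sandboxing can be applied only to the workloads that run untrusted code.
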
  46. gVisor is Open Source & Thanks!

    • https://gvisor.dev/
    • https://github.com/google/gvisor
    • Gitter: https://gitter.im/gvisor/community
    • Mailing lists: gvisor-users, gvisor-dev