Slide 1

Slide 1 text

The Enemy Within: Running Untrusted Code with gVisor
Ian Lewis, Developer Advocate, Google Cloud Platform

Slide 2

Slide 2 text

2 gVisor Ian Lewis (@IanMLewis) Developer Advocate, Google

Slide 3

Slide 3 text

So you want to run some code...
● Running untrusted code
● User uploaded code
● Third-party code
● Complex code/Complex user input
● Code you wrote but you don't trust yourself...

Slide 4

Slide 4 text

Use Cases
● SaaS/Serverless
● Video/Image transcoding
● Machine learning

Slide 5

Slide 5 text

Too much privileged code
[Diagram: Application, Host Kernel]

Slide 6

Slide 6 text

Too much privileged code
[Diagram: Application, Host Kernel; label: open("/path/to/file", O_RDWR)]

Slide 7

Slide 7 text

Too much privileged code
[Diagram: Application, Host Kernel]

Slide 8

Slide 8 text

Too much privileged code
[Diagram: Application, Host Kernel; label: file descriptor]

Slide 9

Slide 9 text

Container Sandboxes
● Prevents attackers from escaping the runtime environment
● Code running in the sandbox is untrusted

Slide 10

Slide 10 text

Sandbox Isolation
● Goal of the sandbox is to reduce execution of trusted, privileged code (e.g. kernel code)
● Achieved through abstraction/virtualization of the host
● Don't want to expose the system to the risk of any single bug
  ○ Need two layers of isolation

Slide 11

Slide 11 text

OS-Level Virtualization (containers)
[Diagram: Application, Host Kernel, Namespace]

Slide 12

Slide 12 text

Unikernels
[Diagram: Application, Host Kernel, Guest OS, Hypervisor]

Slide 13

Slide 13 text

Containers & Unikernels are cool but...
● Containers
  ○ They aren't good security isolation boundaries
  ○ Only one layer of isolation
  ○ Any one bug in the host kernel could lead to a full host compromise
● Unikernels
  ○ Can't bring your own container (must be specially crafted)

Slide 14

Slide 14 text

Virtual Machines
[Diagram: Application, Host Kernel, Guest OS, Hypervisor, Hardware]

Slide 15

Slide 15 text

(Type 2) Virtual Machines
[Diagram: Application, Host Kernel, Guest OS, Hypervisor, Hardware]

Slide 16

Slide 16 text

(Type 1) Virtual Machines
[Diagram: Application, Host Kernel, Guest OS, Hypervisor, Hardware]

Slide 17

Slide 17 text

VMs are cool but...
● We want more container-like properties
  ○ Flexible resource usage
    ■ Don't want to assign full sets of memory or CPU to the sandbox
    ■ Want to be able to reclaim memory if possible
  ○ Quick X0ms startup time
    ■ Don't want to have a lot of guest OS boot time
  ○ Easier maintenance and integration into container infrastructure

Slide 18

Slide 18 text

gVisor

Slide 19

Slide 19 text

Virtual Machines
[Diagram: Application, OS, Virtualized Hardware]

Slide 20

Slide 20 text

gVisor Virtualization
[Diagram: Application, Virtualized OS]

Slide 21

Slide 21 text

gVisor: Two Layers of Isolation
[Diagram: Application, Guest OS (Sentry), Host Kernel, Namespace]

Slide 22

Slide 22 text

gVisor: Two Layers of Isolation
[Diagram: Application, Guest OS (Sentry), Host Kernel, Namespace]

Slide 23

Slide 23 text

gVisor
● Two layers of isolation
● Uses the same principle of virtualization as VMs
  ○ Virtualization at the OS level: the Linux syscall layer
● Reduces the host attack surface
  ○ Calls to the host OS are controlled by the Sentry
  ○ Most syscall logic is handled by the Sentry
  ○ No syscalls are "passed through": applications cannot pass arbitrary arguments to the host kernel

Slide 24

Slide 24 text

gVisor Architecture
[Diagram: Host Linux Kernel, User, Kernel, runsc, OCI, Kubernetes]

Slide 25

Slide 25 text

gVisor Architecture
[Diagram: KVM/ptrace, Gofer, Host Linux Kernel, Sentry, Sandbox, User, Kernel, 9P, runsc, OCI, Kubernetes]

Slide 26

Slide 26 text

gVisor Architecture
[Diagram: KVM/ptrace, Gofer, Host Linux Kernel, Sentry, Sandbox, User, Kernel, 9P, runsc, OCI, Kubernetes; "seccomp + ns" appears twice]

Slide 27

Slide 27 text

gVisor Architecture
[Diagram: KVM/ptrace, Gofer, Host Linux Kernel, Container, Sentry, Sandbox, User, Kernel, 9P, runsc, OCI, Kubernetes; "seccomp + ns" appears twice]

Slide 28

Slide 28 text

gVisor Architecture
[Diagram: KVM/ptrace, Gofers (multiple), Containers (multiple), Host Linux Kernel, Sentry, Sandbox, User, Kernel, 9P, runsc, OCI, Kubernetes; "seccomp + ns" appears twice]

Slide 29

Slide 29 text

Design Principles
● Two security layers
● Minimal access to host
  ○ No syscall is passed through to the host
  ○ Limited host syscalls allowed
  ○ User mode
● Pure Go
  ○ No cgo allowed
● Unsafe code is carefully reviewed
● Statically linked, few external dependencies
● Trust nobody

Slide 30

Slide 30 text

Defense in Depth
● Sentry is the first layer of defense
  ○ Assume it will be compromised
  ○ User mode
● Pod cgroup
● Namespaces
● Terminal chroot
● uid/gid: nobody
  ○ Drop all capabilities
● Seccomp
  ○ # of syscalls is the wrong metric

Slide 31

Slide 31 text

Defense in Depth: Sandbox
(Bullets repeat slide 30.)
[Diagram: Sandbox]

Slide 32

Slide 32 text

Defense in Depth: Sandbox
(Bullets repeat slide 30.)
[Diagram: Sandbox; layers shown so far: cgroup]

Slide 33

Slide 33 text

Defense in Depth: Sandbox
(Bullets repeat slide 30.)
[Diagram: Sandbox; layers shown so far: cgroup, namespace]

Slide 34

Slide 34 text

Defense in Depth: Sandbox
(Bullets repeat slide 30.)
[Diagram: Sandbox; layers shown so far: cgroup, namespace, chroot]

Slide 35

Slide 35 text

Defense in Depth: Sandbox
(Bullets repeat slide 30.)
[Diagram: Sandbox; layers shown so far: cgroup, namespace, chroot, user / group / capabilities]

Slide 36

Slide 36 text

Defense in Depth: Sandbox
(Bullets repeat slide 30.)
[Diagram: Sandbox; layers shown: cgroup, namespace, chroot, user / group / capabilities, seccomp]
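One reading of the seccomp bullet ("# of syscalls is the wrong metric") is that what matters is which calls are allowed and with which arguments, not how many. The sketch below models that as a default-deny allowlist with optional argument checks. It is a toy model, not the seccomp API or gVisor's filter configuration; the filter and argCheck types and the two example rules are assumptions for illustration.

```go
package main

import "fmt"

// argCheck reports whether a syscall's arguments are acceptable.
// A nil argCheck means any arguments are allowed.
type argCheck func(args []uintptr) bool

// filter is a toy default-deny allowlist (hypothetical, not the seccomp API).
type filter struct {
	rules map[string]argCheck
}

// newFilter builds two example rules chosen only for illustration.
func newFilter() filter {
	return filter{rules: map[string]argCheck{
		// Allowed with any arguments.
		"exit_group": nil,
		// Allowed only when the first argument is fd 1 or fd 2.
		"write": func(args []uintptr) bool {
			return len(args) > 0 && (args[0] == 1 || args[0] == 2)
		},
	}}
}

// allowed returns true only if name has a rule and its arguments pass the
// rule's check. Anything not listed is denied.
func (f filter) allowed(name string, args ...uintptr) bool {
	check, ok := f.rules[name]
	if !ok {
		return false
	}
	return check == nil || check(args)
}

func main() {
	f := newFilter()
	fmt.Println("write to fd 1:", f.allowed("write", 1))
	fmt.Println("write to fd 7:", f.allowed("write", 7))
	fmt.Println("ptrace:", f.allowed("ptrace"))
}
```

Two filters that both list "write" can expose very different amounts of kernel code depending on the argument constraints, which is why counting entries says little on its own.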

Slide 37

Slide 37 text

Defense In Depth: Gofer
● Isolated from user code
● Pod cgroup
● Caller’s user namespace
● Chroot to rootfs
  ○ Bind mounts
● Runs as root
  ○ Similar to “docker run” as root
  ○ Drop non-FS capabilities
● seccomp

Slide 38

Slide 38 text

Defense In Depth: Gofer
(Bullets repeat slide 37.)
[Diagram: Gofer]

Slide 39

Slide 39 text

Defense In Depth: Gofer
(Bullets repeat slide 37.)
[Diagram: Gofer; layers shown so far: cgroup]

Slide 40

Slide 40 text

Defense In Depth: Gofer
(Bullets repeat slide 37.)
[Diagram: Gofer; layers shown so far: cgroup, reduced namespaces]

Slide 41

Slide 41 text

Defense In Depth: Gofer
(Bullets repeat slide 37.)
[Diagram: Gofer; layers shown so far: cgroup, reduced namespaces, chroot]

Slide 42

Slide 42 text

Defense In Depth: Gofer
(Bullets repeat slide 37.)
[Diagram: Gofer; layers shown so far: cgroup, reduced namespaces, chroot, reduced capabilities]

Slide 43

Slide 43 text

Defense In Depth: Gofer
(Bullets repeat slide 37.)
[Diagram: Gofer; layers shown: cgroup, reduced namespaces, chroot, reduced capabilities, seccomp]

Slide 44

Slide 44 text

What's not protected?
● Be aware of defaults
  ○ K8s is optimized for ease-of-use, not security
  ○ CPU/Memory/Disk limits
● Network/Disk isolation
  ○ Network access: Use NetworkPolicy
  ○ Arbitrary packet injection: Sentry provides isolation
  ○ File writes/permissions: Use read-only filesystems
  ○ No throttling mechanism: use cgroups

Slide 45

Slide 45 text

gVisor &
● Integrated with RuntimeClass
  ○ runtimeClassName: gvisor
● Minikube
  ○ minikube addons enable gvisor
  ○ github.com/kubernetes/minikube/tree/master/deploy/addons/gvisor
● GKE Sandbox (beta)
  ○ cloud.google.com/kubernetes-engine/sandbox
● gvisor-containerd-shim
  ○ github.com/google/gvisor-containerd-shim

Slide 46

Slide 46 text

gVisor is Open Source & Thanks!
• https://gvisor.dev/
• https://github.com/google/gvisor
• Gitter: https://gitter.im/gvisor/community
• Mailing lists: gvisor-users, gvisor-dev