
Container security

Alex Khaerov
September 24, 2019

Transcript

  2. @hayorov Who I am: Alex Khaerov

    • Development Lead
    • Company: Chainstack, a multi-cloud and multi-blockchain platform as a service based in Singapore (and hiring)
    • Doing software development for the past decade
    • Junior speaker: Python, Kubernetes
    • Committee member (Moscow Python, Helm Summit)
    • A huge fan of laptop stickers and a cyclist
  3. I am NOT:

    • a Linux kernel developer;
    • a security researcher;
    • DevSecOps.

    I am a typical customer of containers.
  11. [Diagram, built up over slides 4 to 11. Labels: cs-gcp-apac-1; Company A, Company B and Company C, each with S and P; hub.docker.com; I (three boxes); "from anywhere"; Ext (three boxes). Successive builds add further Ext, S and P boxes.]
  15. KEEP CALM

    2019-02-11, CVE-2019-5736: Breaking out of Docker via runC. Score 9.3.
    "... allows a malicious container to (with minimal user interaction) overwrite the host runc binary and thus gain root-level code execution on the host. The level of user interaction is being able to run any command ... as root ..."
    Affected: Debian, Docker, Red Hat, Ubuntu, AWS, GCP, Azure …
    https://www.twistlock.com/labs-blog/breaking-docker-via-runc-explaining-cve-2019-5736/

    2019-08-28, CVE-2019-11245: Containers attempt to run as uid 0. Score 7.8.
    "... for pods that do not specify an explicit runAsUser attempt to run as uid 0 (root) on container restart, or if the image was previously pulled to the node. If the pod specified mustRunAsNonRoot: true, the kubelet will refuse to start the container as root. If the pod did not specify mustRunAsNonRoot: true, the kubelet will run the container as uid 0."
    Affected: Kubernetes v1.13.6 and v1.14.2
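
The advisory text for CVE-2019-11245 turns on two pod settings, so a small manifest may make them concrete. This is only a sketch: the pod name, the image and the uid 1000 are hypothetical, and in a pod spec the non-root flag is spelled runAsNonRoot. An explicit runAsUser avoids relying on the image's default user, and runAsNonRoot: true makes the kubelet refuse to start a container as uid 0.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: non-root-demo                      # hypothetical name
spec:
  securityContext:
    runAsUser: 1000                        # explicit non-zero uid (assumed value)
    runAsGroup: 1000                       # assumed value
    runAsNonRoot: true                     # kubelet refuses to start containers as uid 0
  containers:
  - name: app
    image: registry.example.com/app:1.0    # hypothetical image
```

Setting both fields is deliberate: runAsUser fixes the uid, while runAsNonRoot acts as a guard if the uid is ever dropped from the manifest.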
  19. The History

    1970s  Unix v7, chroot
    2000   FreeBSD, Jails
    2004   Solaris, Zones
    2005   OpenVZ
    2006   Linux cgroups
    2008   namespaces, LXC
    2013   Docker
  23. [Diagram, bottom to top: Hardware; Linux Kernel (cgroups, namespaces, capabilities, AppArmor, SELinux, seccomp, FS); runC; shim; containerd; dockerd]

    namespaces provide a layer of isolation:
    • PID for process isolation.
    • NET for managing network interfaces.
    • IPC for managing access to IPC resources.
    • MNT for managing filesystem mount points.
    • UTS for isolating kernel and version identifiers.

    cgroups share available hardware resources among containers: Memory, CPU, Block IO, Devices, Network.

    AppArmor restricts a program's capabilities with per-program profiles.

    seccomp filters the syscalls issued by a program.

    capabilities divide root privileges into distinct units used for performing permission checks.
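
As a rough illustration of how these kernel features surface in a Kubernetes pod spec, the sketch below sets cgroup-backed resource limits, drops Linux capabilities and applies the runtime's default seccomp profile. The pod name, image and limit values are hypothetical. The seccompProfile field assumes Kubernetes 1.19 or newer, which is later than this talk; older clusters used an annotation instead. AppArmor and SELinux settings are left out.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-demo                      # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0    # hypothetical image
    resources:
      limits:                              # enforced through cgroups
        cpu: "500m"                        # assumed value
        memory: 256Mi                      # assumed value
    securityContext:
      allowPrivilegeEscalation: false      # the process cannot gain more privileges than its parent
      capabilities:
        drop: ["ALL"]                      # start with no capabilities
        add: ["NET_BIND_SERVICE"]          # add back only what the app needs (assumed)
      seccompProfile:
        type: RuntimeDefault               # syscall filter shipped with the container runtime
```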
  31. The History

    1970s  Unix v7, chroot
    2000   FreeBSD, Jails
    2004   Solaris, Zones
    2005   OpenVZ
    2006   Linux cgroups
    2008   namespaces, LXC
    2013   Docker
    2015   OCI, runC
    2016   CRI-O
    2016   rkt
    2017   Kata Containers
    2018   gVisor
    2018   Firecracker
  32. gVisor: Sandbox for Containers

    Independent user space kernel. Strong isolation.
    [Diagram: Container, System Calls, gVisor, Limited System Calls, Kernel, Hardware]
  36. Sandbox Architecture

    runsc: OCI runtime powered by gVisor.
    [Diagram. User space: Container, Sentry (emulated Linux Kernel), Gofer connected over 9P, runsc (OCI). Boundary: KVM, seccomp + ns. Kernel space: Host Linux Kernel.]
  40. How to start

    • Locally (macOS Docker)

    $ wget https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc

    $ cat ~/.docker/daemon.json     (taskbar > Preferences > Daemon > Advanced)
    {
      "default-runtime": "runc",
      "runtimes": {
        "runsc": {
          "path": "/usr/allexx/foo/runsc"
        }
      }
    }

    $ docker run --rm --runtime=runsc -it alpine
  42. How to start

    • GKE (managed Kubernetes)

    Create a new node pool:

    $ gcloud beta container node-pools create [NODE_POOL_NAME] \
        --cluster=[CLUSTER_NAME] \
        --node-version=[NODE_VERSION] \
        --image-type=cos_containerd \
        --sandbox type=gvisor

    $ kubectl get runtimeclasses
    NAME     AGE
    gvisor   19s
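
On GKE the gvisor RuntimeClass listed above is created for you. On a self-managed cluster an equivalent object would have to be defined by hand. The sketch below assumes the node's container runtime has already been configured with a handler named runsc, and it uses the node.k8s.io/v1 API, which is newer than this talk (clusters of that era used v1beta1).

```yaml
apiVersion: node.k8s.io/v1      # assumption: Kubernetes 1.20+; older clusters use node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: gvisor                  # the name pods reference via runtimeClassName
handler: runsc                  # assumption: the CRI runtime has a handler with this name
```

Pods then opt in per workload with runtimeClassName: gvisor, so only the riskiest workloads pay the sandbox overhead.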
  44. How to start

    • GKE (managed Kubernetes)

    Running an application:

    apiVersion: apps/v1          # not shown on the slide; required for the manifest to apply
    kind: Deployment
    metadata:
      name: httpd
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: httpd
      template:
        metadata:
          labels:
            app: httpd
        spec:
          runtimeClassName: gvisor
          containers:
          - name: httpd
            image: httpd

    Enable raw sockets:

    spec:
      containers:
      - name: my-container
        securityContext:
          capabilities:
            add: ["NET_RAW"]

    https://cloud.google.com/kubernetes-engine/docs/how-to/sandbox-pods
  49. Applicability and performance

    • Of 330 syscalls, 233 have a full or partial implementation.

    Images listed on the slide: elasticsearch, golang, java8, jenkins, mariadb, memcached, mongo, nginx, node, php, postgres, prometheus, python

    • Performance

    CPU (events/sec)     no diff
    Startup time (ms)    no diff
    Mem (usage, MB)      35 MB
    Net (rps)            -50%

    … small operations (I/O) impose a large overhead.

    • NO direct access to hardware or virtualization (no GPU)

    https://www.usenix.org/system/files/hotcloud19-paper-young.pdf
  54. So Now What?

    • Set up a "sandboxed nodepool" with gVisor for the riskiest workloads
    • Configure a security context (runAsUser != 0)
    • Discover Falco to start monitoring abnormal activity (GKE-compatible)
    • Learn about alternatives: Kata Containers and Firecracker microVMs
    • Use dedicated instances (VMs, bare metal) or services in special cases
    • Keep your software up to date (OS, runtime, Kubernetes)