Container security

Alex Khaerov
September 24, 2019

Transcript

  1. Can containers be secure? @hayorov Alex Khaerov

  2. @hayorov Hello ✋

  5. Who I am: Alex Khaerov (@hayorov)

    Development Lead at Chainstack, a multi-cloud and multi-blockchain platform as a service based in Singapore (and hiring).
    • doing software development for the past decade
    • junior speaker (Python, Kubernetes)
    • committee member (Moscow Python, Helm Summit)
    • a huge fan of laptop stickers and a cyclist
  9. I am NOT:

    • a Linux kernel developer;
    • a security researcher;
    • DevSecOps.
    I am a typical customer of containers.
  14. [Diagram] My Cluster running containers for Company A, Company B, and Company C, plus one additional container.
  27. [Diagram] Cluster cs-gcp-apac-1 shared by Company A, Company B, and Company C, each labeled S and P. Three I elements come from hub.docker.com and three Ext elements come "from anywhere"; later builds of the slide add further Ext, P, and S elements.
  34. KEEP CALM

    2019-02-11: CVE-2019-5736, "Breaking out of Docker via runC", score 9.3
    "... allows a malicious container to (with minimal user interaction) overwrite the host runc binary and thus gain root-level code execution on the host. The level of user interaction is being able to run any command ... as root ..."
    Affected: Debian, Docker, Red Hat, Ubuntu, AWS, GCP, Azure …
    https://www.twistlock.com/labs-blog/breaking-docker-via-runc-explaining-cve-2019-5736/

    2019-08-28: CVE-2019-11245, "Containers attempt to run as uid 0", score 7.8
    "... for pods that do not specify an explicit runAsUser attempt to run as uid 0 (root) on container restart, or if the image was previously pulled to the node. If the pod specified mustRunAsNonRoot: true, the kubelet will refuse to start the container as root. If the pod did not specify mustRunAsNonRoot: true, the kubelet will run the container as uid 0."
    Affected: Kubernetes v1.13.6 and v1.14.2
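The CVE-2019-11245 behaviour quoted above boils down to a small decision rule. The sketch below is a toy model of that rule only, not kubelet code: the function name and its return strings are made up for illustration, and it ignores the "on restart or previously pulled image" precondition from the advisory.

```python
from typing import Optional


def start_decision(run_as_user: Optional[int], must_run_as_non_root: bool) -> str:
    """Toy model of the quoted behaviour on affected kubelet versions."""
    # Without an explicit runAsUser, the affected versions fall back to uid 0.
    effective_uid = 0 if run_as_user is None else run_as_user
    if effective_uid == 0 and must_run_as_non_root:
        # The kubelet refuses to start a root container in this case.
        return "refused: container would run as root"
    return f"started as uid {effective_uid}"


if __name__ == "__main__":
    print(start_decision(None, False))
    print(start_decision(None, True))
    print(start_decision(1000, True))
```

Read this way, setting an explicit non-zero runAsUser is the only combination in the model that starts the container without root.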
  43. The History

    1970s: Unix v7, chroot
    2000: FreeBSD, Jails
    2004: Solaris, Zones
    2005: OpenVZ
    2006: Linux cgroups
    2008: namespaces, LXC
    2013: Docker
  52. [Diagram] The stack, top to bottom: dockerd, containerd, shim, runC, Linux Kernel (cgroups, namespaces, capabilities, AppArmor, SELinux, seccomp, FS), Hardware.

    namespaces provide a layer of isolation:
    • PID for process isolation.
    • NET for managing network interfaces.
    • IPC for managing access to IPC resources.
    • MNT for managing filesystem mount points.
    • UTS for isolating kernel and version identifiers.

    cgroups share the available hardware resources among containers:
    • Memory
    • CPU
    • Block IO
    • Devices
    • Network

    AppArmor restricts a program's capabilities with per-program profiles.
    seccomp filters the syscalls issued by a program.
    capabilities are used for performing permission checks.
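To make the namespace list above concrete: on Linux, every process exposes its namespaces as symlinks under /proc/<pid>/ns. The sketch below simply reads those links; the helper name `list_namespaces` is an assumption for illustration, and it only works on a Linux host or inside a Linux container.

```python
import os
from typing import Dict


def list_namespaces(ns_dir: str = "/proc/self/ns") -> Dict[str, str]:
    """Map each namespace name (pid, net, ipc, mnt, uts, ...) to its identifier."""
    namespaces = {}
    for name in sorted(os.listdir(ns_dir)):
        # Each entry is a symlink whose target looks like "pid:[4026531836]".
        namespaces[name] = os.readlink(os.path.join(ns_dir, name))
    return namespaces


if __name__ == "__main__":
    # Guarded so the demo is skipped on systems without a Linux /proc.
    if os.path.isdir("/proc/self/ns"):
        for name, ident in list_namespaces().items():
            print(f"{name:20s} {ident}")
```

Running it on the host and again inside a container should show different identifiers for the namespaces the runtime unshared, and identical ones for any it left shared.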
  60. The History

    1970s: Unix v7, chroot
    2000: FreeBSD, Jails
    2004: Solaris, Zones
    2005: OpenVZ
    2006: Linux cgroups
    2008: namespaces, LXC
    2013: Docker
    2015: OCI, runC
    2016: CRI-O
    2016: rkt
    2017: Kata Containers
    2018: gVisor
    2018: Firecracker
  61. VMs vs Containers

    * Only a Type II VMM needs to run on an operating system.
  65. Attacks via the Kernel: Isolation ≠ Secure

    [Diagram] Container, Kernel, Node host. Escape!
  66. We want it all: secured, zero config, lightweight

  67. gVisor: Sandbox for Containers

    Independent user-space kernel. Strong isolation.
    [Diagram] Container → System Calls → gVisor → Limited System Calls → Kernel → Hardware
  78. Architecture

    runsc: OCI runtime powered by gVisor.
    [Diagram] The Sandbox holds the Container and the Sentry (emulated Linux Kernel) on the User side; KVM and seccomp + ns sit between it and the Host Linux Kernel on the Kernel side; the Gofer is connected over 9P; runsc is driven through OCI.
  83. How to start

    • Locally (macOS Docker)
    $ wget https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc
    $ cat ~/.docker/daemon.json    (taskbar > Preferences > Daemon > Advanced)
    {
      "default-runtime": "runc",
      "runtimes": {
        "runsc": { "path": "/usr/allexx/foo/runsc" }
      }
    }
    $ docker run --rm --runtime=runsc -it alpine
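The daemon.json fragment above can also be generated rather than typed by hand. This is a minimal sketch, assuming only the two keys shown on the slide; the helper name and the placeholder path in the demo are hypothetical and should be replaced with wherever runsc was actually downloaded.

```python
import json


def docker_daemon_config(runsc_path: str, default_runtime: str = "runc") -> str:
    """Render daemon.json text that registers runsc as an extra runtime."""
    config = {
        "default-runtime": default_runtime,
        "runtimes": {"runsc": {"path": runsc_path}},
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # "/usr/local/bin/runsc" is a placeholder path, not taken from the slides.
    print(docker_daemon_config("/usr/local/bin/runsc"))
```

Keeping runc as the default runtime, as the slide does, means only containers started with --runtime=runsc go through gVisor.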
  86. How to start

    • GKE (managed Kubernetes)
    Create a new node pool:
    $ gcloud beta container node-pools create [NODE_POOL_NAME] \
        --cluster=[CLUSTER_NAME] \
        --node-version=[NODE_VERSION] \
        --image-type=cos_containerd \
        --sandbox type=gvisor
    $ kubectl get runtimeclasses
    NAME     AGE
    gvisor   19s
  89. How to start

    • GKE (managed Kubernetes)
    Running an application:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: httpd
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: httpd
      template:
        metadata:
          labels:
            app: httpd
        spec:
          runtimeClassName: gvisor
          containers:
          - name: httpd
            image: httpd
    Enable raw sockets:
    spec:
      containers:
      - name: my-container
        securityContext:
          capabilities:
            add: ["NET_RAW"]
    https://cloud.google.com/kubernetes-engine/docs/how-to/sandbox-pods
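The only change needed to move the httpd Deployment above into the sandbox is the runtimeClassName field in the pod template. As a rough sketch, assuming the manifest has already been loaded into a plain dictionary, a small helper (the name `with_runtime_class` is made up here) could apply that field to any Deployment:

```python
import copy
from typing import Any, Dict


def with_runtime_class(deployment: Dict[str, Any], runtime_class: str = "gvisor") -> Dict[str, Any]:
    """Return a copy of a Deployment manifest whose pods use the given RuntimeClass."""
    patched = copy.deepcopy(deployment)
    pod_spec = patched["spec"]["template"].setdefault("spec", {})
    pod_spec["runtimeClassName"] = runtime_class
    return patched


if __name__ == "__main__":
    httpd = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "httpd"},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": "httpd"}},
            "template": {
                "metadata": {"labels": {"app": "httpd"}},
                "spec": {"containers": [{"name": "httpd", "image": "httpd"}]},
            },
        },
    }
    print(with_runtime_class(httpd)["spec"]["template"]["spec"]["runtimeClassName"])
```

The helper copies the manifest instead of mutating it, so the same base definition can be rendered both with and without the sandbox.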
  95. Applicability and performance

    • Of 330 syscalls, 233 have a full or partial implementation.
    • Images listed on the slide: elasticsearch, golang, java8, jenkins, mariadb, memcached, mongo, nginx, node, php, postgres, prometheus, python
    • Performance:
        CPU (events/sec)     no diff
        Startup time (ms)    no diff
        Mem (usage, MB)      35 MB
        Net (rps)            -50%
      "… small operations (I/O) impose a large overhead."
    • NO direct access to hardware or virtualization (no GPU)
    https://www.usenix.org/system/files/hotcloud19-paper-young.pdf
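As a quick worked figure, the syscall numbers on this slide translate into a coverage percentage. The snippet below is plain arithmetic on the two values from the slide; the function and constant names are chosen here for illustration.

```python
TOTAL_SYSCALLS = 330        # figure from the slide
IMPLEMENTED_SYSCALLS = 233  # full or partial implementation, from the slide


def coverage_percent(implemented: int, total: int) -> float:
    """Share of syscalls with at least a partial implementation, in percent."""
    return round(100.0 * implemented / total, 1)


if __name__ == "__main__":
    print(coverage_percent(IMPLEMENTED_SYSCALLS, TOTAL_SYSCALLS))
    print(TOTAL_SYSCALLS - IMPLEMENTED_SYSCALLS)
```

That is roughly 70% coverage, with 97 syscalls left unimplemented at the time of the talk.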
  96. Can containers be secure?

  97. CVE-2017-1002101: Host-resolved symlinks

    [Diagram] Container, Volume, Kernel, Volume, Node host. Escape!
  98. Can containers be secure? Probably not.

  105. So Now What?

    • Set up a "sandboxed node pool" with gVisor for the riskiest workloads
    • Configure a security context (runAsUser != 0)
    • Discover Falco to start monitoring abnormal activity (GKE-compatible)
    • Learn about alternatives: Kata Containers and Firecracker microVMs
    • Use dedicated instances (VMs, bare metal) or services in special cases
    • Keep your software up to date (OS, runtime, Kubernetes)
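The security-context advice in the list above is easy to check mechanically. Here is a rough sketch of such a check over a pod spec held as a dictionary; the function name is hypothetical, and the rule it applies (flag explicit uid 0, or no uid combined with no runAsNonRoot) is a simplification, not an official policy.

```python
from typing import Any, Dict, List


def root_risk_containers(pod_spec: Dict[str, Any]) -> List[str]:
    """Names of containers that may start as uid 0 under this pod spec."""
    pod_ctx = pod_spec.get("securityContext") or {}
    risky = []
    for container in pod_spec.get("containers", []):
        ctx = container.get("securityContext") or {}
        # Container-level settings take precedence over pod-level ones.
        run_as_user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        # Flag an explicit uid 0, or no uid together with no runAsNonRoot guard.
        if run_as_user == 0 or (run_as_user is None and not non_root):
            risky.append(container["name"])
    return risky


if __name__ == "__main__":
    spec = {
        "securityContext": {"runAsUser": 1000},
        "containers": [
            {"name": "web"},
            {"name": "debug", "securityContext": {"runAsUser": 0}},
        ],
    }
    print(root_risk_containers(spec))
```

A check like this could run in CI against rendered manifests, so that a missing security context is caught before it reaches the cluster.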
  106. Thank you! Questions? Alex Khaerov, t.me/hayorov, http://bit.ly/xxx