Slide 1

Slide 1 text

Can containers be secure? Alex Khaerov (@hayorov)

Slide 2

Slide 2 text

@hayorov Hello ✋

Slide 3

Slide 3 text

@hayorov Who I am: Alex Khaerov, Development Lead

Slide 4

Slide 4 text

@hayorov Who I am: Alex Khaerov, Development Lead • doing software development for the past decade • junior speaker (Python, Kubernetes) • committee member (Moscow Python, Helm Summit) • a huge fan of laptop stickers and a cyclist

Slide 5

Slide 5 text

@hayorov Who I am: Alex Khaerov, Development Lead • doing software development for the past decade • junior speaker (Python, Kubernetes) • committee member (Moscow Python, Helm Summit) • a huge fan of laptop stickers and a cyclist. Company: Chainstack, a multi-cloud and multi-blockchain platform as a service, based in Singapore and hiring

Slide 6

Slide 6 text

@hayorov

Slide 7

Slide 7 text

@hayorov I am NOT

Slide 8

Slide 8 text

@hayorov I am NOT - Linux kernel developer; - Security researcher; - DevSecOps.

Slide 9

Slide 9 text

@hayorov I am NOT - Linux kernel developer; - Security researcher; - DevSecOps. I am a typical customer of containers.

Slide 10

Slide 10 text

@hayorov My Cluster Company A Container Company B Container Company C Container

Slide 13

Slide 13 text

@hayorov My Cluster Company A Container Company B Container Company C Container Container

Slide 15

Slide 15 text

@hayorov

Slide 16

Slide 16 text

@hayorov cs-gcp-apac-1

Slide 17

Slide 17 text

@hayorov cs-gcp-apac-1 Company A S P

Slide 18

Slide 18 text

@hayorov cs-gcp-apac-1 Company A S P Company B S P Company C S P

Slide 19

Slide 19 text

@hayorov cs-gcp-apac-1 Company A S P hub.docker.com I I I Company B S P Company C S P

Slide 20

Slide 20 text

@hayorov cs-gcp-apac-1 Company A S P hub.docker.com I I I from anywhere Ext Ext Ext Company B S P Company C S P

Slide 22

Slide 22 text

@hayorov cs-gcp-apac-1 Company A S P hub.docker.com I I I from anywhere Ext Ext Ext Company B S P Company C S P Ext

Slide 24

Slide 24 text

@hayorov cs-gcp-apac-1 Company A S P hub.docker.com I I I from anywhere Ext Ext Ext Company B S P Company C S P Ext Ext

Slide 25

Slide 25 text

@hayorov cs-gcp-apac-1 Company A S P hub.docker.com I I I from anywhere Ext Ext Ext Company B S P Company C S P Ext Ext S

Slide 26

Slide 26 text

@hayorov cs-gcp-apac-1 Company A S P hub.docker.com I I I from anywhere Ext Ext Ext Company B S P Company C S P Ext P Ext S

Slide 28

Slide 28 text

@hayorov KEEP CALM https://www.twistlock.com/labs-blog/breaking-docker-via-runc-explaining-cve-2019-5736/

Slide 29

Slide 29 text

@hayorov KEEP CALM https://www.twistlock.com/labs-blog/breaking-docker-via-runc-explaining-cve-2019-5736/ 2019-02-11 CVE-2019-5736 Breaking out of Docker via runC Score 9.3

Slide 30

Slide 30 text

@hayorov KEEP CALM https://www.twistlock.com/labs-blog/breaking-docker-via-runc-explaining-cve-2019-5736/ 2019-02-11 CVE-2019-5736 Breaking out of Docker via runC Score 9.3 2019-08-28 CVE-2019-11245 Containers attempt to run as uid 0 Score 7.8

Slide 31

Slide 31 text

@hayorov KEEP CALM https://www.twistlock.com/labs-blog/breaking-docker-via-runc-explaining-cve-2019-5736/
2019-02-11 CVE-2019-5736 Breaking out of Docker via runC Score 9.3
... allows a malicious container to (with minimal user interaction) overwrite the host runc binary and thus gain root-level code execution on the host. The level of user interaction is being able to run any command ... as root ...
2019-08-28 CVE-2019-11245 Containers attempt to run as uid 0 Score 7.8

Slide 32

Slide 32 text

@hayorov KEEP CALM https://www.twistlock.com/labs-blog/breaking-docker-via-runc-explaining-cve-2019-5736/
2019-02-11 CVE-2019-5736 Breaking out of Docker via runC Score 9.3
... allows a malicious container to (with minimal user interaction) overwrite the host runc binary and thus gain root-level code execution on the host. The level of user interaction is being able to run any command ... as root ...
affected: Docker, Debian, Red Hat, Ubuntu, AWS, GCP, Azure …
2019-08-28 CVE-2019-11245 Containers attempt to run as uid 0 Score 7.8

Slide 33

Slide 33 text

@hayorov KEEP CALM https://www.twistlock.com/labs-blog/breaking-docker-via-runc-explaining-cve-2019-5736/
2019-02-11 CVE-2019-5736 Breaking out of Docker via runC Score 9.3
... allows a malicious container to (with minimal user interaction) overwrite the host runc binary and thus gain root-level code execution on the host. The level of user interaction is being able to run any command ... as root ...
affected: Docker, Debian, Red Hat, Ubuntu, AWS, GCP, Azure …
2019-08-28 CVE-2019-11245 Containers attempt to run as uid 0 Score 7.8
...for pods that do not specify an explicit runAsUser attempt to run as uid 0 (root) on container restart, or if the image was previously pulled to the node. If the pod specified mustRunAsNonRoot: true, the kubelet will refuse to start the container as root. If the pod did not specify mustRunAsNonRoot: true, the kubelet will run the container as uid 0.

Slide 34

Slide 34 text

@hayorov KEEP CALM https://www.twistlock.com/labs-blog/breaking-docker-via-runc-explaining-cve-2019-5736/
2019-02-11 CVE-2019-5736 Breaking out of Docker via runC Score 9.3
... allows a malicious container to (with minimal user interaction) overwrite the host runc binary and thus gain root-level code execution on the host. The level of user interaction is being able to run any command ... as root ...
affected: Docker, Debian, Red Hat, Ubuntu, AWS, GCP, Azure …
2019-08-28 CVE-2019-11245 Containers attempt to run as uid 0 Score 7.8
...for pods that do not specify an explicit runAsUser attempt to run as uid 0 (root) on container restart, or if the image was previously pulled to the node. If the pod specified mustRunAsNonRoot: true, the kubelet will refuse to start the container as root. If the pod did not specify mustRunAsNonRoot: true, the kubelet will run the container as uid 0.
affected: Kubernetes v1.13.6 and v1.14.2
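
The CVE-2019-11245 description suggests a simple audit: find containers that neither set an explicit non-zero runAsUser nor ask the kubelet to refuse root. The following is only a rough sketch of such a check over a pod spec held as a Python dict. The function name and the sample pod are hypothetical; runAsUser and runAsNonRoot are the Kubernetes securityContext fields, and container-level values are assumed to override pod-level ones.

```python
def containers_that_may_run_as_root(pod_spec):
    """Return names of containers with no guarantee against running as uid 0."""
    pod_ctx = pod_spec.get("securityContext", {})
    risky = []
    for container in pod_spec.get("containers", []):
        # Container-level settings take precedence over pod-level ones.
        ctx = {**pod_ctx, **container.get("securityContext", {})}
        uid = ctx.get("runAsUser")
        refuses_root = ctx.get("runAsNonRoot", False)
        if (uid is None or uid == 0) and not refuses_root:
            risky.append(container["name"])
    return risky


# Hypothetical pod spec used only for illustration.
pod = {
    "containers": [
        {"name": "implicit-user"},
        {"name": "explicit-user", "securityContext": {"runAsUser": 1000}},
        {"name": "non-root-only", "securityContext": {"runAsNonRoot": True}},
    ]
}
print(containers_that_may_run_as_root(pod))
```

Only the first container is reported, because it relies entirely on the image default for its user.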

Slide 35

Slide 35 text

@hayorov The History

Slide 36

Slide 36 text

@hayorov The History 1970s Unix v7, chroot

Slide 37

Slide 37 text

@hayorov The History 1970s Unix v7, chroot 2000 FreeBSD, Jails

Slide 38

Slide 38 text

@hayorov The History 1970s Unix v7, chroot 2000 FreeBSD, Jails 2004 Solaris, Zones

Slide 39

Slide 39 text

@hayorov The History 1970s Unix v7, chroot 2000 FreeBSD, Jails 2004 Solaris, Zones 2005 OpenVZ

Slide 40

Slide 40 text

@hayorov The History 1970s Unix v7, chroot 2000 FreeBSD, Jails 2004 Solaris, Zones 2005 OpenVZ 2006 Linux cgroups

Slide 41

Slide 41 text

@hayorov The History 1970s Unix v7, chroot 2000 FreeBSD, Jails 2004 Solaris, Zones 2005 OpenVZ 2006 Linux cgroups 2008 namespaces, LXC

Slide 42

Slide 42 text

@hayorov The History 1970s Unix v7, chroot 2000 FreeBSD, Jails 2004 Solaris, Zones 2005 OpenVZ 2006 Linux cgroups 2008 namespaces, LXC 2013 Docker

Slide 44

Slide 44 text

@hayorov

Slide 45

Slide 45 text

@hayorov Stack (top to bottom): dockerd; Hardware

Slide 46

Slide 46 text

@hayorov Stack (top to bottom): dockerd, containerd, shim, runC; Hardware

Slide 47

Slide 47 text

@hayorov Stack (top to bottom): dockerd, containerd, shim, runC; Linux Kernel (cgroups, namespaces, capabilities, AppArmor, SELinux, seccomp, FS); Hardware

Slide 49

Slide 49 text

@hayorov Stack (top to bottom): dockerd, containerd, shim, runC; Linux Kernel (cgroups, namespaces, capabilities, AppArmor, SELinux, seccomp, FS); Hardware
namespaces provide a layer of isolation:
 PID for process isolation. NET for managing network interfaces. IPC for managing access to IPC resources. MNT for managing filesystem mount points. UTS for isolating kernel and version identifiers.

Slide 50

Slide 50 text

@hayorov Stack (top to bottom): dockerd, containerd, shim, runC; Linux Kernel (cgroups, namespaces, capabilities, AppArmor, SELinux, seccomp, FS); Hardware
namespaces provide a layer of isolation:
 PID for process isolation. NET for managing network interfaces. IPC for managing access to IPC resources. MNT for managing filesystem mount points. UTS for isolating kernel and version identifiers.
cgroups share available hardware resources among containers:
 Memory, CPU, Block I/O, Devices, Network.

Slide 52

Slide 52 text

@hayorov Stack (top to bottom): dockerd, containerd, shim, runC; Linux Kernel (cgroups, namespaces, capabilities, AppArmor, SELinux, seccomp, FS); Hardware
namespaces provide a layer of isolation:
 PID for process isolation. NET for managing network interfaces. IPC for managing access to IPC resources. MNT for managing filesystem mount points. UTS for isolating kernel and version identifiers.
cgroups share available hardware resources among containers:
 Memory, CPU, Block I/O, Devices, Network.
AppArmor restricts a program's capabilities with per-program profiles.
seccomp filters the syscalls issued by a program.
capabilities split root privileges into distinct units that the kernel uses when performing permission checks.
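
To make the namespace list more concrete: on Linux every process exposes the namespaces it belongs to as symlinks under /proc/<pid>/ns. The sketch below is only an illustration (the function name is made up, and it assumes a Linux procfs); it reads those links for the current process. Two processes share a namespace when the identifiers match, so running it on the host and inside a container should show different values for the isolated namespaces.

```python
import os


def list_namespaces(ns_dir="/proc/self/ns"):
    """Map namespace name (pid, net, ipc, mnt, uts, ...) to its identifier."""
    namespaces = {}
    for name in sorted(os.listdir(ns_dir)):
        path = os.path.join(ns_dir, name)
        if os.path.islink(path):
            # The link target looks like "pid:[4026531836]".
            namespaces[name] = os.readlink(path)
    return namespaces


if __name__ == "__main__":
    if os.path.isdir("/proc/self/ns"):
        for name, ident in list_namespaces().items():
            print(f"{name:12} {ident}")
    else:
        print("/proc/self/ns not found (this sketch assumes Linux)")
```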

Slide 53

Slide 53 text

@hayorov The History 1970s Unix v7, chroot 2000 FreeBSD, Jails 2004 Solaris, Zones 2005 OpenVZ 2006 Linux cgroups 2008 namespaces, LXC 2013 Docker

Slide 54

Slide 54 text

@hayorov The History 1970s Unix v7, chroot 2000 FreeBSD, Jails 2004 Solaris, Zones 2005 OpenVZ 2006 Linux cgroups 2008 namespaces, LXC 2013 Docker 2015 OCI, runC

Slide 55

Slide 55 text

@hayorov The History 1970s Unix v7, chroot 2000 FreeBSD, Jails 2004 Solaris, Zones 2005 OpenVZ 2006 Linux cgroups 2008 namespaces, LXC 2013 Docker 2015 OCI, runC 2016 CRI-O

Slide 56

Slide 56 text

@hayorov The History 1970s Unix v7, chroot 2000 FreeBSD, Jails 2004 Solaris, Zones 2005 OpenVZ 2006 Linux cgroups 2008 namespaces, LXC 2013 Docker 2015 OCI, runC 2016 CRI-O 2016 rkt

Slide 58

Slide 58 text

@hayorov The History 1970s Unix v7, chroot 2000 FreeBSD, Jails 2004 Solaris, Zones 2005 OpenVZ 2006 Linux cgroups 2008 namespaces, LXC 2013 Docker 2015 OCI, runC 2016 CRI-O 2016 rkt 2017 Kata Containers

Slide 59

Slide 59 text

@hayorov The History 1970s Unix v7, chroot 2000 FreeBSD, Jails 2004 Solaris, Zones 2005 OpenVZ 2006 Linux cgroups 2008 namespaces, LXC 2013 Docker 2015 OCI, runC 2016 CRI-O 2016 rkt 2017 Kata Containers 2018 gVisor

Slide 60

Slide 60 text

@hayorov The History 1970s Unix v7, chroot 2000 FreeBSD, Jails 2004 Solaris, Zones 2005 OpenVZ 2006 Linux cgroups 2008 namespaces, LXC 2013 Docker 2015 OCI, runC 2016 CRI-O 2016 rkt 2017 Kata Containers 2018 gVisor 2018 Firecracker

Slide 61

Slide 61 text

@hayorov VMs vs Containers * Only a Type II VMM needs to run on top of a host operating system.

Slide 62

Slide 62 text

@hayorov Attacks via the Kernel Kernel Container Node host

Slide 64

Slide 64 text

@hayorov Attacks via the Kernel Kernel Container Node host Escape!

Slide 65

Slide 65 text

@hayorov Isolation ≠ Secure Attacks via the Kernel Kernel Container Node host Escape!

Slide 66

Slide 66 text

@hayorov We want it all … secured zero config lightweight

Slide 67

Slide 67 text

@hayorov gVisor: Sandbox for Containers. Independent user-space kernel. Strong Isolation. Diagram: Container → System Calls → gVisor (independent user-space kernel) → Limited System Calls → Kernel → Hardware

Slide 68

Slide 68 text

@hayorov Architecture

Slide 69

Slide 69 text

@hayorov Architecture OCI

Slide 70

Slide 70 text

@hayorov Architecture runsc OCI

Slide 71

Slide 71 text

@hayorov Architecture runsc OCI runtime powered by gVisor OCI

Slide 72

Slide 72 text

@hayorov Sandbox Architecture runsc OCI runtime powered by gVisor OCI

Slide 73

Slide 73 text

@hayorov Sandbox Architecture Container Sentry (emulated Linux Kernel) runsc OCI runtime powered by gVisor OCI

Slide 74

Slide 74 text

@hayorov Sandbox Architecture Container Sentry (emulated Linux Kernel) runsc User Kernel OCI runtime powered by gVisor OCI

Slide 75

Slide 75 text

@hayorov Sandbox Architecture Container Sentry (emulated Linux Kernel) KVM seccomp + ns Host Linux Kernel runsc User Kernel OCI runtime powered by gVisor OCI

Slide 77

Slide 77 text

@hayorov Sandbox Architecture Container Sentry (emulated Linux Kernel) KVM seccomp + ns Host Linux Kernel runsc User Kernel Gofer 9P OCI runtime powered by gVisor OCI

Slide 79

Slide 79 text

@hayorov How to start • Locally (macOS Docker)

Slide 80

Slide 80 text

@hayorov How to start • Locally (macOS Docker) $ wget https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc

Slide 81

Slide 81 text

@hayorov How to start • Locally (macOS Docker) $ wget https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc $ cat ~/.docker/daemon.json (taskbar > Preferences > Daemon > Advanced)

Slide 82

Slide 82 text

@hayorov How to start • Locally (macOS Docker)
$ wget https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc
$ cat ~/.docker/daemon.json (taskbar > Preferences > Daemon > Advanced)
{ "default-runtime": "runc", "runtimes": { "runsc": { "path": "/usr/allexx/foo/runsc" } } }

Slide 83

Slide 83 text

@hayorov How to start • Locally (macOS Docker)
$ wget https://storage.googleapis.com/gvisor/releases/nightly/latest/runsc
$ cat ~/.docker/daemon.json (taskbar > Preferences > Daemon > Advanced)
{ "default-runtime": "runc", "runtimes": { "runsc": { "path": "/usr/allexx/foo/runsc" } } }
$ docker run --rm --runtime=runsc -it alpine
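
If daemon.json already holds other settings, the runsc entry has to be merged in rather than pasted over them. A minimal sketch of that merge, assuming the config is loaded as a plain dict: the helper name and the pre-existing "debug" setting are illustrative only, and the runsc path is the one from the slide.

```python
import json


def register_runtime(config, name, path):
    """Return a copy of a Docker daemon config with one extra runtime added."""
    updated = dict(config)
    runtimes = dict(updated.get("runtimes", {}))
    runtimes[name] = {"path": path}
    updated["runtimes"] = runtimes
    # Keep runc as the default so only containers started with
    # --runtime=runsc are sandboxed.
    updated.setdefault("default-runtime", "runc")
    return updated


existing = {"debug": True}  # hypothetical current contents of daemon.json
merged = register_runtime(existing, "runsc", "/usr/allexx/foo/runsc")
print(json.dumps(merged, indent=2))
```

The original dict is left untouched, which makes it easy to diff the two versions before writing the file back.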

Slide 84

Slide 84 text

@hayorov How to start • GKE (managed Kubernetes)

Slide 85

Slide 85 text

@hayorov How to start • GKE (managed Kubernetes)
Create a new node pool
$ gcloud beta container node-pools create [NODE_POOL_NAME] \
  --cluster=[CLUSTER_NAME] \
  --node-version=[NODE_VERSION] \
  --image-type=cos_containerd \
  --sandbox type=gvisor

Slide 86

Slide 86 text

@hayorov How to start • GKE (managed Kubernetes)
Create a new node pool
$ gcloud beta container node-pools create [NODE_POOL_NAME] \
  --cluster=[CLUSTER_NAME] \
  --node-version=[NODE_VERSION] \
  --image-type=cos_containerd \
  --sandbox type=gvisor
$ kubectl get runtimeclasses
NAME     AGE
gvisor   19s

Slide 87

Slide 87 text

@hayorov How to start • GKE (managed Kubernetes) https://cloud.google.com/kubernetes-engine/docs/how-to/sandbox-pods

Slide 88

Slide 88 text

@hayorov How to start • GKE (managed Kubernetes)
Running an application
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      runtimeClassName: gvisor
      containers:
      - name: httpd
        image: httpd
https://cloud.google.com/kubernetes-engine/docs/how-to/sandbox-pods

Slide 89

Slide 89 text

@hayorov How to start • GKE (managed Kubernetes)
Running an application
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      runtimeClassName: gvisor
      containers:
      - name: httpd
        image: httpd
Enable raw sockets
spec:
  containers:
  - name: my-container
    securityContext:
      capabilities:
        add: ["NET_RAW"]
https://cloud.google.com/kubernetes-engine/docs/how-to/sandbox-pods
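
When the same manifest has to be produced for several applications, it can help to generate it. The following is a sketch under assumptions, not part of the talk: the function name is hypothetical, the manifest mirrors the httpd Deployment above, and the output is JSON, which kubectl accepts in place of YAML.

```python
import json


def sandboxed_deployment(name, image, runtime_class="gvisor", replicas=1):
    """Build a Deployment manifest whose pods request a sandboxed runtime."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Selects the RuntimeClass listed by `kubectl get runtimeclasses`.
                    "runtimeClassName": runtime_class,
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


manifest = sandboxed_deployment("httpd", "httpd")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`.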

Slide 90

Slide 90 text

@hayorov Applicability and performance https://www.usenix.org/system/files/hotcloud19-paper-young.pdf

Slide 91

Slide 91 text

@hayorov Applicability and performance • Of 330 syscalls, 233 syscalls have a full or partial implementation. https://www.usenix.org/system/files/hotcloud19-paper-young.pdf

Slide 92

Slide 92 text

@hayorov Applicability and performance • Of 330 syscalls, 233 syscalls have a full or partial implementation. elasticsearch golang java8 jenkins mariadb memcached mongo nginx node php postgres prometheus python https://www.usenix.org/system/files/hotcloud19-paper-young.pdf

Slide 93

Slide 93 text

@hayorov Applicability and performance • Of 330 syscalls, 233 syscalls have a full or partial implementation. elasticsearch golang java8 jenkins mariadb memcached mongo nginx node php postgres prometheus python https://www.usenix.org/system/files/hotcloud19-paper-young.pdf

Slide 94

Slide 94 text

@hayorov Applicability and performance
• Of 330 syscalls, 233 syscalls have a full or partial implementation.
• Performance: CPU (events/sec): no diff. Startup time (ms): no diff. Mem (usage, MB): 35 MB. Net (rps): -50%.
 … small operations (I/O) impose a large overhead.
elasticsearch golang java8 jenkins mariadb memcached mongo nginx node php postgres prometheus python
https://www.usenix.org/system/files/hotcloud19-paper-young.pdf

Slide 95

Slide 95 text

@hayorov Applicability and performance
• Of 330 syscalls, 233 syscalls have a full or partial implementation.
• Performance: CPU (events/sec): no diff. Startup time (ms): no diff. Mem (usage, MB): 35 MB. Net (rps): -50%.
 … small operations (I/O) impose a large overhead.
• NO direct access to hardware or virtualization (no GPU)
elasticsearch golang java8 jenkins mariadb memcached mongo nginx node php postgres prometheus python
https://www.usenix.org/system/files/hotcloud19-paper-young.pdf
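
As a quick sanity check on the syscall figure, the two numbers on the slide can be turned into a coverage percentage. This is only arithmetic on the slide's values; the function name is illustrative.

```python
def coverage(implemented, total):
    """Fraction of syscalls with a full or partial implementation."""
    return implemented / total


implemented, total = 233, 330  # figures from the slide
print(f"covered: {coverage(implemented, total):.1%}")
print(f"not implemented: {total - implemented} syscalls")
```

So roughly seven in ten syscalls are covered, which is why checking an application against the sandbox before migrating it is worthwhile.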

Slide 96

Slide 96 text

Can containers be secure?

Slide 97

Slide 97 text

@hayorov CVE-2017-1002101: Host-resolved symlinks Kernel Container Volume Volume Node host Escape!

Slide 98

Slide 98 text

Can containers be secure? Probably not

Slide 99

Slide 99 text

@hayorov So Now What?

Slide 100

Slide 100 text

@hayorov So Now What? • Configure a security context (runAsUser != 0)

Slide 101

Slide 101 text

@hayorov So Now What? • Configure a security context (runAsUser != 0) • Keep your software up to date (OS, runtime, Kubernetes)

Slide 102

Slide 102 text

@hayorov So Now What? • Configure a security context (runAsUser != 0) • Discover Falco to start monitoring abnormal activity (GKE-compatible) • Keep your software up to date (OS, runtime, Kubernetes)

Slide 103

Slide 103 text

@hayorov So Now What? • Set up a “sandboxed node pool” with gVisor for the riskiest workloads • Configure a security context (runAsUser != 0) • Discover Falco to start monitoring abnormal activity (GKE-compatible) • Keep your software up to date (OS, runtime, Kubernetes)

Slide 104

Slide 104 text

@hayorov So Now What? • Set up a “sandboxed node pool” with gVisor for the riskiest workloads • Configure a security context (runAsUser != 0) • Discover Falco to start monitoring abnormal activity (GKE-compatible) • Learn about alternatives: Kata Containers and Firecracker microVMs • Keep your software up to date (OS, runtime, Kubernetes)

Slide 105

Slide 105 text

@hayorov So Now What? • Set up a “sandboxed node pool” with gVisor for the riskiest workloads • Configure a security context (runAsUser != 0) • Discover Falco to start monitoring abnormal activity (GKE-compatible) • Learn about alternatives: Kata Containers and Firecracker microVMs • Use dedicated instances (VMs, bare metal) or services in special cases • Keep your software up to date (OS, runtime, Kubernetes)
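
For the security-context bullet, one possible way to apply the advice programmatically is sketched below. It is an assumption-laden illustration: the helper name is made up, uid 1000 is an arbitrary non-root choice that the image must actually support, and runAsUser / runAsNonRoot are the Kubernetes securityContext fields.

```python
import copy


def run_as_non_root(pod_spec, uid=1000):
    """Return a copy of pod_spec in which every container runs as a non-root uid."""
    if uid == 0:
        raise ValueError("uid 0 is root; pick a non-zero uid")
    hardened = copy.deepcopy(pod_spec)
    for container in hardened.get("containers", []):
        ctx = container.setdefault("securityContext", {})
        # Replace a missing or root uid; keep an existing non-root uid.
        if ctx.get("runAsUser", 0) == 0:
            ctx["runAsUser"] = uid
        # Ask the kubelet to refuse to start the container as root.
        ctx["runAsNonRoot"] = True
    return hardened


spec = {"containers": [{"name": "httpd", "image": "httpd"}]}  # hypothetical pod spec
print(run_as_non_root(spec))
```

Setting both fields is deliberate: the explicit uid avoids relying on the image default, and runAsNonRoot acts as a second check if the uid is ever changed back.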

Slide 106

Slide 106 text

Thank you! Questions… Alex Khaerov t.me/hayorov http://bit.ly/xxx