Slide 1

Slide 1 text

Bringing Security and Multi-tenancy to Kubernetes
Lei (Harry) Zhang

Slide 2

Slide 2 text

About Me
• Lei (Harry) Zhang
• #Microsoft MVP in cloud and datacenter management
  • though I’m a Linux guy :/
• Previous: VMware, Baidu
• Feature maintainer of Kubernetes
• HyperCrew:
• Publications: Docker & Kubernetes Under the Hood
• PhD candidate @ZJU: Large-scale cluster management and scheduling

Slide 3

Slide 3 text

A survey about “boundary”
• Are you comfortable with Linux containers as an effective boundary?
  • Yes, I use containers in my private/safe environment
  • No, I use containers to serve the public cloud

Slide 4

Slide 4 text

As long as we care about security…
• We have to wrap containers inside full-blown virtual machines
• But we lose cloud-native deployment
  • Slow startup time
  • Huge waste of resources
  • Memory tax for every container
  • …
[Figure labels: dream, reality]

Slide 5

Slide 5 text

Revisit container
• Container Runtime
  • The dynamic view and boundary of your running process (namespace, cgroups)
• Container Image
  • The static view of your program, data, dependencies, files and directories

FROM busybox
ADD temp.txt /
VOLUME /data
CMD ["echo", "hello"]

[Figure: Docker Container built from the Dockerfile above. Read-only layer: /bin /dev /etc /home /lib /lib64 /media /mnt /opt /proc /root /run /sbin /sys /tmp /usr /var, plus /data and /temp.txt. Init layer: /etc/hosts /etc/hostname /etc/resolv.conf. Read-write layer: /temp.txt and /data, running "echo hello". Each image layer carries json metadata.]
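
A minimal sketch of the layering idea in the diagram, assuming a container's filesystem view can be modelled as a lookup through stacked layers. The file contents and the use of Python's ChainMap are illustrative assumptions; real storage drivers work on directory trees and also handle deletions, which this toy ignores.

```python
from collections import ChainMap

# Hypothetical layer contents; real layers are directory trees, not dicts.
read_only_layer = {"/bin/echo": "busybox binary", "/temp.txt": "from ADD temp.txt /"}
init_layer = {"/etc/hosts": "", "/etc/hostname": "", "/etc/resolv.conf": ""}
read_write_layer = {}

# Lookups search the read-write layer first, then the init layer,
# then the read-only image layer.
container_view = ChainMap(read_write_layer, init_layer, read_only_layer)

# A write inside the container lands in the read-write layer only.
container_view["/temp.txt"] = "edited in container"

print(container_view["/temp.txt"])   # what the container sees
print(read_only_layer["/temp.txt"])  # the image layer is untouched
```

The image stays static while each container gets its own dynamic view on top of it.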

Slide 6

Slide 6 text

HyperContainer
Secure Kubernetes from runtime level

Slide 7

Slide 7 text

HyperContainer
• Container Runtime
  • RunV
  • The OCI-compatible, hypervisor-based runtime implementation
  • Widely adopted by companies like Huawei etc.
  • Control daemon
• Container Image
  • Docker Image Spec

Slide 8

Slide 8 text

Combine the best parts
• Portable and behaves like a Linux container
  • $ hyperctl run -t busybox echo helloworld
  • sub-second startup time*, ~12MB memory cost
• Fully isolated sandbox with an independent guest kernel
  • $ hyperctl exec -t busybox uname -r
  • 4.4.12-hyper (or your provided kernel)
  • security, backward compatibility, maturity
See:

Slide 9

Slide 9 text

HyperContainer is a Pod
• That’s how HyperContainer fits into the Kubernetes philosophy
• Wait, why is the Pod so important?

Slide 10

Slide 10 text

Pod: lesson learned from Borg • Should sample.war be packaged with Tomcat?

Slide 11

Slide 11 text

Pod: lesson learned from Borg • InitContainers: one or more containers started in sequence before the pod's normal containers are started. • Share volumes, perform network operations, and perform computation prior to the app containers.

Slide 12

Slide 12 text

So, Pod is
• The group of super-affinity containers
• The atomic scheduling unit
• The process group in container cloud
• Do the right things
  • without modifying your container image
  • Kubernetes = Spring Framework
  • Pod = IoC
[Figure: a Pod containing log, app, infra container, init container, and a volume]

Slide 13

Slide 13 text

Pod is not easy to simulate
[Figure: log and app containers linked by super affinity]
• Requirement:
  • app: 1G, log: 0.5G
• Available:
  • Node_A: 1.25G, Node_B: 2G
• What happens if app is scheduled to Node_A?
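
The numbers on this slide can be worked through in a small sketch. It assumes a first-fit placement policy and memory measured in MiB; both are illustrative assumptions, and the function names are hypothetical rather than taken from any scheduler.

```python
def first_fit(request_mib, free):
    """Return the first node with enough free memory, or None."""
    for name, free_mib in free.items():
        if free_mib >= request_mib:
            return name
    return None


def schedule_one_by_one(requests, free):
    """Place app first, then check whether log still fits next to it."""
    free = dict(free)  # work on a copy
    app_node = first_fit(requests["app"], free)
    free[app_node] -= requests["app"]
    # Super affinity: log has to run on the same node as app.
    log_fits = free[app_node] >= requests["log"]
    return app_node, log_fits


def schedule_as_pod(requests, free):
    """Treat app + log as one atomic unit and place their combined request."""
    return first_fit(sum(requests.values()), free)


requests = {"app": 1024, "log": 512}       # 1G and 0.5G
nodes = {"Node_A": 1280, "Node_B": 2048}   # 1.25G and 2G

print(schedule_one_by_one(requests, nodes))  # app lands on Node_A, log is stuck
print(schedule_as_pod(requests, nodes))      # the whole Pod goes to Node_B
```

Scheduling container by container strands log once app has taken Node_A; scheduling the Pod as one unit avoids the problem.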

Slide 14

Slide 14 text

HyperContainer is a Pod
• Linux container based runtimes
  • wrap and encapsulate several app containers into a logical group
• Hypervisor container based runtime
  • the hypervisor serves as a natural boundary of the Pod

Slide 15

Slide 15 text

HyperContainer is a Pod
• kubelet Container Runtime Interface
  • create sandbox Foo --> create container C --> start container C
  • stop container C --> remove container C --> delete sandbox Foo
• Sandbox
  • Normally: the infra container
  • HyperContainer: hypervisor
    • with HyperKernel
    • a HyperStart process as PID 1
    • set up mnt namespace, launch apps from the images, etc.
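
The call ordering on this slide can be sketched as a toy state machine. ToyRuntime and its method names are hypothetical; they follow the slide's wording and are not the CRI gRPC API.

```python
class ToyRuntime:
    """Hypothetical in-memory runtime that enforces the ordering on the slide."""

    def __init__(self):
        self.sandboxes = {}  # sandbox name -> {container name: state}
        self.calls = []      # ordered record of accepted calls

    def _move(self, sandbox, container, expected, new, verb):
        if self.sandboxes[sandbox][container] != expected:
            raise RuntimeError(f"cannot {verb} {container}: not {expected}")
        self.sandboxes[sandbox][container] = new
        self.calls.append(f"{verb} container {container}")

    def create_sandbox(self, sandbox):
        self.sandboxes[sandbox] = {}
        self.calls.append(f"create sandbox {sandbox}")

    def create_container(self, sandbox, container):
        if sandbox not in self.sandboxes:
            raise RuntimeError("create the sandbox before its containers")
        self.sandboxes[sandbox][container] = "created"
        self.calls.append(f"create container {container}")

    def start_container(self, sandbox, container):
        self._move(sandbox, container, "created", "running", "start")

    def stop_container(self, sandbox, container):
        self._move(sandbox, container, "running", "exited", "stop")

    def remove_container(self, sandbox, container):
        if self.sandboxes[sandbox][container] != "exited":
            raise RuntimeError("stop the container before removing it")
        del self.sandboxes[sandbox][container]
        self.calls.append(f"remove container {container}")

    def delete_sandbox(self, sandbox):
        if self.sandboxes[sandbox]:
            raise RuntimeError("remove all containers before deleting the sandbox")
        del self.sandboxes[sandbox]
        self.calls.append(f"delete sandbox {sandbox}")


runtime = ToyRuntime()
runtime.create_sandbox("Foo")
runtime.create_container("Foo", "C")
runtime.start_container("Foo", "C")
runtime.stop_container("Foo", "C")
runtime.remove_container("Foo", "C")
runtime.delete_sandbox("Foo")
print(" --> ".join(runtime.calls))
```

The sandbox is the only runtime-specific part: an infra container for Linux container runtimes, a hypervisor for HyperContainer. The ordering of calls stays the same.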

Slide 16

Slide 16 text

Hypernetes
Kubernetes with HyperContainer Runtime

Slide 17

Slide 17 text

Hypernetes
• Also: h8s
1. Kubernetes + HyperContainer runtime
  • officially supported by using kubernetes/frakti
2. Multi-tenant network and persistent volumes
  • battle-tested Neutron + Cinder plugin

Slide 18

Slide 18 text

Multi-tenant Network

Slide 19

Slide 19 text

Multi-tenant Network
• Goal:
  • leverage the tenant-aware Neutron network for Kubernetes
  • follow the network plugin workflow
• Non-goal:
  • break the k8s network model or hack k8s code

Slide 20

Slide 20 text

Define the Network
• Network
  • a top-level API object
  • each tenant (created by Keystone) has its own Network
  • a Network maps to a Neutron “net”
  • a Network Controller is responsible for managing the Network lifecycle

Slide 21

Slide 21 text

Example
[Figure: api-server, etcd, scheduler, and controller-manager (ControlLoop), with a kubelet (SyncLoop) and proxy on each node. API objects: network, pod, replica, namespace, service, job, deployment, volume, petset, … The control loop compares the Desired World with the Real World and calls Neutron to create/delete network.]
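
A minimal sketch of what such a control loop could look like, assuming Network objects and Neutron nets can be compared by name. FakeNeutron is a hypothetical in-memory stand-in, not a real Neutron client, and reconcile_networks is an illustrative name.

```python
class FakeNeutron:
    """Hypothetical in-memory stand-in for the Neutron API."""

    def __init__(self, existing=()):
        self.nets = set(existing)

    def create_network(self, name):
        self.nets.add(name)

    def delete_network(self, name):
        self.nets.discard(name)


def reconcile_networks(desired, neutron):
    """Drive the real world (Neutron nets) toward the desired Network objects."""
    to_create = sorted(set(desired) - neutron.nets)
    to_delete = sorted(neutron.nets - set(desired))
    for name in to_create:
        neutron.create_network(name)
    for name in to_delete:
        neutron.delete_network(name)
    return to_create, to_delete


neutron = FakeNeutron(existing={"old-net"})
desired_networks = ["tenant-a-net", "tenant-b-net"]  # hypothetical Network objects
print(reconcile_networks(desired_networks, neutron))
print(reconcile_networks(desired_networks, neutron))  # second pass has nothing to do
```

Running the loop repeatedly is safe: once the real world matches the desired world, a pass makes no calls.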

Slide 22

Slide 22 text

Kubernetes Network Model
• Container reaches container
  • all containers can communicate with all other containers without NAT
• Node reaches container
  • all nodes can communicate with all containers (and vice-versa) without NAT
• IP addressing
  • a Pod in the cluster can be addressed by its IP

Slide 23

Slide 23 text

How does h8s fit that?
• A Network can be assigned to one or more Namespaces
• Pods belonging to the same Network can reach each other directly through IP
• a Pod’s network maps to a Neutron “port”
• kubelet is responsible for Pod network setup
• let’s see how kubelet works
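
The Namespace-to-Network relationship can be illustrated with a small lookup. The tenant, Namespace, and Pod names below are made up for the example, and the check is a model of the rule on the slide rather than anything kubelet executes.

```python
# Hypothetical tenants: two Namespaces share one Network, a third has its own.
namespace_to_network = {
    "team-a-dev": "net-a",
    "team-a-prod": "net-a",
    "team-b": "net-b",
}
pod_to_namespace = {"web": "team-a-dev", "db": "team-a-prod", "cache": "team-b"}


def network_of(pod):
    """Resolve a Pod to its Network through its Namespace."""
    return namespace_to_network[pod_to_namespace[pod]]


def can_reach_directly(pod_a, pod_b):
    """Pods on the same Network can reach each other directly through IP."""
    return network_of(pod_a) == network_of(pod_b)


print(can_reach_directly("web", "db"))     # same Network, different Namespaces
print(can_reach_directly("web", "cache"))  # different Networks
```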

Slide 24

Slide 24 text

Example
[Figure: api-server, etcd, and scheduler, with a kubelet (SyncLoop) and proxy on each node]
1. Pod created

Slide 25

Slide 25 text

Example
[Figure: api-server, etcd, and scheduler, with a kubelet (SyncLoop) and proxy on each node]
2. Pod object added

Slide 26

Slide 26 text

Example
[Figure: api-server, etcd, and scheduler, with a kubelet (SyncLoop) and proxy on each node]
3.1 New pod object detected
3.2 Bind pod with node

Slide 27

Slide 27 text

Example
[Figure: api-server, etcd, and scheduler, with a kubelet (SyncLoop) and proxy on each node]
4.1 Detected pod bind with me
4.2 Start containers in pod
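
Steps 1 to 4.2 can be replayed with a toy in-memory store standing in for api-server and etcd. The function names, the "first node wins" binding policy, and the phase strings are assumptions made for this sketch, not the behaviour of the real components.

```python
def create_pod(store, name, containers):
    """Steps 1-2: the Pod object is created and added to the store."""
    store[name] = {"containers": containers, "node": None, "phase": "Pending"}


def scheduler_pass(store, node_names):
    """Steps 3.1-3.2: detect unbound Pods and bind each one to a node."""
    bound = []
    for name, pod in store.items():
        if pod["node"] is None:
            pod["node"] = node_names[0]  # toy policy: always pick the first node
            bound.append(name)
    return bound


def kubelet_sync(store, my_node):
    """Steps 4.1-4.2: detect Pods bound to this node and start their containers."""
    started = []
    for name, pod in store.items():
        if pod["node"] == my_node and pod["phase"] == "Pending":
            pod["phase"] = "Running"
            started.append(name)
    return started


store = {}
create_pod(store, "web", ["app", "log"])
print(scheduler_pass(store, ["node-1", "node-2"]))
print(kubelet_sync(store, "node-2"))  # nothing is bound to node-2
print(kubelet_sync(store, "node-1"))  # node-1 starts the Pod
```

No component calls another directly: each one only watches the shared store and reacts to what it finds there.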

Slide 28

Slide 28 text

Design of kubelet
[Figure: kubelet components: Choose Runtime (docker, rkt, hyper/remote), InitNetworkPlugin, NodeStatus, Network Status, status Manager, PLEG, volume Manager, image Manager. The SyncLoop receives PodUpdate events and dispatches them through HandlePods {Add, Update, Remove, Delete, …} to the Pod Update Worker.]
Pod Update Worker (e.g. ADD)
• generate Pod status
• check volume status (talk later)
• call runtime to start containers
• set up Pod network (see next slide)

Slide 29

Slide 29 text

Set Up Pod Network

Slide 30

Slide 30 text

kubestack
A standalone gRPC daemon
1. to “translate” the SetUpPod request to the Neutron network API
2. to handle the multi-tenant Service proxy

Slide 31

Slide 31 text

Service
$ iptables-save | grep my-service
-A KUBE-SERVICES -d -p tcp -m comment --comment "default/my-service: cluster IP" -m tcp --dport 8001 -j KUBE-SVC-KEAUNL7HVWWSEZA6
-A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-6XXFWO3KTRMPKCHZ
-A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-57KPRZ3JQVENLNBRZ
-A KUBE-SEP-6XXFWO3KTRMPKCHZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination
-A KUBE-SEP-57KPRZ3JQVENLNBRZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination
[Annotations: the KUBE-SERVICES rule is the portal; the KUBE-SVC rules are the random mode rules; the KUBE-SEP rules are backend rule_1 and backend rule_2. Callbacks: OnServiceUpdate, OnEndpointsUpdate.]
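
The probability values of the random mode rules did not survive in the listing above. As far as I know, kube-proxy's iptables mode gives rule i of n (counting from zero) a match probability of 1/(n - i), so the last rule always matches. The sketch below takes that as an assumption and checks that it spreads traffic evenly; the function names are illustrative.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probability for n backends, evaluated in order."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_share(n):
    """Fraction of all traffic that ends up at each backend."""
    shares = []
    remaining = Fraction(1)  # traffic that fell through the earlier rules
    for p in rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


print(rule_probabilities(2))  # the two-backend case shown on the slide
print(effective_share(3))
```

Because each rule only sees the traffic that earlier rules did not take, increasing per-rule probabilities yield an equal share per backend.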

Slide 32

Slide 32 text

Multi-tenant Service
• The default iptables-based kube-proxy is not tenant-aware
  • Endpoint Pods and Nodes with iptables rules are isolated into different networks
• Hypernetes uses a built-in HAProxy as the Service portal
  • to handle all Service instances within the same namespace
  • the same OnServiceUpdate and OnEndpointsUpdate workflow
• ExternalProvider
  • an OpenStack LB will be created as the Service
  • e.g. curl

Slide 33

Slide 33 text

Persistent Volume

Slide 34

Slide 34 text

Kubernetes Persistent Volume
[Figure: the Volume Manager reconciles the desired World; the Cinder volume plugin attaches a volume to a Host path, which is mounted into each Pod's mountPath]
Volume Manager reconcile:
• Get mountedVolume from actualStateOfWorld
• Unmount volumes in mountedVolume but not in desiredStateOfWorld
• AttachVolume() if vol in desiredStateOfWorld and not attached
• MountVolume() if vol in desiredStateOfWorld and not in mountedVolume
• Verify devices that should be detached/unmounted are detached/unmounted
• Tips:
  1. -v host:path
  2. attach VS mount
  3. Totally independent from container management
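
The reconcile steps can be sketched over plain sets of volume names. This is a simplified model under stated assumptions: the real Volume Manager tracks per-Pod mounts and devices, and the explicit detach step here is my reading of the final "verify" bullet, not something the slide spells out.

```python
def reconcile(desired, attached, mounted):
    """Move the actual state (attached, mounted) toward the desired state.

    All three arguments are sets of volume names; attached and mounted are
    updated in place. Returns the ordered list of actions taken.
    """
    actions = []
    # Unmount volumes that are mounted but no longer desired.
    for vol in sorted(mounted - desired):
        mounted.discard(vol)
        actions.append(("UnmountVolume", vol))
    # Attach desired volumes that are not attached yet.
    for vol in sorted(desired - attached):
        attached.add(vol)
        actions.append(("AttachVolume", vol))
    # Mount desired volumes that are not mounted yet.
    for vol in sorted(desired - mounted):
        mounted.add(vol)
        actions.append(("MountVolume", vol))
    # Assumed final step: detach whatever is attached but not desired.
    for vol in sorted(attached - desired):
        attached.discard(vol)
        actions.append(("DetachVolume", vol))
    return actions


desired = {"vol-a", "vol-b"}
attached = {"vol-b", "vol-c"}
mounted = {"vol-c"}
for action in reconcile(desired, attached, mounted):
    print(action)
```

Attach and mount are separate steps on purpose: a volume can be attached to a node without being mounted anywhere, which is the "attach VS mount" tip above.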

Slide 35

Slide 35 text

Persistent Volume with HyperContainer
• Enhanced Cinder volume plugin
• Linux container:
  1. full OpenStack cluster
  2. query Nova to find node
  3. attach Cinder volume to host path
  4. bind mount host path to Pod containers
• HyperContainer:
  • directly attach block devices to Pod
  • thanks to the hypervisor based Pod boundary
  • eliminates extra time to query Nova
[Figure: the Volume Manager reconciles the desired World; the enhanced Cinder volume plugin attaches each vol from the Host to a Pod's mountPath]

Slide 36

Slide 36 text

PV Example
• Create a Cinder volume
• Claim the volume by referencing its volumeID

Slide 37

Slide 37 text

Container Runtime Interface

Slide 38

Slide 38 text

Future of CRI
• Keep Docker as the only default container runtime
  • oci-runtime, rktlet, hyperd
• Frakti: the Remote Container Runtime Kit
  • welcome to try it out, star and fork

Slide 39

Slide 39 text

“if image becomes non-standard”
• e.g. the Docker image becomes somehow Docker-specific
• Don’t worry, kubelet.imageManager is becoming runtime-specific
• but then k8s will probably choose
  • NO DEFAULT runtime

Slide 40

Slide 40 text

Full Topology
[Figure: the Master runs KeyStone, Neutron, and Cinder and holds the API objects (Object: Network, Object: Pod, Object: …); each Node runs kubestack, the Neutron L2 Agent, kube-proxy, kubelet with the Cinder Plugin, and Pods; Ceph appears as the storage backend]

Slide 41

Slide 41 text

Summary
• A new way to build secure and multi-tenant Kubernetes
  • Kubernetes + HyperContainer + Neutron Plugin + Cinder Plugin + Keystone
  • Project URL:
• Roadmap
  • Graduate HyperContainer runtime on k8s upstream
    • see HyperContainer in official k8s release
  • Neutron CNI plugin
• Tip: is totally built on Hypernetes, try it out :)

Slide 42

Slide 42 text

END
Lei (Harry) Zhang @resouer