Hypernetes: Multi-tenant and Secure Kubernetes

This is the talk I gave at LinuxCon EU 2016.

Lei (Harry) Zhang

September 28, 2016

  1. About Me

    • Lei (Harry) Zhang • #Microsoft MVP in cloud and datacenter management • though I’m a Linux guy :/ • Previous: VMware, Baidu • Feature maintainer of Kubernetes • HyperCrew: https://hyper.sh • Publications: Docker & Kubernetes Under the Hood • PhD candidate @ZJU: Large-scale cluster management and scheduling
  2. A survey about “boundary”

    • Are you comfortable with Linux containers as an effective boundary? • Yes, I use containers in my private/safe environment • No, I use containers to serve the public cloud
  3. As long as we care about security…

    • We have to wrap containers inside full-blown virtual machines • But we lose cloud-native deployment • Slow startup time • Huge waste of resources • Memory tax for every container • … • [Diagram labels: dream, reality]
  4. Revisit container

    • Container Runtime • The dynamic view and boundary of your running process (namespace, cgroups) • Container Image • The static view of your program, data, dependencies, files and directories • Dockerfile: FROM busybox; ADD temp.txt /; VOLUME /data; CMD ["echo hello"] • [Diagram: Docker Container layers. Read-only layer: /bin /dev /etc /home /lib /lib64 /media /mnt /opt /proc /root /run /sbin /sys /tmp /usr /var. Init layer: /etc/hosts /etc/hostname /etc/resolv.conf. Read-write layer: /temp.txt and /data. Plus json configs and the “echo hello” command.]
  5. HyperContainer

    • Container Runtime • RunV • https://github.com/hyperhq/runv • The OCI-compatible, hypervisor-based runtime implementation • Widely adopted by companies such as Huawei • Control daemon • https://github.com/hyperhq/hyperd • Container Image • Docker Image Spec
  6. Combine the best parts

    • Portable and behaves like a Linux container • $ hyperctl run -t busybox echo helloworld • sub-second startup time*, ~12MB memory cost • Fully isolated sandbox with an independent guest kernel • $ hyperctl exec -t busybox uname -r • 4.4.12-hyper (or your provided kernel) • security, backward compatibility, maturity • See: http://hypercontainer.io/why-hyper.html
  7. HyperContainer is a Pod

    • That’s how HyperContainer fits into the Kubernetes philosophy • Wait, why is Pod so important?
  8. Pod: lesson learned from Borg

    • InitContainers: one or more containers started in sequence before the pod's normal containers are started. • They can share volumes, perform network operations, and perform computation prior to the app containers.
  9. So, Pod is

    • The group of super-affinity containers • The atomic scheduling unit • The process group in container cloud • Do the right things • without modifying your container image • Kubernetes = Spring Framework • Pod = IoC • [Diagram: a Pod containing log, app, infra container, init container and volume]
  10. Pod is not easy to simulate

    • log has super affinity with app • Requirement: • app: 1G, log: 0.5G • Available: • Node_A: 1.25G, Node_B: 2G • What happens if app is scheduled to Node_A? (Only 0.25G is left there, so log no longer fits next to it.)
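
The numbers on this slide can be replayed in a small sketch. It assumes a naive first-fit scheduler; the function names and the first-fit policy are illustrative assumptions, not Kubernetes scheduler code. It compares placing `app` and `log` one at a time against placing them as one atomic group.

```python
# Hypothetical first-fit scheduler used only to replay the slide's numbers.
# The names and the first-fit policy are assumptions, not Kubernetes code.

def first_fit(request_gb, free_gb):
    """Return the first node with enough free memory, or None."""
    for node, free in free_gb.items():
        if free >= request_gb:
            return node
    return None


def schedule_separately(containers, free_gb):
    """Place containers one by one; later ones must join the first placed node."""
    free = dict(free_gb)
    placement = {}
    pinned_node = None
    for name, request in containers.items():
        if pinned_node is None:
            node = first_fit(request, free)
            pinned_node = node
        else:
            # Super affinity: only the node already chosen is acceptable.
            node = pinned_node if free[pinned_node] >= request else None
        placement[name] = node
        if node is not None:
            free[node] -= request
    return placement


def schedule_as_pod(containers, free_gb):
    """Place the whole group atomically, using the sum of its requests."""
    node = first_fit(sum(containers.values()), free_gb)
    return {name: node for name in containers}


containers = {"app": 1.0, "log": 0.5}
nodes = {"Node_A": 1.25, "Node_B": 2.0}
print(schedule_separately(containers, nodes))
print(schedule_as_pod(containers, nodes))
```

Scheduled separately, `app` lands on Node_A and `log` cannot follow it; scheduled as one unit, the 1.5G total skips Node_A and both land on Node_B.
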
  11. HyperContainer is a Pod

    • Linux container based runtimes • wrap and encapsulate several app containers into a logical group • Hypervisor based container runtime • the hypervisor serves as a natural boundary of the Pod
  12. HyperContainer is a Pod

    • kubelet Container Runtime Interface • create sandbox Foo --> create container C --> start container C • stop container C --> remove container C --> delete sandbox Foo • Sandbox • Normally: the infra container • HyperContainer: hypervisor • with HyperKernel • a HyperStart process as PID 1 • sets up the mnt namespace, launches apps from the images, etc.
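
The call order on this slide can be sketched with a toy in-memory runtime. The class and method names below are hypothetical and simply mirror the slide's wording; this is not the real CRI gRPC API, only an illustration of why the sandbox is created first and deleted last.

```python
class ToyRuntime:
    """In-memory stand-in for a container runtime (hypothetical, not real CRI)."""

    def __init__(self):
        # sandbox name -> {container name: state}
        self.sandboxes = {}

    def create_sandbox(self, sandbox):
        self.sandboxes[sandbox] = {}

    def create_container(self, sandbox, container):
        # Raises KeyError if the sandbox does not exist yet.
        self.sandboxes[sandbox][container] = "created"

    def start_container(self, sandbox, container):
        if container not in self.sandboxes[sandbox]:
            raise KeyError(f"container {container} was never created")
        self.sandboxes[sandbox][container] = "running"

    def stop_container(self, sandbox, container):
        if container not in self.sandboxes[sandbox]:
            raise KeyError(f"container {container} was never created")
        self.sandboxes[sandbox][container] = "stopped"

    def remove_container(self, sandbox, container):
        if self.sandboxes[sandbox][container] == "running":
            raise RuntimeError("stop the container before removing it")
        del self.sandboxes[sandbox][container]

    def delete_sandbox(self, sandbox):
        if self.sandboxes[sandbox]:
            raise RuntimeError("remove all containers before deleting the sandbox")
        del self.sandboxes[sandbox]


runtime = ToyRuntime()
runtime.create_sandbox("Foo")
runtime.create_container("Foo", "C")
runtime.start_container("Foo", "C")
state_while_running = runtime.sandboxes["Foo"]["C"]
runtime.stop_container("Foo", "C")
runtime.remove_container("Foo", "C")
runtime.delete_sandbox("Foo")
```

With a Linux container runtime the sandbox would be the infra container; with HyperContainer it would be the hypervisor, but the sequence of calls stays the same.
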
  13. Hypernetes

    • Also: h8s 1. Kubernetes + HyperContainer runtime • officially supported by using kubernetes/frakti 2. Multi-tenant network and persistent volumes • battle-tested Neutron + Cinder plugin
  14. Multi-tenant Network

    • Goal: • leverage the tenant-aware Neutron network for Kubernetes • follow the network plugin workflow • Non-goal: • break the k8s network model or hack k8s code
  15. Define the Network

    • Network • a top-level API object • each tenant (created by Keystone) has its own Network • a Network maps to a Neutron “net” • a Network Controller is responsible for managing the Network lifecycle
  16. Example

    [Diagram: api-server, etcd, scheduler and controller-manager (ControlLoop), plus kubelet (SyncLoop) and proxy on each node. API objects: network, pod, replica, namespace, service, job, deployment, volume, petset, … The control loop compares the Desired World with the Real World and calls Neutron to create/delete networks.]
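
The control loop in this diagram can be sketched as a diff between the desired and the real set of networks. `FakeNeutron` and `reconcile_networks` are hypothetical names introduced for illustration; the real Network Controller and the Neutron client API are not shown in the slides.

```python
class FakeNeutron:
    """Hypothetical stand-in for a Neutron client; it only records calls."""

    def __init__(self, nets):
        self.nets = set(nets)
        self.calls = []

    def create_net(self, name):
        self.nets.add(name)
        self.calls.append(("create", name))

    def delete_net(self, name):
        self.nets.discard(name)
        self.calls.append(("delete", name))


def reconcile_networks(desired, actual, create_net, delete_net):
    """One pass of the loop: drive the real world toward the desired world."""
    for name in sorted(desired - actual):
        create_net(name)   # Network object exists, Neutron net is missing
    for name in sorted(actual - desired):
        delete_net(name)   # Neutron net exists, Network object was deleted


neutron = FakeNeutron({"tenant-b-net", "stale-net"})
desired = {"tenant-a-net", "tenant-b-net"}
reconcile_networks(desired, set(neutron.nets), neutron.create_net, neutron.delete_net)
print(neutron.calls)
```

Running the same pass again on the updated state issues no calls, which is the property that makes a level-triggered loop safe to repeat.
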
  17. Kubernetes Network Model

    • Container reaches container • all containers can communicate with all other containers without NAT • Node reaches container • all nodes can communicate with all containers (and vice-versa) without NAT • IP addressing • a Pod in the cluster can be addressed by its IP
  18. How does h8s fit that?

    • A Network can be assigned to one or more Namespaces • Pods belonging to the same Network can reach each other directly through IP • a Pod’s network maps to a Neutron “port” • kubelet is responsible for Pod network setup • let’s see how kubelet works
  19. Example

    • 3.1 New pod object detected • 3.2 Bind pod with node • [Diagram: api-server, etcd, scheduler; kubelet (SyncLoop) and proxy on each node]
  20. Example

    • 4.1 Detected pod bound to me • 4.2 Start containers in pod • [Diagram: api-server, etcd, scheduler; kubelet (SyncLoop) and proxy on each node]
  21. Design of kubelet

    • Choose Runtime (docker, rkt, hyper/remote) • InitNetworkPlugin • SyncLoop: HandlePods {Add, Update, Remove, Delete, …} • Pod Update Worker (e.g. ADD) • generate Pod status • check volume status (discussed later) • call runtime to start containers • set up Pod network (see next slide) • [Diagram components: NodeStatus, Network Status, status Manager, PLEG, volume Manager, image Manager, PodUpdate]
  22. kubestack

    • A standalone gRPC daemon that 1. “translates” the SetUpPod request to the Neutron network API 2. handles the multi-tenant Service proxy
  23. Service

    $ iptables-save | grep my-service
    -A KUBE-SERVICES -d -p tcp -m comment --comment "default/my-service: cluster IP" -m tcp --dport 8001 -j KUBE-SVC-KEAUNL7HVWWSEZA6
    -A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-6XXFWO3KTRMPKCHZ
    -A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-57KPRZ3JQVENLNBRZ
    -A KUBE-SEP-6XXFWO3KTRMPKCHZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination
    -A KUBE-SEP-57KPRZ3JQVENLNBRZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination
    [Diagram labels: portal, random mode rules, backend rule_1, backend rule_2, OnServiceUpdate, OnEndpointsUpdate]
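
The probability values of the random-mode rules are not visible in the listing above. As a sketch, assume the common scheme in which the rules are tried in order and rule i of n matches with probability 1/(n-i), so that the last rule always matches. Under that assumption every backend receives an equal share of traffic, which the following arithmetic checks. The function names are illustrative.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probability for n backend rules evaluated in order.

    Rule i (0-based) is only reached if rules 0..i-1 did not match, so it
    uses probability 1/(n-i); the last rule therefore always matches.
    """
    return [Fraction(1, n - i) for i in range(n)]


def effective_shares(probs):
    """Overall share of traffic each backend receives."""
    shares = []
    remaining = Fraction(1)          # traffic that reached this rule
    for p in probs:
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


for n in (2, 3):
    probs = rule_probabilities(n)
    print(n, probs, effective_shares(probs))
```

For the two backends in the listing this gives per-rule probabilities of 1/2 and 1, and an effective 1/2 share each.
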
  24. Multi-tenant Service

    • The default iptables-based kube-proxy is not tenant aware • Endpoint Pods and Nodes with iptables rules are isolated into different networks • Hypernetes uses a built-in HAProxy as the Service portal • to handle all Service instances within the same namespace • the same OnServiceUpdate and OnEndpointsUpdate workflow • ExternalProvider • an OpenStack LB will be created as Service • e.g. curl
  25. Kubernetes Persistent Volume

    • Volume Manager reconcile (desired World): • Get mountedVolume from actualStateOfWorld • Unmount volumes in mountedVolume but not in desiredStateOfWorld • AttachVolume() if vol in desiredStateOfWorld and not attached • MountVolume() if vol in desiredStateOfWorld and not in mountedVolume • Verify devices that should be detached/unmounted are detached/unmounted • Tips: 1. -v host:path 2. attach vs. mount 3. Totally independent from container management • [Diagram: the Cinder volume plugin attaches a volume to a Host path, which is mounted at each Pod's mountPath]
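
The reconcile steps listed on this slide can be sketched with plain sets. This is a simplified illustration, not the kubelet volume manager: `reconcile_volumes` is a hypothetical name, volumes are just strings, and the final verification step from the slide is left out.

```python
def reconcile_volumes(desired, attached, mounted):
    """Run one reconcile pass and return the ordered list of actions.

    desired, attached and mounted are sets of volume names; attached and
    mounted are updated in place to reflect the new actual state.
    """
    actions = []
    # Unmount volumes that are mounted but no longer desired.
    for vol in sorted(mounted - desired):
        actions.append(("unmount", vol))
        mounted.discard(vol)
    # Attach desired volumes that are not attached yet.
    for vol in sorted(desired - attached):
        actions.append(("attach", vol))
        attached.add(vol)
    # Mount desired volumes that are not mounted yet.
    for vol in sorted(desired - mounted):
        actions.append(("mount", vol))
        mounted.add(vol)
    return actions


desired = {"data", "logs"}
attached = {"data", "old"}
mounted = {"old"}
first_pass = reconcile_volumes(desired, attached, mounted)
second_pass = reconcile_volumes(desired, attached, mounted)
print(first_pass)
print(second_pass)
```

Attach and mount are tracked separately on purpose, matching the "attach vs. mount" tip: `data` is already attached and only needs a mount, while `logs` needs both.
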
  26. Persistent Volume with HyperContainer

    • Enhanced Cinder volume plugin • Linux container: 1. full OpenStack cluster 2. query Nova to find node 3. attach Cinder volume to host path 4. bind mount host path to Pod containers • HyperContainer: • directly attach block devices to Pod • thanks to the hypervisor based Pod boundary • eliminates extra time to query Nova • [Diagram: the Enhanced Cinder volume plugin attaches the vol directly to the Pod's mountPath, bypassing the Host; Volume Manager reconciles against the desired World]
  27. Future of CRI

    • Keep Docker as the only default container runtime • oci-runtime, rktlet, hyperd • Frakti: the Remote Container Runtime Kit • https://github.com/kubernetes/frakti • welcome to try it out, star and fork
  28. “If the image becomes non-standard”

    • e.g. the Docker image somehow becomes Docker-specific • Don’t worry, kubelet.imageManager is becoming runtime-specific • but then k8s will probably choose • NO DEFAULT runtime
  29. Full Topology

    [Diagram: Master with Keystone, Neutron, Cinder and the API objects (Object: Network, Object: Pod, Object: …); Ceph; each Node runs kubestack, Neutron L2 Agent, kube-proxy, kubelet and the Cinder Plugin, and hosts Pods]
  30. Summary

    • A new way to build secure and multi-tenant Kubernetes • Kubernetes + HyperContainer + Neutron Plugin + Cinder Plugin + Keystone • Project URL: https://github.com/hyperhq/hypernetes • Roadmap • Graduate HyperContainer runtime on k8s upstream • see HyperContainer in official k8s release • Neutron CNI plugin • Tip: https://hyper.sh is totally built on Hypernetes, try it out :)