Hypernetes: Multi-tenant and Secure Kubernetes

This is the talk I gave at LinuxCon EU 2016.

Lei (Harry) Zhang

September 28, 2016

Transcript

  1. Bringing Security and Multi-tenancy to Kubernetes
    Lei (Harry) Zhang

  2. About Me
    • Lei (Harry) Zhang
    • #Microsoft MVP in cloud and datacenter management
    • though I’m a Linux guy :/
    • Previous: VMware, Baidu
    • Feature maintainer of Kubernetes
    • HyperCrew: https://hyper.sh
    • Publications: Docker & Kubernetes Under the Hood
    • PhD candidate @ZJU: Large-scale cluster management and scheduling

  3. A survey about “boundary”
    • Are you comfortable with Linux containers as an effective boundary?
    • Yes, I use containers in my private/safe environment
    • No, I use containers to serve the public cloud

  4. As long as we care about security…
    • We have to wrap containers inside full-blown virtual machines
    • But we lose cloud-native deployment
    • Slow startup time
    • Huge waste of resources
    • Memory tax for every container
    • …
    dream
    reality

  5. Revisit container
    • Container Runtime
    • The dynamic view and boundary of your running process
    • Built on namespaces and cgroups
    • Container Image
    • The static view of your program, data, dependencies, files and directories
    • Example Dockerfile:
    FROM busybox
    ADD temp.txt /
    VOLUME /data
    CMD ["echo", "hello"]
    (Figure: the Docker container built from this Dockerfile. Labels: read-write layer with /data, running "echo hello"; init layer with /etc/hosts /etc/hostname /etc/resolv.conf; read-only layers with JSON metadata, including /temp.txt; root filesystem /bin /dev /etc /home /lib /lib64 /media /mnt /opt /proc /root /run /sbin /sys /tmp /usr /var /data /temp.txt)

  6. HyperContainer
    Secure Kubernetes from runtime level

  7. HyperContainer
    • Container Runtime
    • RunV
    • https://github.com/hyperhq/runv
    • An OCI-compatible, hypervisor-based runtime implementation
    • Widely adopted by companies such as Huawei
    • Control daemon
    • https://github.com/hyperhq/hyperd
    • Container Image
    • Docker Image Spec

  8. Combine the best parts
    • Portable and behaves like a Linux container
    • $ hyperctl run -t busybox echo helloworld
    • sub-second startup time*, ~12MB memory cost
    • Fully isolated sandbox with an independent guest kernel
    • $ hyperctl exec -t busybox uname -r
    • 4.4.12-hyper (or your provided kernel)
    • security, backward compatibility, maturity
    See: http://hypercontainer.io/why-hyper.html

  9. HyperContainer is a Pod
    • That’s how HyperContainer fits into the Kubernetes philosophy
    • Wait, why is Pod so important?

  10. Pod: lesson learned from Borg
    • Should sample.war be packaged with Tomcat?

  11. Pod: lesson learned from Borg
    • InitContainers: one or more containers started in sequence before the pod's normal containers are started.
    • Share volumes, perform network operations, and perform computation prior to the app containers.

  12. So, Pod is
    • The group of super-affinity containers
    • The atomic scheduling unit
    • The process group in a container cloud
    • Do the right things
    • without modifying your container image
    • Kubernetes = Spring Framework
    • Pod = IoC
    (Figure: a Pod containing log and app containers, an infra container, a volume and an init container)

  13. Pod is not easy to simulate
    • log has super affinity with app
    • Requirement:
    • app: 1G, log: 0.5G
    • Available:
    • Node_A: 1.25G, Node_B: 2G
    • What happens if app is scheduled to Node_A?
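
The numbers on this slide can be worked through with a small sketch. This is not the Kubernetes scheduler: it is a hypothetical first-fit placement (the function names `place_individually` and `place_as_pod` are invented for illustration) that contrasts scheduling the two containers one by one with scheduling them as a single atomic unit.

```python
# Hypothetical first-fit placement, NOT the real Kubernetes scheduler.
# Memory sizes are in GB and are taken from the slide.

def place_individually(containers, nodes):
    """Place the first container first-fit, then force the rest onto the
    same node (super affinity). Returns the node name, or None if the
    remaining containers no longer fit there."""
    free = dict(nodes)
    _, first_need = containers[0]
    node = next((n for n, cap in free.items() if cap >= first_need), None)
    if node is None:
        return None
    free[node] -= first_need
    for _, need in containers[1:]:
        if free[node] < need:
            return None  # stuck: the co-located container does not fit
        free[node] -= need
    return node


def place_as_pod(containers, nodes):
    """Treat the group as one unit and look for a node that fits the sum."""
    total = sum(need for _, need in containers)
    return next((n for n, cap in nodes.items() if cap >= total), None)


containers = [("app", 1.0), ("log", 0.5)]
nodes = {"Node_A": 1.25, "Node_B": 2.0}

print(place_individually(containers, nodes))  # app lands on Node_A, log cannot follow
print(place_as_pod(containers, nodes))        # the 1.5G group only fits Node_B
```

With per-container placement, app takes Node_A and leaves 0.25G, so log can never join it. Summing the requests first avoids that dead end.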

  14. HyperContainer is a Pod
    • Linux container based runtimes
    • wraps and encapsulates several app containers into a logical group
    • Hypervisor container based runtime
    • hypervisor serves as a natural boundary of Pod

  15. HyperContainer is a Pod
    • kubelet Container Runtime Interface
    • create sandbox Foo --> create container C --> start container C
    • stop container C --> remove container C --> delete sandbox Foo
    • Sandbox
    • Normally: the infra container
    • HyperContainer: hypervisor
    • with HyperKernel
    • a HyperStart process as PID 1
    • set up mnt namespace, launch apps from the images, etc.
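
The call ordering on this slide can be sketched with a toy in-memory runtime. `FakeRuntime` and its method names are hypothetical and only mirror the verbs used on the slide; they are not the actual CRI gRPC API, and no hypervisor or container is started.

```python
class FakeRuntime:
    """Hypothetical in-memory runtime; it only checks call ordering."""

    def __init__(self):
        self.sandboxes = {}  # sandbox name -> {container name: state}
        self.log = []        # trace of accepted calls

    def create_sandbox(self, sandbox):
        self.sandboxes[sandbox] = {}
        self.log.append(f"create sandbox {sandbox}")

    def create_container(self, sandbox, name):
        if sandbox not in self.sandboxes:
            raise RuntimeError("sandbox must exist before its containers")
        self.sandboxes[sandbox][name] = "created"
        self.log.append(f"create container {name}")

    def start_container(self, sandbox, name):
        if self.sandboxes[sandbox].get(name) != "created":
            raise RuntimeError("container must be created before it starts")
        self.sandboxes[sandbox][name] = "running"
        self.log.append(f"start container {name}")

    def stop_container(self, sandbox, name):
        self.sandboxes[sandbox][name] = "stopped"
        self.log.append(f"stop container {name}")

    def remove_container(self, sandbox, name):
        if self.sandboxes[sandbox][name] == "running":
            raise RuntimeError("stop the container before removing it")
        del self.sandboxes[sandbox][name]
        self.log.append(f"remove container {name}")

    def delete_sandbox(self, sandbox):
        if self.sandboxes[sandbox]:
            raise RuntimeError("remove all containers before the sandbox")
        del self.sandboxes[sandbox]
        self.log.append(f"delete sandbox {sandbox}")


runtime = FakeRuntime()
runtime.create_sandbox("Foo")
runtime.create_container("Foo", "C")
runtime.start_container("Foo", "C")
runtime.stop_container("Foo", "C")
runtime.remove_container("Foo", "C")
runtime.delete_sandbox("Foo")
print(" --> ".join(runtime.log))
```

The sandbox is the only runtime-specific part: in this model it is just a dictionary entry, whereas the slide maps it to the infra container or to a hypervisor.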

  16. Hypernetes
    Kubernetes with HyperContainer Runtime

  17. Hypernetes
    • Also: h8s
    1. Kubernetes + HyperContainer runtime
    • officially supported by using kubernetes/frakti
    2. Multi-tenant network and persistent volumes
    • battle tested Neutron + Cinder plugin

  18. Multi-tenant Network

  19. Multi-tenant Network
    • Goal:
    • leverage the tenant-aware Neutron network for Kubernetes
    • following the network plugin workflow
    • Non-goal:
    • break k8s network model or hack k8s code

  20. Define the Network
    • Network
    • a top-level API object
    • each tenant (created by Keystone) has its own Network
    • a Network maps to a Neutron “net”
    • a Network Controller is responsible for managing the Network lifecycle
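
A controller of this kind typically compares the Network objects it sees with what exists in the backend and closes the gap. The sketch below assumes that shape. `FakeNeutron` and `reconcile_networks` are hypothetical names, and the fake client is an in-memory stand-in, not the OpenStack Neutron API or the Hypernetes controller code.

```python
class FakeNeutron:
    """In-memory stand-in for a Neutron client (hypothetical)."""

    def __init__(self):
        self.nets = {}   # (tenant, network name) -> fake net id
        self._next = 0

    def create_net(self, tenant, name):
        self._next += 1
        self.nets[(tenant, name)] = f"net-{self._next}"

    def delete_net(self, tenant, name):
        del self.nets[(tenant, name)]


def reconcile_networks(desired, neutron):
    """One control-loop pass.

    desired: set of (tenant, network name) pairs, standing in for the
    Network objects stored through the API server.
    """
    actual = set(neutron.nets)
    for key in sorted(desired - actual):
        neutron.create_net(*key)   # Network object exists, net does not
    for key in sorted(actual - desired):
        neutron.delete_net(*key)   # net exists, Network object was deleted


neutron = FakeNeutron()
desired = {("tenant-a", "default"), ("tenant-b", "default")}
reconcile_networks(desired, neutron)
print(sorted(neutron.nets))

desired.discard(("tenant-b", "default"))  # the tenant-b Network is deleted
reconcile_networks(desired, neutron)
print(sorted(neutron.nets))
```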

  21. Example
    (Figure: api-server, etcd, scheduler, controller-manager with its ControlLoop, and two nodes each running kubelet with its SyncLoop and a proxy. API objects: network, pod, replica, namespace, service, job, deployment, volume, petset. The control loop reconciles the Desired World with the Real World and calls Neutron to create/delete the network.)

  22. Kubernetes Network Model
    • Container-to-container
    • all containers can communicate with all other containers without NAT
    • Node-to-container
    • all nodes can communicate with all containers (and vice-versa) without NAT
    • IP addressing
    • a Pod in the cluster can be addressed by its IP

  23. How h8s fits that?
    • A Network can be assigned to one or more Namespaces
    • Pods belonging to the same Network can reach each other directly through IP
    • a Pod’s network maps to a Neutron “port”
    • kubelet is responsible for Pod network setup
    • let’s see how kubelet works
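
The Namespace-to-Network mapping can be pictured with a small lookup. This is a sketch under the assumption that direct reachability is decided purely by Network membership. The function name `directly_reachable` and all namespace and network names are made up for illustration.

```python
def directly_reachable(pod_a, pod_b, namespace_network):
    """Pods are (name, namespace) tuples.

    namespace_network maps a Namespace to the Network it is assigned to.
    Two Pods count as directly reachable when both Namespaces are
    assigned to the same Network.
    """
    net_a = namespace_network.get(pod_a[1])
    net_b = namespace_network.get(pod_b[1])
    return net_a is not None and net_a == net_b


# One Network may serve several Namespaces (hypothetical names).
namespace_network = {
    "web": "net-tenant-a",
    "db": "net-tenant-a",
    "other": "net-tenant-b",
}

print(directly_reachable(("nginx", "web"), ("mysql", "db"), namespace_network))
print(directly_reachable(("nginx", "web"), ("redis", "other"), namespace_network))
```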

  24. Example
    (Figure: api-server, etcd and scheduler; two nodes, each running kubelet with its SyncLoop and a proxy)
    1 Pod created

  25. Example
    (Figure: api-server, etcd and scheduler; two nodes, each running kubelet with its SyncLoop and a proxy)
    2 Pod object added

  26. Example
    (Figure: api-server, etcd and scheduler; two nodes, each running kubelet with its SyncLoop and a proxy)
    3.1 New pod object detected
    3.2 Bind pod to node

  27. Example
    (Figure: api-server, etcd and scheduler; two nodes, each running kubelet with its SyncLoop and a proxy)
    4.1 Detected pod bound to me
    4.2 Start containers in pod

  28. Design of kubelet
    (Figure: kubelet startup and SyncLoop)
    • Choose Runtime (docker, rkt, hyper/remote)
    • InitNetworkPlugin
    • SyncLoop: HandlePods {Add, Update, Remove, Delete, …}
    • Components: NodeStatus, Network Status, status Manager, PLEG, volume Manager, image Manager, PodUpdate
    Pod Update Worker (e.g. ADD)
    • generate Pod status
    • check volume status (covered later)
    • call runtime to start containers
    • set up Pod network (see next slide)

  29. Set Up Pod Network

  30. kubestack
    A standalone gRPC daemon
    1. to “translate” the SetUpPod request to the Neutron network API
    2. to handle the multi-tenant Service proxy

  31. Service
    $ iptables-save | grep my-service
    -A KUBE-SERVICES -d 10.0.0.116/32 -p tcp -m comment --comment "default/my-service: cluster IP" -m tcp --dport 8001 -j KUBE-SVC-KEAUNL7HVWWSEZA6
    -A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-6XXFWO3KTRMPKCHZ
    -A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-57KPRZ3JQVENLNBRZ
    -A KUBE-SEP-6XXFWO3KTRMPKCHZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.2:80
    -A KUBE-SEP-57KPRZ3JQVENLNBRZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.3:80
    (Figure: portal 10.10.0.116:8001 → random mode rules → backend rule_1 (172.17.0.2:80) and backend rule_2 (172.17.0.3:80); the rules are maintained by OnServiceUpdate and OnEndpointsUpdate)
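
The rules above show two KUBE-SEP chains behind one KUBE-SVC chain but not the match probabilities. The sketch below assumes the usual kube-proxy scheme, where rule i of n matches with probability 1/(n-i) and the last rule always matches, so every backend ends up equally likely. The helper names are hypothetical and nothing here touches iptables.

```python
import random
from collections import Counter


def rule_probabilities(n):
    """Per-rule match probability when rules are evaluated in order.

    Assumed scheme: the i-th of n rules matches with probability 1/(n-i),
    so the last rule is unconditional.
    """
    return [1.0 / (n - i) for i in range(n)]


def pick_backend(backends, rng):
    """Walk the rules top to bottom, as iptables does, and return the
    backend of the first rule that matches."""
    for backend, p in zip(backends, rule_probabilities(len(backends))):
        if rng.random() < p:
            return backend
    return backends[-1]  # not reached: the last probability is 1.0


backends = ["172.17.0.2:80", "172.17.0.3:80"]
rng = random.Random(0)
hits = Counter(pick_backend(backends, rng) for _ in range(10000))
print(rule_probabilities(len(backends)))
print(hits)
```

Because each rule is only reached when all earlier ones missed, the increasing per-rule probabilities cancel out into a uniform choice across backends.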

  32. Multi-tenant Service
    • The default iptables-based kube-proxy is not tenant aware
    • Endpoint Pods and Nodes with iptables rules are isolated into different networks
    • Hypernetes uses a built-in HAProxy as the Service portal
    • to handle all Service instances within the same namespace
    • the same OnServiceUpdate and OnEndpointsUpdate workflow
    • ExternalProvider
    • an OpenStack LB will be created as the Service
    • e.g. curl 58.215.33.98:8078

  33. Persistent Volume

  34. Kubernetes Persistent Volume
    (Figure: the Volume Manager reconciles the desired World; the Cinder volume plugin attaches the volume to a Host path, which is mounted at each Pod's mountPath)
    • Volume Manager reconcile:
    • Get mountedVolume from actualStateOfWorld
    • Unmount volumes in mountedVolume but not in desiredStateOfWorld
    • AttachVolume() if vol in desiredStateOfWorld and not attached
    • MountVolume() if vol in desiredStateOfWorld and not in mountedVolume
    • Verify devices that should be detached/unmounted are detached/unmounted
    • Tips:
    1. -v host:path
    2. attach vs. mount
    3. Totally independent from container management
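
The reconcile steps listed on this slide can be written down as a pure function over sets. This is a simplified sketch: volumes are plain names, the function `reconcile` is hypothetical, and the final verification step from the slide is left out.

```python
def reconcile(desired, attached, mounted):
    """Return the ordered operations for one reconcile pass.

    desired:  volumes in desiredStateOfWorld
    attached: volumes currently attached to the node
    mounted:  volumes currently mounted (from actualStateOfWorld)
    """
    ops = []
    # Unmount what is mounted but no longer desired.
    for vol in sorted(mounted - desired):
        ops.append(("unmount", vol))
    # Attach what is desired but not yet attached.
    for vol in sorted(desired - attached):
        ops.append(("attach", vol))
    # Mount what is desired but not yet mounted.
    for vol in sorted(desired - mounted):
        ops.append(("mount", vol))
    return ops


ops = reconcile(
    desired={"vol-a", "vol-b"},
    attached={"vol-b", "vol-c"},
    mounted={"vol-c"},
)
for op in ops:
    print(op)
```

Keeping attach and mount as separate operations reflects the "attach vs. mount" tip: vol-b is already attached and only needs mounting, while vol-a needs both.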

  35. Persistent Volume with HyperContainer
    • Enhanced Cinder volume plugin
    • Linux container:
    1. full OpenStack cluster
    2. query Nova to find node
    3. attach Cinder volume to host path
    4. bind mount host path to Pod containers
    • HyperContainer:
    • directly attach block devices to Pod
    • thanks to the hypervisor based Pod boundary
    • eliminates extra time to query Nova
    (Figure: the Volume Manager reconciles the desired World; the Enhanced Cinder volume plugin attaches the vol on the Host to each Pod's mountPath)

  36. PV Example
    • Create a Cinder volume
    • Claim the volume by referencing its volumeID
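
The manifests on this slide were images and did not survive in the transcript. As a rough illustration, the sketch below builds a PersistentVolume that references a Cinder volume by ID, using the `cinder` volume source fields from upstream Kubernetes (`volumeID`, `fsType`). The helper `cinder_pv`, the object name, the size and the all-zero volume ID are placeholders, and Hypernetes may expect additional fields.

```python
import json


def cinder_pv(name, volume_id, size="1Gi", fs_type="ext4"):
    """Build a PersistentVolume manifest backed by an existing Cinder volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            # volumeID is the ID reported when the Cinder volume was created.
            "cinder": {"volumeID": volume_id, "fsType": fs_type},
        },
    }


# Placeholder ID; substitute the ID of a real Cinder volume.
pv = cinder_pv("demo-pv", "00000000-0000-0000-0000-000000000000")
print(json.dumps(pv, indent=2))
```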

  37. Container Runtime Interface

  38. Future of CRI
    • Keep Docker as the only default container runtime
    • oci-runtime, rktlet, hyperd
    • Frakti: the Remote Container Runtime Kit
    • https://github.com/kubernetes/frakti
    • feel free to try it out, star and fork

  39. “if image becomes non-standard”
    • e.g. Docker image becomes somehow Docker specific
    • Don’t worry, kubelet.imageManager is moving to be runtime-specific
    • but then k8s will probably choose
    • NO DEFAULT runtime

  40. Full Topology
    (Figure)
    • Master: Object: Network, Object: Pod, Object: …
    • OpenStack side: KeyStone, Neutron, Cinder, Ceph
    • Each Node: kubestack, Neutron L2 Agent, kube-proxy, kubelet, Cinder Plugin, Pods

  41. Summary
    • A new way to build secure and multi-tenant Kubernetes
    • Kubernetes + HyperContainer + Neutron Plugin + Cinder Plugin + Keystone
    • Project URL: https://github.com/hyperhq/hypernetes
    • Roadmap
    • Graduate HyperContainer runtime on k8s upstream
    • see HyperContainer in official k8s release
    • Neutron CNI plugin
    • Tip: https://hyper.sh is totally built on Hypernetes, try it out :)

  42. END
    Lei (Harry) Zhang
    @resouer