
Kuberception: Self Hosting Kubernetes

Tasdik Rahman
December 10, 2018


Transcript

  1. Kuberception: Self Hosting Kubernetes
    Tasdik Rahman
    @tasdikrahman | tasdikrahman.me


  2. Who is this talk for?
    Kubernetes cluster operators
    People evaluating alternatives to
    kops/kubeadm etc., or a managed solution


  3. Agenda
    1. What is Self Hosted Kubernetes?
    2. Why?
    3. How does it work?
    4. Learnings from running it on production.
    5. What’s next?


  4. Brief Intro to Kubernetes


  5. Section Header
    Picture Credits: https://elastisys.com/


  6. What is Self hosted
    Kubernetes?


  7. Self hosted Kubernetes
    It runs all required and optional components of a Kubernetes cluster on top of
    Kubernetes itself. The kubelet manages itself or is managed by the system init
    and all the Kubernetes components can be managed by using Kubernetes APIs.
    *Ref: CoreOS Tectonic docs


  8. Self hosted Kubernetes


  9. Is it something new?
    * https://github.com/kubernetes/kubernetes/issues/246


  10. Why?


  11. Desired control plane components properties
    ● Highly available
    ● Should be able to tolerate node failures
    ● Scale up and down with requirements
    ● Rollback and upgrades
    ● Monitoring and alerting
    ● Resource allocation
    ● RBAC


  12. How does Self Hosted Kubernetes address them?
    ● Small Dependencies
    ● Deployment consistency
    ● Introspection
    ● Cluster Upgrades
    ● Easier Highly-Available Configurations
    ● Streamlined cluster lifecycle management


  13. Small Dependencies
    Kubelet kubeconfig
    Container runtime
    Minimal on-host requirements
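
Those three bullets are the entire host footprint; everything else runs on the cluster itself. A minimal sketch of a system-init-managed kubelet unit, assuming systemd and illustrative paths/flags (not taken from the deck):

```ini
# /etc/systemd/system/kubelet.service — illustrative sketch
[Unit]
Description=kubelet (managed by system init)
After=network-online.target

[Service]
ExecStart=/usr/local/bin/kubelet \
  --kubeconfig=/etc/kubernetes/kubeconfig \
  --config=/etc/kubernetes/kubelet.yaml
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```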


  14. No distinction between master and
    worker nodes


  15. You select master nodes by adding
    labels to them
    $ kubectl label node node1 master=true
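
Control-plane pods can then be steered to those nodes with a nodeSelector; a hedged pod-spec fragment (only the `master=true` label comes from the slide, the rest is illustrative):

```yaml
# pod template fragment of a control-plane daemonset (illustrative)
spec:
  nodeSelector:
    master: "true"
  hostNetwork: true
```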


  16. Introspection


  17. Cluster upgrades


  18. Easier Highly Available configurations


  19. Streamlined Cluster Lifecycle
    management
    $ kubectl apply -f kube-apiserver.yaml
    $ kubectl apply -f controller-manager.yaml
    $ kubectl apply -f flannel.yaml
    $ kubectl apply -f my-app.yaml
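
Each manifest above is a plain Kubernetes object; a heavily trimmed, illustrative sketch of what a self-hosted kube-apiserver.yaml could look like (image, tag, and flags are assumptions, not the deck's actual manifest):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-apiserver
  template:
    metadata:
      labels:
        k8s-app: kube-apiserver
    spec:
      hostNetwork: true
      nodeSelector:
        master: "true"
      containers:
        - name: kube-apiserver
          image: k8s.gcr.io/hyperkube:v1.13.0  # illustrative tag
          command:
            - /hyperkube
            - apiserver
            - --etcd-servers=http://127.0.0.1:2379
```

Because it is just a daemonset, the same `kubectl apply` and rollout machinery used for applications drives the control plane.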


  20. How does it work?


  21. Three main problems to solve for
    it to work


  22. Bootstrapping
    Upgrades
    Disaster recovery


  23. Bootstrapping
    ● Control plane running as daemonsets, deployments.
    Making use of secrets and configmaps
    ● But … we need a control plane to apply these
    deployments and daemonsets on
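
This chicken-and-egg problem is what bootkube (covered next) solves; its flow is roughly two commands, shown here as a sketch against an assumed asset directory (flags should be checked against the bootkube docs, and `<master-ip>` is a placeholder):

```shell
# render TLS assets plus temporary and self-hosted control-plane manifests
bootkube render --asset-dir=assets --api-servers=https://<master-ip>:6443

# start the temporary control plane, inject the self-hosted one, then exit
bootkube start --asset-dir=assets
```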


  24. Credits: ASAPScience


  25. Then how should we solve this?


  26. Use a temporary, static control
    plane to bootstrap the cluster


  27. Bootkube: Looking at it from
    10,000 feet


  28. Master (initial) node: bootkube takes the temporary
    control-plane manifests and the self-hosted control-plane
    manifests, and pivots from the temporary control plane to
    the self-hosted control plane.


  29. Ephemeral control plane being brought up by bootkube:
    bootkube runs the kube-apiserver, controller-manager and
    scheduler against etcd, alongside the system kubelet
    (managed by system init).


  30. Bootkube exits, bringing down the ephemeral control
    plane; etcd, the system kubelet (managed by system init)
    and the now self-hosted api-server, scheduler and
    controller-manager remain.



  32. Does this even work?


  33. Controller node (master)


  34. Demo


  35. Do a kubernetes control plane
    component version upgrade on the test
    cluster
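
Because the control plane is made of ordinary daemonsets and deployments, the upgrade in the demo boils down to an image bump; a hedged sketch against a live cluster (object and image names are illustrative):

```shell
# bump the apiserver image and watch the rolling update
kubectl -n kube-system set image daemonset/kube-apiserver \
  kube-apiserver=k8s.gcr.io/hyperkube:v1.13.1
kubectl -n kube-system rollout status daemonset/kube-apiserver
```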


  36. Learnings from running
    it on production
    clusters


  37. What went wrong and what went right
    ● Using the right instance types for the compute instances.
    ● Self hosted etcd outage.
    ● api-server crashing during image upgrade.
    ● Appropriate resource limits.
    ● Disaster recovery (etcd backups / bootkube recover / Heptio Ark).
    ● Blue-green clusters.
    ● Kubelet OOM’d.
    ● Cross-checking for compatibility with the cluster upgrade.
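
For the disaster-recovery point, regular etcd snapshots are the baseline safety net; a sketch assuming etcdctl v3 on a control-plane node (endpoint and paths are illustrative):

```shell
# snapshot the cluster state backing the self-hosted control plane
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  snapshot save /var/backups/etcd-$(date +%F).db

# sanity-check the snapshot before trusting it
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-$(date +%F).db
```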


  38. What’s next?


  39. Automate the boring stuff
    Credits: AlSweigart


  40. Automate
    ● Extend Kubernetes by leveraging CRDs
    ● The cluster upgrade part can be delegated to an operator.
    ● Custom systemd/shell scripts


  41. Future of bootkube
    ● Will be replaced by Kubelet pod API.
    ○ The write API would enable an external installation program to set up the control plane of a
    self-hosted Kubernetes cluster without requiring an existing API server.


  42. Links
    ● GitHub repo used in the demo for setting up the self hosted k8s test cluster
    using typhoon: https://github.com/tasdikrahman/infra
    ● https://typhoon.psdn.io/: used as baseline for this demo to create the self
    hosted k8s cluster.


  43. References
    ● SIG-lifecycle Spec on self hosted kubernetes
    ● bootkube: Design principles
    ● bootkube: How does it work
    ● bootkube: Upgrading the kubernetes cluster
    ● SIG lifecycle google groups early discussions on self hosting


  44. Credits
    ● @rmenn, @hashfyre, @gappan28 for teaching me what I know.
    ● @aaronlevy and @dghubble for always being there on #bootkube on k8s
    slack to clear up any questions on bootkube.
    ● @kubernetesio for sharing the slide template.
    ● The OSS contributors out there who have made k8s and the ecosystem
    around it, what it is today.
    ● Arjun for lending me his laptop for DevOpsdays.


  45. Questions?
    tasdikrahman.me | @tasdikrahman
