Testable Kubernetes Operators? by Marcel Müller

DevOps Gathering

March 11, 2020

Transcript

  1. Operator Definition
     - Kubernetes’ controllers concept lets you extend the cluster’s behaviour without modifying the code of Kubernetes itself.
     - Operators are clients of the Kubernetes API that act as controllers for a Custom Resource.
     Source: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
  2. Controller Definition
     - A controller tracks at least one Kubernetes resource type.
     - These objects have a spec field that represents the desired state.
     - The controller(s) for that resource are responsible for making the current state come closer to that desired state.
     Source: https://kubernetes.io/docs/concepts/architecture/controller/
  3. - Operators act like controllers
     - Operators are clients of the Kubernetes API
     [Diagram labels: K8s-API, Operator, Watch CR, Controller manager, Kube scheduler]
  4. - Desired state in Spec
     - Current state as reality
     - Reconcile by applying diff to current state
     - Periodically get desired state through list
     [Diagram labels: Observe (Kubernetes watch|list), Evaluate (against current state), Reconcile]
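
The observe, evaluate, reconcile cycle on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: `FakeCluster`, the pod names, and the image tags are hypothetical stand-ins, and no real Kubernetes client is involved.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Hypothetical stand-in for reality: maps pod name -> image."""
    pods: dict = field(default_factory=dict)


def diff(desired: dict, current: dict):
    """Evaluate: work out what must be created, updated, or deleted."""
    to_create = {k: v for k, v in desired.items() if k not in current}
    to_update = {k: v for k, v in desired.items()
                 if k in current and current[k] != v}
    to_delete = [k for k in current if k not in desired]
    return to_create, to_update, to_delete


def reconcile(spec: dict, cluster: FakeCluster) -> bool:
    """Apply the diff to the current state. Returns True if anything changed."""
    to_create, to_update, to_delete = diff(spec, cluster.pods)
    cluster.pods.update(to_create)
    cluster.pods.update(to_update)
    for name in to_delete:
        del cluster.pods[name]
    return bool(to_create or to_update or to_delete)


# Desired state (spec) versus drifted reality; all names are made up.
spec = {"prometheus-0": "prom:v2.16", "prometheus-1": "prom:v2.16"}
cluster = FakeCluster(pods={"prometheus-0": "prom:v2.15", "stale": "old:v1"})

changed_first = reconcile(spec, cluster)   # drift observed, diff applied
changed_second = reconcile(spec, cluster)  # periodic re-list: nothing to do
```

Running the same reconcile twice shows why the periodic list is cheap: once the current state matches the spec, the diff is empty and the loop makes no changes.
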
  5. Example: Prometheus-Operator
     - Watches Prometheus CR
     - Creates Prometheus pod deployments
     - Continuously reconciles desired configuration with actual deployment
     https://github.com/coreos/prometheus-operator
  6. Example: AWS-Operator
     - Watches Cluster CR
     - Creates Kubernetes clusters on AWS matching CR Spec
     - Continuously reconciles desired configuration with actual cluster
     https://github.com/kubernetes-sigs/cluster-api
     https://github.com/giantswarm/aws-operator
  7. External APIs
     - Operators as a tool for infrastructure management
     - Managing stateful resources outside of k8s
  8. External APIs
     - Same challenges as other applications
     - Already has hard dependency on k8s API
  9. Eventual consistency
     - Reconciliation might never reach consistent state
     - Multiple loops might be needed for consistency
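
One way to picture "multiple loops might be needed" is an external API whose resources depend on each other. The sketch below assumes a made-up dependency chain (`vpc`, `subnet`, `instance`) and a reconcile pass that only acts on the state it observed at the start of the pass; neither comes from the talk.

```python
# Hypothetical dependency order for resources behind an external API.
DEPENDS_ON = {"vpc": None, "subnet": "vpc", "instance": "subnet"}


def reconcile_once(desired: list, existing: set) -> bool:
    """Create every resource whose dependency already existed at the start
    of this pass. Returns True once nothing is left to do."""
    snapshot = set(existing)  # state observed at the start of the loop
    for name in desired:
        if name in snapshot:
            continue
        dep = DEPENDS_ON[name]
        if dep is None or dep in snapshot:
            existing.add(name)
    return all(name in existing for name in desired)


desired = ["instance", "subnet", "vpc"]
existing = set()
passes = 0
converged = False
while not converged and passes < 10:  # the cap stands in for a test-side timeout
    converged = reconcile_once(desired, existing)
    passes += 1
```

Each pass makes progress on whatever is currently possible and leaves the rest for the next pass, so no single loop has to reach the final state on its own.
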
  10. Finalizers
      - []string in the object metadata
      - Object will only be deleted from the k8s API if empty
      - Deletion events will be replayed while finalizers exist
  11. - deletionTimestamp indicates that object should be deleted
      - Controllers should remove themselves!

          apiVersion: cluster.k8s.io/v1alpha1
          kind: Cluster
          metadata:
            …
            deletionTimestamp: "2019-11-12T12:45:47Z"
            finalizers:
            - aws-operator-cluster-controller
            - cluster-operator-cluster-controller
            name: demo
            namespace: default
          ...
  12. Finalizers
      - Improve stability in case of missed events
      - Absolutely necessary if multiple controllers watch the same object!
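
The finalizer behaviour from the slides above can be imitated with a small in-memory model. `FakeAPI` is a hypothetical stand-in, not the Kubernetes API server; it only reproduces the rule that an object marked for deletion disappears once its finalizer list is empty. The finalizer names and timestamp are taken from the example manifest.

```python
class FakeAPI:
    """Hypothetical in-memory API server that mimics finalizer semantics."""

    def __init__(self):
        self.objects = {}

    def create(self, name, finalizers):
        self.objects[name] = {"finalizers": list(finalizers),
                              "deletionTimestamp": None}

    def delete(self, name):
        # A delete request only marks the object while finalizers remain.
        self.objects[name]["deletionTimestamp"] = "2019-11-12T12:45:47Z"
        self._gc(name)

    def remove_finalizer(self, name, finalizer):
        self.objects[name]["finalizers"].remove(finalizer)
        self._gc(name)

    def _gc(self, name):
        obj = self.objects[name]
        if obj["deletionTimestamp"] and not obj["finalizers"]:
            del self.objects[name]


def reconcile(api, name, my_finalizer, cleaned_up):
    """One controller's reconcile: clean up, then remove only its own finalizer."""
    obj = api.objects.get(name)
    if obj is None or obj["deletionTimestamp"] is None:
        return
    if my_finalizer in obj["finalizers"]:
        cleaned_up.append(my_finalizer)  # stand-in for deleting external resources
        api.remove_finalizer(name, my_finalizer)


api = FakeAPI()
api.create("demo", ["aws-operator-cluster-controller",
                    "cluster-operator-cluster-controller"])
api.delete("demo")
still_there_after_delete = "demo" in api.objects

cleaned = []
reconcile(api, "demo", "aws-operator-cluster-controller", cleaned)
still_there_after_first = "demo" in api.objects
reconcile(api, "demo", "cluster-operator-cluster-controller", cleaned)
gone_after_second = "demo" not in api.objects
```

Because the object survives until both controllers have removed their own entry, neither controller can lose its chance to clean up, even if it misses the original deletion event and only sees the object on a later pass.
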
  13. Resource Version
      - Identifies the state of an object as a number
      - Changes only if the object has changed
  14. - Should be stored when reading the object
      - Can be applied again when updating the object
        -> Ensures that it has not changed in the meantime

          apiVersion: cluster.k8s.io/v1alpha1
          kind: Cluster
          metadata:
            …
            name: demo
            namespace: default
            resourceVersion: "22453751"
          ...
  15. Resource Version
      - Prevents most simple race conditions
      - Prevents accidental object manipulation in test suites!
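
The read, modify, update pattern with a resource version can be sketched as optimistic concurrency. `FakeStore` and `Conflict` are hypothetical names, and the integer counter is a simplification: clients of the real API should treat resourceVersion as an opaque string and only compare it for equality.

```python
class Conflict(Exception):
    """Raised when an update carries a stale resourceVersion."""


class FakeStore:
    """Hypothetical in-memory object store with optimistic concurrency."""

    def __init__(self, spec):
        self._spec = dict(spec)
        self._version = 1

    def get(self):
        # Return a copy of the object plus the version it was read at.
        return dict(self._spec), str(self._version)

    def update(self, spec, resource_version):
        if resource_version != str(self._version):
            raise Conflict("object changed since it was read")
        self._spec = dict(spec)
        self._version += 1  # the version changes only when the object changes


store = FakeStore({"replicas": 1})

spec_a, rv_a = store.get()   # controller A reads
spec_b, rv_b = store.get()   # controller B reads the same version

spec_a["replicas"] = 3
store.update(spec_a, rv_a)   # A's update is accepted

spec_b["replicas"] = 5
try:
    store.update(spec_b, rv_b)   # B's stored version is now stale
    conflict = False
except Conflict:
    conflict = True

# B re-reads and retries against the fresh version.
spec_b, rv_b = store.get()
spec_b["replicas"] = 5
store.update(spec_b, rv_b)
final_spec, final_rv = store.get()
```

The same check protects a test suite: a test that writes an object using a version it read earlier gets a conflict instead of silently overwriting what the operator did in between.
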
  16. - Status is defined by you
      - Treat the CR as an API

          apiVersion: v1
          kind: Node
          metadata: ...
          spec: ...
          status:
            ...
            conditions:
            - lastHeartbeatTime: "2019-11-12T13:02:14Z"
              lastTransitionTime: "2019-11-12T13:02:14Z"
              message: Calico is running on this node
              reason: CalicoIsUp
              status: "False"
              type: NetworkUnavailable
            ...
  17. Status
      - Good status implementation allows tests to fail fast
      - Transition timestamps give performance insights
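
A test helper built on status conditions might look like the following sketch. The condition types `Creating`, `Ready`, and `Failed`, the reason `QuotaExceeded`, and the sample timestamps are all hypothetical; the talk only says that the status is defined by the operator author.

```python
from datetime import datetime


class FailFast(Exception):
    """Raised when the status already shows the object cannot become ready."""


def check_ready(conditions):
    """Return True if ready, False if still in progress, raise on failure."""
    by_type = {c["type"]: c for c in conditions}
    failed = by_type.get("Failed")
    if failed and failed["status"] == "True":
        raise FailFast(failed.get("reason", "unknown"))
    ready = by_type.get("Ready")
    return bool(ready and ready["status"] == "True")


def seconds_between(conditions, first, second):
    """Performance insight: time between two condition transitions."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    ts = {c["type"]: datetime.strptime(c["lastTransitionTime"], fmt)
          for c in conditions}
    return (ts[second] - ts[first]).total_seconds()


healthy = [
    {"type": "Creating", "status": "False",
     "lastTransitionTime": "2019-11-12T13:00:00Z"},
    {"type": "Ready", "status": "True",
     "lastTransitionTime": "2019-11-12T13:02:14Z"},
]
broken = [
    {"type": "Failed", "status": "True", "reason": "QuotaExceeded",
     "lastTransitionTime": "2019-11-12T13:00:05Z"},
]

is_ready = check_ready(healthy)
creation_seconds = seconds_between(healthy, "Creating", "Ready")
try:
    check_ready(broken)
    failed_fast = None
except FailFast as err:
    failed_fast = str(err)
```

With an explicit failure condition the test can stop as soon as the operator reports an unrecoverable state, rather than polling until a generic timeout expires.
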
  18. Never wait during reconciliation
      - Waiting in the reconciliation loops introduces more timeouts
      - Tests should decide when an action is taking too long
      - Enforce SLAs in tests, monitor them in production
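
A non-blocking reconcile can return a requeue delay instead of sleeping, leaving the timeout decision to the caller. The sketch below uses a simulated clock and a made-up external API that becomes ready after 10 seconds; `reconcile`, `run_test`, and `becomes_ready_at_10s` are hypothetical helpers, not part of any operator framework.

```python
def reconcile(external, name):
    """Never blocks: returns the number of seconds after which to requeue,
    or None when the desired state is reached."""
    state = external.get(name)
    if state is None:
        external[name] = "creating"   # start the slow operation and return
        return 5
    if state == "creating":
        return 5                      # still in progress; check again later
    return None                       # state is "ready"


def run_test(external, name, timeout, advance):
    """Test-side driver using a simulated clock; the test owns the timeout."""
    clock = 0
    while clock <= timeout:
        requeue_after = reconcile(external, name)
        if requeue_after is None:
            return clock
        clock += requeue_after
        advance(external, name, clock)
    raise TimeoutError(f"{name} not ready after {timeout}s")


def becomes_ready_at_10s(external, name, clock):
    # Hypothetical external API that finishes creation after 10 seconds.
    if clock >= 10:
        external[name] = "ready"


ready_at = run_test({}, "demo", timeout=30, advance=becomes_ready_at_10s)

try:
    run_test({}, "demo", timeout=5, advance=becomes_ready_at_10s)
    timed_out = False
except TimeoutError:
    timed_out = True
```

The reconcile function contains no timeout of its own, so the 30-second and 5-second limits live entirely in the test, where an SLA can be tightened or relaxed without touching operator code.
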
  19. Get all the logs
      - Issues can be very complex in a distributed system
      - Don’t settle for some logs
      - kind export logs
      https://github.com/kubernetes-sigs/kind
  20. “It’s just another flap”
      - Don’t get complacent
      - Most issues with flapping are actual issues with the operator
  21. Questions?
      Stay in touch
      - Twitter @muemarcel
      - GitHub MarcelMue
      - Meet me at the conference!
      Thank you!