Operator Definition
- Kubernetes’ controllers concept lets you extend the cluster’s behaviour without modifying the code of Kubernetes itself.
- Operators are clients of the Kubernetes API that act as controllers for a Custom Resource.
Source: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
Slide 4
Controller Definition
- A controller tracks at least one Kubernetes resource type.
- These objects have a spec field that represents the desired state.
- The controller(s) for that resource are responsible for making the current state come closer to that desired state.
Source: https://kubernetes.io/docs/concepts/architecture/controller/
Slide 5
- Operators act like controllers
- Operators are clients of the Kubernetes API
[Diagram: K8s-API, Operator (Watch CR), Controller manager, Kube scheduler]
Slide 6
- Desired state in Spec
- Current state as reality
- Reconcile by applying diff to current state
- Periodically get desired state through list
[Diagram: Observe (Kubernetes watch|list) -> Evaluate (against current state) -> Reconcile]
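The observe, evaluate, reconcile cycle above can be sketched in a few lines. This is a minimal illustration, assuming state is modelled as a plain dict of name to replica count; the `diff` and `reconcile` functions and the example names are hypothetical and do not use any Kubernetes client library.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, int]


def diff(desired: Dict[str, int], current: Dict[str, int]) -> List[Action]:
    """Evaluate: list the actions that move `current` toward `desired`."""
    actions: List[Action] = []
    for name, replicas in desired.items():
        if name not in current:
            actions.append(("create", name, replicas))
        elif current[name] != replicas:
            actions.append(("update", name, replicas))
    for name in current:
        if name not in desired:
            actions.append(("delete", name, 0))
    return actions


def reconcile(desired: Dict[str, int], current: Dict[str, int]) -> int:
    """Reconcile: apply the diff to the current state, return the number of changes."""
    actions = diff(desired, current)
    for verb, name, replicas in actions:
        if verb == "delete":
            del current[name]
        else:
            current[name] = replicas
    return len(actions)


# Hypothetical example data: desired state from a spec, current state as observed.
desired = {"prometheus": 2, "alertmanager": 1}
current = {"prometheus": 1, "stale": 3}
changes = reconcile(desired, current)
second_pass = reconcile(desired, current)  # nothing left to do once states match
```

A second pass that reports zero changes is a cheap idempotency check that a test suite can assert on.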
Slide 7
Operator Definition
[Diagram: Kubernetes API, Operator, Custom Resource (Definition); labels: submits, watch events, Take a decision]
Slide 8
Example: Prometheus-Operator
- Watches Prometheus CR
- Creates Prometheus pod deployments
- Continuously reconciles desired configuration with actual deployment
https://github.com/coreos/prometheus-operator
Slide 9
Example: AWS-Operator
- Watches Cluster CR
- Creates Kubernetes clusters on AWS matching CR Spec
- Continuously reconciles desired configuration with actual cluster
https://github.com/kubernetes-sigs/cluster-api
https://github.com/giantswarm/aws-operator
Slide 10
What makes operators hard to test?
Slide 11
External APIs
- Operators as a tool for infrastructure management
- Managing stateful resources outside of k8s
Slide 12
External APIs
- Same challenges as other applications
- Already has hard dependency on k8s API
Slide 13
Eventual consistency
- Reconciliation might never reach consistent state
- Multiple loops might be needed for consistency
Slide 16
Finalizers
- []string in the object metadata
- Object will only be deleted from the k8s API once the finalizers list is empty
- Deletion events will be replayed while finalizers exist
Slide 17
- deletionTimestamp indicates that the object should be deleted
- Controllers should remove themselves from the finalizers list!
apiVersion: cluster.k8s.io/v1alpha1
kind: Cluster
metadata:
  …
  deletionTimestamp: "2019-11-12T12:45:47Z"
  finalizers:
  - aws-operator-cluster-controller
  - cluster-operator-cluster-controller
  name: demo
  namespace: default
...
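The finalizer behaviour described above can be sketched with a small in-memory model. Everything here is an assumption for illustration: `FakeAPI` is a hypothetical stand-in for the k8s API, and `reconcile` is a simplified controller that cleans up and then removes only its own finalizer.

```python
from typing import Dict, List


class FakeAPI:
    """Hypothetical in-memory stand-in for the k8s API, for illustration only."""

    def __init__(self) -> None:
        self.objects: Dict[str, dict] = {}

    def delete(self, name: str) -> None:
        # A delete request only marks the object while finalizers remain.
        self.objects[name]["metadata"]["deletionTimestamp"] = "2019-11-12T12:45:47Z"
        self._collect(name)

    def remove_finalizer(self, name: str, finalizer: str) -> None:
        self.objects[name]["metadata"]["finalizers"].remove(finalizer)
        self._collect(name)

    def _collect(self, name: str) -> None:
        # The object disappears only when it is marked AND no finalizers are left.
        meta = self.objects[name]["metadata"]
        if meta.get("deletionTimestamp") and not meta["finalizers"]:
            del self.objects[name]


def reconcile(api: FakeAPI, name: str, finalizer: str, cleanup_log: List[str]) -> None:
    """Simplified controller: on deletion, clean up, then remove its own finalizer."""
    obj = api.objects.get(name)
    if obj is None:
        return
    meta = obj["metadata"]
    if meta.get("deletionTimestamp") and finalizer in meta["finalizers"]:
        cleanup_log.append(finalizer)  # external cleanup would happen here
        api.remove_finalizer(name, finalizer)


api = FakeAPI()
api.objects["demo"] = {
    "metadata": {
        "name": "demo",
        "finalizers": [
            "aws-operator-cluster-controller",
            "cluster-operator-cluster-controller",
        ],
    }
}
log: List[str] = []
api.delete("demo")
present_after_delete = "demo" in api.objects
reconcile(api, "demo", "aws-operator-cluster-controller", log)
present_after_first = "demo" in api.objects
reconcile(api, "demo", "cluster-operator-cluster-controller", log)
present_after_second = "demo" in api.objects
```

In this model the object survives until the last controller has removed its entry, which is the property a test can assert on when several controllers share one object.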
Slide 18
Finalizers
- Improve stability in case of missed events
- Absolutely necessary if multiple controllers watch the same object!
Slide 19
Resource Version
- Identifies the state of an object as a number (clients should treat the value as an opaque string)
- Changes only if the object has changed
Slide 20
- Should be stored when reading the object
- Can be applied again when updating the object
-> Ensures that it has not changed in the meantime
apiVersion: cluster.k8s.io/v1alpha1
kind: Cluster
metadata:
  …
  name: demo
  namespace: default
  resourceVersion: "22453751"
...
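The read, modify, update pattern above amounts to optimistic concurrency. The sketch below models it with a hypothetical in-memory `FakeStore`; the class, the `Conflict` exception and the `workers` field are assumptions for illustration. The real API server answers a stale update with an HTTP 409 Conflict.

```python
import copy


class Conflict(Exception):
    """Raised when an update carries a stale resourceVersion."""


class FakeStore:
    """Hypothetical single-object store that mimics resourceVersion checks."""

    def __init__(self, obj: dict) -> None:
        self._obj = copy.deepcopy(obj)
        self._version = 22453751

    def get(self) -> dict:
        obj = copy.deepcopy(self._obj)
        obj["metadata"]["resourceVersion"] = str(self._version)
        return obj

    def update(self, obj: dict) -> dict:
        # Reject the write if the object changed since the caller read it.
        if obj["metadata"].get("resourceVersion") != str(self._version):
            raise Conflict("object changed since it was read")
        self._version += 1
        self._obj = copy.deepcopy(obj)
        return self.get()


store = FakeStore(
    {"metadata": {"name": "demo", "namespace": "default"}, "spec": {"workers": 3}}
)
first = store.get()   # e.g. the controller
second = store.get()  # e.g. a test suite holding a stale copy

first["spec"]["workers"] = 5
updated = store.update(first)  # accepted, the version moves on

second["spec"]["workers"] = 1
try:
    store.update(second)
    conflict = False
except Conflict:
    conflict = True  # the stale write is rejected instead of silently winning
```

On a conflict the usual reaction is to read the object again and retry the change against the fresh copy.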
Slide 21
Resource Version
- Prevents most simple race conditions
- Prevents accidental object manipulation in test suites!
Slide 22
Status
- Status as reflection of current state
- Reflect error and failure states
Slide 23
- Status is defined by you
- Treat the CR as an API
apiVersion: v1
kind: Node
metadata:
  ...
spec:
  ...
status:
  ...
  conditions:
  - lastHeartbeatTime: "2019-11-12T13:02:14Z"
    lastTransitionTime: "2019-11-12T13:02:14Z"
    message: Calico is running on this node
    reason: CalicoIsUp
    status: "False"
    type: NetworkUnavailable
  ...
Slide 24
Status
- Good status implementation allows tests to fail fast
- Transition timestamps give performance insights
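A test helper can use status conditions both to fail fast and to measure durations. This is a sketch under assumptions: the condition types `Creating`, `Ready` and `Failed`, the reason `QuotaExceeded` and the helper names are hypothetical, and objects are plain dicts shaped like the YAML shown earlier.

```python
from datetime import datetime
from typing import Optional

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


class TerminalFailure(Exception):
    """The operator reported a failure; there is no point waiting for a timeout."""


def find_condition(obj: dict, cond_type: str) -> Optional[dict]:
    for cond in obj.get("status", {}).get("conditions", []):
        if cond["type"] == cond_type:
            return cond
    return None


def check_ready(obj: dict) -> bool:
    """True if ready, False if still in progress, raises on a reported failure."""
    failed = find_condition(obj, "Failed")
    if failed and failed["status"] == "True":
        raise TerminalFailure(f'{failed["reason"]}: {failed["message"]}')
    ready = find_condition(obj, "Ready")
    return bool(ready and ready["status"] == "True")


def seconds_between(obj: dict, start_type: str, end_type: str) -> float:
    """Duration between two condition transitions, e.g. creation time of a cluster."""
    start = datetime.strptime(find_condition(obj, start_type)["lastTransitionTime"], TIME_FORMAT)
    end = datetime.strptime(find_condition(obj, end_type)["lastTransitionTime"], TIME_FORMAT)
    return (end - start).total_seconds()


ready_cluster = {"status": {"conditions": [
    {"type": "Creating", "status": "False", "lastTransitionTime": "2019-11-12T12:45:47Z"},
    {"type": "Ready", "status": "True", "lastTransitionTime": "2019-11-12T13:02:14Z"},
]}}
failed_cluster = {"status": {"conditions": [
    {"type": "Failed", "status": "True", "reason": "QuotaExceeded",
     "message": "no capacity left", "lastTransitionTime": "2019-11-12T12:50:00Z"},
]}}

is_ready = check_ready(ready_cluster)
creation_seconds = seconds_between(ready_cluster, "Creating", "Ready")
try:
    check_ready(failed_cluster)
    failed_fast = False
except TerminalFailure:
    failed_fast = True
```

Raising on a reported failure lets a test stop immediately instead of waiting for its own timeout to expire.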
Slide 25
Experiences from testing our operators
Slide 26
Never wait during reconciliation
- Waiting in the reconciliation loops introduces more timeouts
- Tests should decide when an action is taking too long
- Enforce SLAs in tests, monitor them in production
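One way to read the advice above: the reconciliation checks once and returns, and the caller decides how long is too long. The sketch below uses a simulated clock so it runs instantly; `Result`, `FakeCloud`, `run_until_done` and the 5-second requeue interval are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    done: bool
    requeue_after: float = 0.0  # seconds until the next reconciliation attempt


class FakeCloud:
    """Hypothetical external API whose resource becomes ready after some polls."""

    def __init__(self, polls_until_ready: int) -> None:
        self.remaining = polls_until_ready

    def is_ready(self) -> bool:
        self.remaining -= 1
        return self.remaining <= 0


def reconcile(cloud: FakeCloud) -> Result:
    # Check once and return immediately; no sleeping or polling in here.
    if cloud.is_ready():
        return Result(done=True)
    return Result(done=False, requeue_after=5.0)


def run_until_done(cloud: FakeCloud, deadline_seconds: float) -> float:
    """Test-side driver with a simulated clock: the test owns the timeout."""
    elapsed = 0.0
    while True:
        result = reconcile(cloud)
        if result.done:
            return elapsed
        elapsed += result.requeue_after
        if elapsed > deadline_seconds:
            raise TimeoutError(f"not done after {elapsed} simulated seconds")


fast = run_until_done(FakeCloud(polls_until_ready=3), deadline_seconds=30.0)
try:
    run_until_done(FakeCloud(polls_until_ready=10), deadline_seconds=30.0)
    timed_out = False
except TimeoutError:
    timed_out = True
```

Keeping the deadline in the driver means the same limit can be asserted in tests and tracked as a metric in production without touching the reconciliation code.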
Slide 27
Get all the logs
- Issues can be very complex in a distributed system
- Don’t settle for some logs
- kind export logs
https://github.com/kubernetes-sigs/kind
Slide 28
“It’s just another flap”
- Don’t get complacent
- Most issues with flapping are actual issues with the operator
Slide 29
Testable Kubernetes operators?
Slide 30
Questions?
Stay in touch
- Twitter @muemarcel
- GitHub MarcelMue
- Meet me at the conference!
Thank you!