Slide 1

Slide 1 text

SRE Principle and [Kubernetes] Operator Practice
Josh Wood, Developer Advocate, Red Hat
@joshixisjosh9 – joshix.com

Slide 2

Slide 2 text

Why should you care about Operators?

Slide 3

Slide 3 text

Any application in any system must be installed, configured, managed and upgraded over time.
Patching is critical to security.

Slide 4

Slide 4 text

“Anything that isn’t automated is slowing you down”

Slide 5

Slide 5 text

$ kubectl scale deploy/staticweb --replicas=3

Slide 6

Slide 6 text

Deploying a database is easy

Slide 7

Slide 7 text

$ kubectl create deployment db --image=quay.io/my/db

Slide 8

Slide 8 text

Running a database over time is harder

Slide 9

Slide 9 text

● Resize/Upgrade
● Reconfigure
● Backup
● Healing

Slide 10

Slide 10 text

If only Kubernetes knew...

Slide 11

Slide 11 text

Extending the Kubernetes API
1. Application-specific custom controllers
2. Custom resource definitions (CRD)

Slide 12

Slide 12 text

How Does an Operator Work?

Developer / Kubernetes User submits a Custom Resource to the K8s API:

    kind: ProductionReadyDatabase
    apiVersion: database.example.com/v1alpha1
    metadata:
      name: my-important-database
    spec:
      connectionPoolSize: 300
      readReplicas: 2
      version: v4.0.1

Kubernetes Operator = Custom Kubernetes Controller (Watch Events, Reconciliation) + Custom Resource Definition

Native Kubernetes Resources: Deployments, StatefulSets, Autoscalers, Secrets, Config maps, PersistentVolume
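The watch-and-reconcile idea on this slide can be sketched in a few lines. This is a minimal in-memory illustration, not a real controller: `DesiredState`, `ObservedState`, and `reconcile` are hypothetical names, no Kubernetes client is involved, and only the `readReplicas` and `version` fields from the slide's custom resource are modeled.

```python
from dataclasses import dataclass


@dataclass
class DesiredState:
    """Spec fields copied from the custom resource (hypothetical model)."""
    read_replicas: int
    version: str


@dataclass
class ObservedState:
    """Hypothetical snapshot of what is actually running in the cluster."""
    read_replicas: int = 0
    version: str = ""


def reconcile(desired: DesiredState, observed: ObservedState) -> list:
    """Move observed state toward desired state and return the actions taken.

    A real controller would call the Kubernetes API here; this sketch only
    mutates the in-memory snapshot so the loop's shape is visible.
    """
    actions = []
    if observed.version != desired.version:
        actions.append(f"upgrade to {desired.version}")
        observed.version = desired.version
    if observed.read_replicas != desired.read_replicas:
        actions.append(
            f"scale read replicas {observed.read_replicas} -> {desired.read_replicas}"
        )
        observed.read_replicas = desired.read_replicas
    return actions


desired = DesiredState(read_replicas=2, version="v4.0.1")
observed = ObservedState()

first_pass = reconcile(desired, observed)   # converges the state
second_pass = reconcile(desired, observed)  # nothing left to do
print(first_pass)
print(second_pass)
```

The second pass returning no actions is the point: reconciliation is meant to be idempotent, so running it again on an already converged system changes nothing.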

Slide 13

Slide 13 text

Custom Resource (CR)

    kind: ProductionReadyDatabase
    apiVersion: database.example.com/v1alpha1
    metadata:
      name: my-production-ready-database
    spec:
      clusterSize: 3
      readReplicas: 2
      version: v4.0.1
    [...]

Slide 14

Slide 14 text

Operators are automated software managers (software SREs) that manage the entire lifecycle of Kubernetes applications

Slide 15

Slide 15 text

Controllers (figure source: Hausenblas, Schimanski. Programming Kubernetes. O’Reilly, 2019.)

Slide 16

Slide 16 text

Value of Operators
● Improve the “time to first value” for your customers
● Minimize software upgrade risk and associated operational costs
● Embed best practices from the experts – you – into the Operator
● Provide a cloud-like "As a Service" experience

Slide 17

Slide 17 text

OPERATOR HUB
Operator Hub allows administrators to selectively make operators available from curated sources to users in the cluster.

TYPES OF OPERATORS
● Red Hat Products
● ISV Partners
● Community

Slide 18

Slide 18 text

OPERATORS ACROSS THE INDUSTRY
...and many more

Slide 19

Slide 19 text

Operator Maturity Model
● Phase I, Basic Install: Automated application provisioning and configuration management
● Phase II, Seamless Upgrades: Patch and minor version upgrades supported
● Phase III, Full Lifecycle: App lifecycle, storage lifecycle (backup, failure recovery)
● Phase IV, Deep Insights: Metrics, alerts, log processing and workload analysis
● Phase V, Auto Pilot: Horizontal/vertical scaling, auto config tuning, abnormal detection, scheduling tuning

Slide 20

Slide 20 text

Site Reliability Engineering (SRE)
● O’Reilly “SRE Book” (Beyer et al)
● Carla Geisser (al) paraphrased: ~“Human intervention… is a bug”
● SREs write code to fix those bugs
● SREs write software to run other software
● SREs write Kubernetes Operators

Slide 21

Slide 21 text

Level 1 Installation - Deployment
● Can you set Operand configuration in the CR?
● Do CR changes cause non-disruptive updates to the Operand?
● Does CR status show what has and hasn’t been applied?
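One way to think about the last question is a status that separates applied from pending spec fields. The sketch below is an assumption-laden illustration: `build_status` and the `applied` / `pending` / `ready` keys are hypothetical, not a field layout defined by Kubernetes or the Operator Framework, and the spec values are borrowed from the custom resource shown earlier in the deck.

```python
def build_status(spec: dict, applied: dict) -> dict:
    """Compare desired spec fields with what has actually been applied.

    `spec` is the desired configuration from the CR; `applied` is a
    hypothetical record of the values the Operator has rolled out so far.
    """
    done = {key: value for key, value in spec.items() if applied.get(key) == value}
    pending = {key: value for key, value in spec.items() if key not in done}
    return {"applied": done, "pending": pending, "ready": not pending}


spec = {"clusterSize": 3, "readReplicas": 2, "version": "v4.0.1"}
applied = {"clusterSize": 3, "readReplicas": 1}  # version not rolled out yet

status = build_status(spec, applied)
print(status)
```

A user reading such a status can tell at a glance that the cluster size is in place while the replica count and version are still outstanding.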

Slide 22

Slide 22 text

Level 2 Upgrades
● Can the Operator upgrade its Operand?
● Without disruption?
● Does CR status show what has and hasn’t been upgraded?

Slide 23

Slide 23 text

Level 3 Full Lifecycle Management
● Can your Operator back up its Operand?
● Can your Operator restore from a previous Operand backup?
● Ready/Live probes? Active monitoring of basic execution state?
● CPU and other requests and limits set for Operand?

Slide 24

Slide 24 text

Level 4 Deep Insights
● Does the Operator expose metrics about its own health?
● Metrics and alerts for the Operand?
● Does CR status show what has and hasn’t been applied?

Slide 25

Slide 25 text

RED
Rate (aka Traffic) - Errors - Duration (aka Latency)

The RED Method defines three key metrics for every service:
● Rate (the number of requests per second)
● Errors (the number of those requests that are failing)
● Duration (the amount of time those requests take)
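As a rough illustration of the three RED metrics, the sketch below computes them from a handful of request records over a fixed window. The `Request` record and `red_metrics` function are hypothetical, and the mean is used for duration only to keep the example short; real dashboards usually track latency percentiles or histograms.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request (hypothetical record shape)."""
    duration_s: float
    ok: bool


def red_metrics(requests: list, window_s: float) -> dict:
    """Compute Rate, Errors and Duration for requests seen in one window."""
    total = len(requests)
    errors = sum(1 for r in requests if not r.ok)
    return {
        # Rate: requests per second over the window
        "rate_per_s": total / window_s,
        # Errors: failing requests per second, plus the failing fraction
        "errors_per_s": errors / window_s,
        "error_ratio": errors / total if total else 0.0,
        # Duration: mean time per request (simplification, see lead-in)
        "mean_duration_s": sum(r.duration_s for r in requests) / total if total else 0.0,
    }


sample = [
    Request(0.1, True),
    Request(0.3, True),
    Request(0.2, False),
    Request(0.2, True),
]
metrics = red_metrics(sample, window_s=2.0)
print(metrics)
```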

Slide 26

Slide 26 text

Level 5 Auto Pilot
● Marine autopilots are reasonable models, especially with rudder position feedback
● Auto scaling, healing, tuning
  ○ Detect condition from metrics, scale horizontally (Replicas) or vertically (Requests/Limits)
  ○ Think especially about scaling back down; resource savings
  ○ Detect deterioration in Operand(s) (based on Level 4’s metrics) and take action to redeploy or reconfigure
● CR Status, custom Events: Clear status and especially error conditions
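The horizontal-scaling bullet, including the reminder about scaling back down, might look like the sketch below. It uses a simple proportional rule with a tolerance band, similar in spirit to how the Horizontal Pod Autoscaler is documented to behave; the function name, parameter names, and default bounds are assumptions chosen for illustration.

```python
import math


def desired_replicas(
    current: int,
    load_per_replica: float,
    target_per_replica: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Pick a replica count from a metric, scaling up or down as needed.

    The ratio of observed to target load drives the decision. Inside the
    tolerance band nothing changes, which avoids flapping on small noise.
    """
    ratio = load_per_replica / target_per_replica
    if abs(ratio - 1.0) <= tolerance:
        return current
    wanted = math.ceil(current * ratio)
    # Clamp to the configured bounds (hypothetical defaults)
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(3, load_per_replica=150, target_per_replica=100))  # overload: scale up
print(desired_replicas(4, load_per_replica=25, target_per_replica=100))   # idle: scale down
print(desired_replicas(3, load_per_replica=105, target_per_replica=100))  # within tolerance
```

The same rule handles scale-down, so resources are released when load drops rather than only added when it rises.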

Slide 27

Slide 27 text

Level 5 (cont.) Auto Pilot
“Toil Not, Neither Spin” (Kubernetes Operators, Dobies & Wood)
SRE defines “toil” as:
● Automatable - your computer would enjoy it!
● Without enduring value - needs to be done but doesn’t change the system
● Grows linearly with growth of the system

Slide 28

Slide 28 text

Operator Maturity Model
● Phase I, Basic Install: Automated application provisioning and configuration management
● Phase II, Seamless Upgrades: Patch and minor version upgrades supported
● Phase III, Full Lifecycle: App lifecycle, storage lifecycle (backup, failure recovery)
● Phase IV, Deep Insights: Metrics, alerts, log processing and workload analysis
● Phase V, Auto Pilot: Horizontal/vertical scaling, auto config tuning, abnormal detection, scheduling tuning

Slide 29

Slide 29 text

Experiments/Challenges
“...left as an exercise for the reader…”
● SRE stuff: Add metrics awareness and tuning to your Operator
● Other APIs / API representations: k8fs?
● K8fs presents Kubernetes API as a synthetic file hierarchy
● % cp manifest.yaml /mnt/k8s/ns/default/deployments/
● % echo 3 >/mnt/k8s/ns/default/deployments/myapp/replicas
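To make the file-hierarchy idea concrete, the sketch below maps a path like the ones on the slide onto the API object it would address. The layout `ns/<namespace>/<resource>/<name>/<field>` and the `/mnt/k8s` mount point are inferred only from the two example commands; `parse_k8fs_path` is a hypothetical helper, not part of any k8fs implementation.

```python
from pathlib import PurePosixPath
from typing import Optional


def parse_k8fs_path(path: str, mount: str = "/mnt/k8s") -> dict:
    """Split a synthetic file path into namespace, resource, name and field.

    Raises ValueError if the path is outside the mount point or does not
    follow the assumed ns/<namespace>/<resource>/... layout.
    """
    parts = PurePosixPath(path).relative_to(mount).parts
    if len(parts) < 3 or parts[0] != "ns":
        raise ValueError(f"unsupported path layout: {path}")
    name: Optional[str] = parts[3] if len(parts) > 3 else None
    field: Optional[str] = parts[4] if len(parts) > 4 else None
    return {
        "namespace": parts[1],
        "resource": parts[2],
        "name": name,
        "field": field,
    }


# Writing "3" to this file would correspond to setting replicas on myapp
target = parse_k8fs_path("/mnt/k8s/ns/default/deployments/myapp/replicas")
# Copying a manifest into this directory would correspond to creating a deployment
collection = parse_k8fs_path("/mnt/k8s/ns/default/deployments/")
print(target)
print(collection)
```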

Slide 30

Slide 30 text

Resources
https://operatorframework.io
https://operatorhub.io
https://learn.openshift.com/operatorframework/
http://bit.ly/kubernetes-operators

Slide 31

Slide 31 text

Thank you
Josh Wood
@joshixisjosh9
joshix.com
linkedin.com/showcase/red-hat-developer
youtube - bit.ly/2YRIWTk
facebook.com/redhatdeveloperprogram
twitter.com/rhdevelopers