Slide 1

Slide 1 text

@shahiddev Shahid Iqbal | Freelance consultant | Kubernetes: Going beyond the basics

Slide 2

Slide 2 text

@shahiddev Very brief intro
• Freelance hands-on consultant working on .NET, Azure & Kubernetes
• .NET developer/architect for over a decade & Microsoft MVP
• Based in the UK and working globally
• Co-organiser of the MK.net meetup in the UK
• @shahiddev on Twitter
• https://www.linkedin.com/in/shahiddev/
• https://blog.headforcloud.com
• https://sessionize.com/shahid-iqbal

Slide 3

Slide 3 text

@shahiddev Agenda
• Cover more detailed concepts within Kubernetes
• Scheduling
• Admission controllers
• Options for extending K8s
• Scaling – Virtual node, KEDA
• Demos!

Slide 4

Slide 4 text

@shahiddev Not covering
• Fundamentals of Kubernetes
• Deep dive into creating custom controllers/operators
Disclaimer: the definition of “advanced” topics is very subjective

Slide 5

Slide 5 text

@shahiddev Audience participation

Slide 6

Slide 6 text

@shahiddev Pod scheduling

Slide 7

Slide 7 text

@shahiddev Control plane node (diagram): etcd, API server, scheduler, controller manager, cloud controller manager

Slide 8

Slide 8 text

@shahiddev Scheduling pods: create a pod → scheduler detects no node is assigned → scheduler assigns a node

Slide 9

Slide 9 text

@shahiddev Why influence pod scheduling/placement?
• Heterogeneous cluster with specialised hardware/software
• Allocate to teams/multi-tenancy
• Regulatory requirements
• Application architecture requires components to be co-located or separated

Slide 10

Slide 10 text

@shahiddev Approaches to influencing pod scheduling
• Node selector
• Node affinity/anti-affinity
• Node taints/tolerations
• Pod affinity/anti-affinity
• Custom scheduler

Slide 11

Slide 11 text

@shahiddev Node selector Add nodeSelector in our PodSpec
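A minimal sketch of what that PodSpec change might look like (the disktype=ssd label and image are hypothetical examples, not taken from the slides):

  apiVersion: v1
  kind: Pod
  metadata:
    name: web
  spec:
    containers:
    - name: web
      image: nginx:1.17
    nodeSelector:
      disktype: ssd    # only nodes carrying the label disktype=ssd are considered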

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

@shahiddev Node selector – if you need a custom node label, add a custom key-value pair label to the node
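For example (the node name and label are placeholders):

  kubectl label nodes aks-nodepool1-12345678-0 disktype=ssd
  kubectl get nodes --show-labels    # confirm the label was applied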

Slide 14

Slide 14 text

@shahiddev Node selector issues
• Basic matching on an exact key-value pair only
• Pods will stay unscheduled (Pending) if no node is found with a matching label

Slide 15

Slide 15 text

@shahiddev Node affinity/anti-affinity
• Allows pods to decide which node to use based on node labels
• Match on conditions rather than an exact key-value pair: “In”, “NotIn”, “Exists”, “DoesNotExist”, “Gt”, “Lt”
• Can be selective about how “demanding” you are

Slide 16

Slide 16 text

@shahiddev Specifying demand for a node
• requiredDuringSchedulingIgnoredDuringExecution – hard requirement → pod not scheduled if no node matches
• preferredDuringSchedulingIgnoredDuringExecution – soft requirement → pod still scheduled if no node matches
• requiredDuringSchedulingRequiredDuringExecution – not implemented yet
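Roughly what the hard and soft variants look like in a PodSpec; the label keys and values here are illustrative, not from the talk, and the operators are the ones listed on the previous slide:

  spec:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:   # hard: no matching node, no scheduling
          nodeSelectorTerms:
          - matchExpressions:
            - key: disktype
              operator: In
              values: ["ssd"]
        preferredDuringSchedulingIgnoredDuringExecution:  # soft: best effort
        - weight: 1
          preference:
            matchExpressions:
            - key: gpu-tier
              operator: Gt
              values: ["1"]
    containers:
    - name: web
      image: nginx:1.17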

Slide 17

Slide 17 text

@shahiddev Node affinity

Slide 18

Slide 18 text

@shahiddev Node affinity

Slide 19

Slide 19 text

@shahiddev Node selectors/affinity issues: if we want to prevent certain nodes from being used, we cannot do this easily – we would need to ensure EVERY deployment had a node anti-affinity.

Slide 20

Slide 20 text

@shahiddev Taints & tolerations
• Allow nodes to repel pods based on taints
• Nodes are tainted and pods tolerate taints
• Taints are made up of a key, a value and an effect: NoSchedule, PreferNoSchedule or NoExecute

Slide 21

Slide 21 text

@shahiddev Taints and tolerations
• Existing running pods can be evicted if a node is tainted (NoExecute) and the pod doesn’t tolerate it
• K8s adds taints to nodes itself in certain circumstances (e.g. node problems)

Slide 22

Slide 22 text

@shahiddev Taints & tolerations Taint node
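The taint command looks like this (node name, key and value are placeholders):

  kubectl taint nodes aks-gpupool-12345678-0 sku=gpu:NoSchedule
  # remove the taint again with a trailing dash:
  kubectl taint nodes aks-gpupool-12345678-0 sku=gpu:NoSchedule-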

Slide 23

Slide 23 text

@shahiddev Taints & tolerations Schedule without toleration

Slide 24

Slide 24 text

@shahiddev Taints & tolerations Add toleration to pod spec
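A matching toleration in the PodSpec might look like this, keyed to the hypothetical sku=gpu taint shown earlier (image name is a placeholder):

  spec:
    tolerations:
    - key: "sku"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
    containers:
    - name: trainer
      image: my-registry/gpu-trainer:1.0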

Slide 25

Slide 25 text

@shahiddev Taints & tolerations Schedule with toleration

Slide 26

Slide 26 text

@shahiddev Taints vs node affinity
• With node affinity any pod could be scheduled unless the pod explicitly sets node anti-affinity – the node has no say in the matter
• Taints allow nodes to repel all pods unless they tolerate the taint – including currently running pods!
• Use taints to prevent “casual scheduling” on certain nodes, e.g. where you have a limited/expensive resource

Slide 27

Slide 27 text

@shahiddev Inter-pod affinity/anti-affinity
• Select the node for a pod based on what other pods are already running on it
• Ensure certain components run on the same node, e.g. cache alongside app
• Ensure certain components don’t run in the same zone, e.g. ensure app components can tolerate node loss

Slide 28

Slide 28 text

@shahiddev Inter-pod affinity/anti-affinity
• Same constructs for indicating strictness: requiredDuringSchedulingIgnoredDuringExecution, preferredDuringSchedulingIgnoredDuringExecution
• topologyKey – references a node label; the “level” of infrastructure used to apply the rules, e.g. hostname, failure domain or availability zone (sketch below)
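A sketch of what the web deployment in the following diagrams might declare; the app labels are assumptions, and whether the anti-affinity is hard or soft is a design choice:

  spec:
    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:   # prefer nodes already running a cache pod
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: cache
            topologyKey: kubernetes.io/hostname
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:    # never co-locate two web pods on one host
        - labelSelector:
            matchLabels:
              app: web
          topologyKey: kubernetes.io/hostname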

Slide 29

Slide 29 text

@shahiddev topologyKey – example node labels (label key → label value per node):
  Label key                                Node-1   Node-2   Node-3   Node-4
  kubernetes.io/hostname                   Node-1   Node-2   Node-3   Node-4
  failure-domain.beta.kubernetes.io/zone   1        1        2        2

Slide 30

Slide 30 text

@shahiddev Inter-pod affinity/anti-affinity

Slide 31

Slide 31 text

@shahiddev Pod affinity (diagram): Nodes 1–4; Node 1 runs a web pod and a cache pod. web deployment: replicas: 4, podAffinity: cache (preferred), podAntiAffinity: web, topologyKey: kubernetes.io/hostname

Slide 32

Slide 32 text

@shahiddev Pod affinity (diagram): the four web replicas spread one per node, cache still on Node 1. web: podAffinity: cache (preferred), podAntiAffinity: web, topologyKey: kubernetes.io/hostname. cache: podAffinity: web (preferred), podAntiAffinity: cache, topologyKey: kubernetes.io/hostname

Slide 33

Slide 33 text

@shahiddev Pod affinity (diagram): cache replicas follow, one per node alongside each web pod, using the same affinity rules

Slide 34

Slide 34 text

@shahiddev Pod affinity – zone topology (diagram): Zone 1 = Nodes 1–2, Zone 2 = Nodes 3–4; Node 1 runs a web pod and a cache pod. web deployment: replicas: 2, podAffinity: cache (preferred), podAntiAffinity: web, topologyKey: zone

Slide 35

Slide 35 text

@shahiddev Pod affinity – zone topology (diagram): the two web replicas land one per zone, cache still on Node 1. web: podAffinity: cache (preferred), podAntiAffinity: web, topologyKey: zone. cache: podAffinity: web (preferred), podAntiAffinity: cache, topologyKey: zone

Slide 36

Slide 36 text

@shahiddev Pod affinity – zone topology (diagram): cache replicas follow, one per zone, each landing in the same zone as a web pod

Slide 37

Slide 37 text

@shahiddev Web front end

Slide 38

Slide 38 text

@shahiddev Cache

Slide 39

Slide 39 text

@shahiddev Pod distribution

Slide 40

Slide 40 text

@shahiddev Custom scheduler
• Can be written in any language
• Needs access to the API server
• To use it, set the scheduler name in the pod spec (sketch below)
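Opting a pod into a custom scheduler is a single field in the PodSpec; the scheduler name below is a placeholder:

  spec:
    schedulerName: my-custom-scheduler   # defaults to "default-scheduler" when omitted
    containers:
    - name: web
      image: nginx:1.17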

Slide 41

Slide 41 text

@shahiddev Controlling/Extending K8s

Slide 42

Slide 42 text

@shahiddev Taking more control…
• Want to have more control over the resources that are created
• Apply custom policies to resources (e.g. must have certain labels)
• Prevent certain resources being created
• Inject additional logic transparently into resources

Slide 43

Slide 43 text

@shahiddev Admission controllers
• Code that intercepts API server requests before the objects are persisted
• Controllers can be: validating – can inspect the objects but not modify them; mutating – can modify the objects; or both
• Enabled/disabled using kube-apiserver flags – limited options on managed K8s providers
• Are compiled into the API server binary

Slide 44

Slide 44 text

@shahiddev Admission controllers
• DefaultTolerationSeconds
• MutatingAdmissionWebhook
• ValidatingAdmissionWebhook
• ResourceQuota
• Priority
• NamespaceLifecycle
• LimitRanger
• ServiceAccount
• PersistentVolumeClaimResize
• DefaultStorageClass

Slide 45

Slide 45 text

@shahiddev API request lifecycle (diagram): HTTP handler → AuthN/AuthZ → mutating admission controllers (incl. mutating admission webhooks) → object schema validation → validating admission controllers (incl. validating admission webhooks) → persistence (etcd). Adapted from: https://banzaicloud.com/blog/k8s-admission-webhooks/

Slide 46

Slide 46 text

@shahiddev Admission webhooks
• Implemented by two “special” admission controllers: MutatingAdmissionWebhook – modifies resources/creates new resources; ValidatingAdmissionWebhook – used to block resource creation
• The controllers invoke an HTTP callback
• The webhook logic doesn’t need to be compiled into the API server
• The logic can be hosted inside or outside the cluster
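A rough sketch of registering a validating webhook; the names, namespace and path are hypothetical, and admissionregistration.k8s.io/v1 replaces v1beta1 from Kubernetes 1.16:

  apiVersion: admissionregistration.k8s.io/v1beta1
  kind: ValidatingWebhookConfiguration
  metadata:
    name: pod-policy.example.com
  webhooks:
  - name: pod-policy.example.com
    rules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE"]
      resources: ["pods"]
    clientConfig:
      service:
        namespace: webhooks            # in-cluster Service hosting the HTTP callback
        name: pod-policy-webhook
        path: /validate
      caBundle: <base64 CA bundle>     # cert the API server uses to trust the webhook
    failurePolicy: Fail                # reject requests if the webhook is unreachable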

Slide 47

Slide 47 text

@shahiddev QUICK DEMO ADMISSION WEBHOOKS

Slide 48

Slide 48 text

@shahiddev Open Policy Agent (OPA)
• Admission controllers let you tightly control what can run in your cluster
• The OPA framework uses admission control but abstracts away the lower-level details
• https://www.openpolicyagent.org

Slide 49

Slide 49 text

@shahiddev Extending the Kubernetes API
• Build abstractions on top of K8s resources
• Create entirely new resources within K8s
• Use kubectl to manage custom resources

Slide 50

Slide 50 text

@shahiddev Extending the Kubernetes API – options
• Extension API servers
• Custom resource definitions
• Custom controllers

Slide 51

Slide 51 text

@shahiddev Custom Resource Definitions (CRDs)
• A new resource type alongside the built-in types
• Can use kubectl to create and delete them
• Stored in etcd
• Useless without a controller to act on the resource

Slide 52

Slide 52 text

@shahiddev Custom Resource Definition
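The Foo CRD shown here is presumably similar to the Kubernetes sample-controller one; a sketch under that assumption (from Kubernetes 1.16 the apiextensions.k8s.io/v1 form with an OpenAPI schema is required):

  apiVersion: apiextensions.k8s.io/v1beta1
  kind: CustomResourceDefinition
  metadata:
    name: foos.samplecontroller.k8s.io   # must be <plural>.<group>
  spec:
    group: samplecontroller.k8s.io
    version: v1alpha1
    names:
      kind: Foo
      plural: foos
      singular: foo
    scope: Namespaced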

Slide 53

Slide 53 text

@shahiddev Creating a Foo resource
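Creating an instance then looks like any other resource; the field names follow the sample-controller’s Foo type and may differ from the demo:

  apiVersion: samplecontroller.k8s.io/v1alpha1
  kind: Foo
  metadata:
    name: example-foo
  spec:
    deploymentName: example-foo   # the sample controller creates a Deployment with this name
    replicas: 1

  # then: kubectl apply -f foo.yaml && kubectl get foos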

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

@shahiddev Custom controllers
• Can be used to customise the behaviour of existing resources
• Often paired with CRDs to add behaviour to custom resources
• Often implemented in Go
• Operator ≈ CRDs + custom controllers

Slide 56

Slide 56 text

@shahiddev Well known operators https://github.com/operator-framework/awesome-operators

Slide 57

Slide 57 text

@shahiddev Writing your own operator? https://github.com/operator-framework

Slide 58

Slide 58 text

@shahiddev Scaling application & clusters

Slide 59

Slide 59 text

@shahiddev Autoscaling
• Horizontal Pod Autoscaler (HPA) – scale the number of pods based on metrics; v2 HPA can use external metrics
• Vertical Pod Autoscaler (VPA) – increase the resources for a given pod based on metrics (scale up)
• Cluster Autoscaler (CA) – scale the cluster if pods are waiting to be scheduled; relies on the cloud provider to increase the node count
• Virtual kubelet/node – OSS project to connect external compute resources to a K8s cluster; interact with the resource via the familiar K8s API

Slide 60

Slide 60 text

@shahiddev Autoscaling triggers
• Horizontal scaling can be based on metrics from the pod: v1 HPA uses CPU/memory; v2 HPA (beta) can scale on almost any metric, including external metrics (e.g. queue depth) – see the sketch below
• VPA: CPU/memory usage of the pod
• Cluster autoscaler: pods waiting to be scheduled due to insufficient cluster resources
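A CPU-based HPA sketch using the autoscaling/v2beta2 API; the deployment name and thresholds are placeholders:

  apiVersion: autoscaling/v2beta2
  kind: HorizontalPodAutoscaler
  metadata:
    name: web
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: web
    minReplicas: 2
    maxReplicas: 10
    metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU across pods exceeds 70%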

Slide 61

Slide 61 text

@shahiddev Scale to zero
• Out of the box, Kubernetes is unable to auto-scale pods to zero instances*
• Desirable to scale certain microservices to zero instances: message handlers, “functions”-style applications
* K8s 1.15 adds support for this via a feature gate

Slide 62

Slide 62 text

@shahiddev KEDA – Kubernetes Event Driven Autoscaler
• Open source project led by Microsoft and Red Hat
• Allows Kubernetes deployments to be auto-scaled based on events
• Scale up from zero → n instances
• Scale down from n → zero instances
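A ScaledObject sketch for an Azure Storage Queue trigger; field names follow the KEDA 1.x docs of the time (KEDA 2.x moves to apiVersion keda.sh/v1alpha1 and renames deploymentName to name), so treat the exact shape as an assumption:

  apiVersion: keda.k8s.io/v1alpha1
  kind: ScaledObject
  metadata:
    name: order-processor-scaler
  spec:
    scaleTargetRef:
      deploymentName: order-processor        # deployment KEDA scales between 0 and n
    minReplicaCount: 0
    maxReplicaCount: 30
    triggers:
    - type: azure-queue
      metadata:
        queueName: orders
        queueLength: "5"                     # target messages per replica
        connection: STORAGE_CONNECTIONSTRING # env var on the container holding the connection string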

Slide 63

Slide 63 text

@shahiddev How KEDA works

Slide 64

Slide 64 text

@shahiddev KEDA scalers/event sources
• AWS CloudWatch
• AWS Simple Queue Service
• Azure Event Hub†
• Azure Service Bus Queues and Topics
• Azure Storage Queues
• GCP PubSub
• Kafka
• Liiklus
• Prometheus
• RabbitMQ
• Redis Lists
Others in development

Slide 65

Slide 65 text

@shahiddev Virtual Kubelet/Node

Slide 66

Slide 66 text

@shahiddev Virtual Kubelet implementations
• Azure Container Instances
• AWS Fargate
• HashiCorp Nomad
• Service Fabric Mesh
• Azure IoT Edge
• …others

Slide 67

Slide 67 text

@shahiddev Azure Container Instances
• “Serverless” containers – no infrastructure required
• Per-second billing for running containers
• Good for: testing images, short-lived containers, bursting for sudden spikes

Slide 68

Slide 68 text

@shahiddev Bursting load using the virtual node (diagram): bursting to ACI to continue scaling beyond cluster capacity

Slide 69

Slide 69 text

@shahiddev Virtual nodes option in AKS

Slide 70

Slide 70 text

@shahiddev DEMO KEDA VIRTUAL NODE SCALING

Slide 71

Slide 71 text

@shahiddev Virtual node recap
• The virtual node was tainted to prevent pods being scheduled on it “accidentally”
• The e-commerce shop deployment to burst was configured with (sketch below):
  • A toleration for the virtual node taint – now allows pods to be scheduled on the virtual node
  • A node anti-affinity to the virtual node (soft) – prevents usage of the virtual node unless there is no other choice
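Roughly what that combination looks like in the pod spec; the taint key and the type=virtual-kubelet node label shown here are the ones used by AKS virtual nodes in Azure’s samples, so treat them as assumptions about this particular demo:

  spec:
    tolerations:
    - key: virtual-kubelet.io/provider     # tolerate the virtual node's taint
      operator: Exists
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
            - key: type                    # virtual node is labelled type=virtual-kubelet
              operator: NotIn
              values: ["virtual-kubelet"]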

Slide 72

Slide 72 text

@shahiddev Wrapping it up
• Many powerful constructs available in Kubernetes to control pod scheduling
• Admission webhooks allow customisation of resources with minimal code
• Custom resources with controllers give you ultimate extensibility
• Virtual node may allow for “serverless” K8s clusters in the future

Slide 73

Slide 73 text

@shahiddev Where can I go to learn more?
• http://www.katacoda.com
• https://www.katacoda.com/openshift/courses/operatorframework
• https://github.com/Azure-Samples/virtual-node-autoscale
• http://bit.ly/k8s-microservices-video

Slide 74

Slide 74 text

@shahiddev Shahid Iqbal | Freelance consultant
Thank you! Questions?
@shahiddev on Twitter
https://www.linkedin.com/in/shahiddev/
https://blog.headforcloud.com