Kubernetes - Beyond the basics (vol 1.)

Kubernetes: Going beyond the basics
Shahid Iqbal | Freelance consultant | @shahiddev

Very brief intro
• Freelance hands-on consultant working on .NET, Azure & Kubernetes
• .NET developer/architect for over a decade & Microsoft MVP
• Based in the UK and working globally
• Co-organiser of the MK.net meetup in the UK
• @shahiddev on Twitter
• https://www.linkedin.com/in/shahiddev/
• https://blog.headforcloud.com
• https://sessionize.com/shahid-iqbal

Agenda
• Cover more detailed concepts within Kubernetes
• Scheduling
• Admission controllers
• Options for extending K8s
• Scaling: virtual node, KEDA
• Demos!

Not covering
• Fundamentals of Kubernetes
• Deep dive into creating custom controllers/operators
Disclaimer: the definition of “advanced” topics is very subjective.

Audience participation

Pod scheduling

Control plane node
• etcd
• API server
• Scheduler
• Controller manager
• Cloud controller manager

Scheduling pods
• A pod is created
• The scheduler detects that no node is assigned to it
• The scheduler assigns a node

Why influence pod scheduling/placement?
• Heterogeneous cluster with specialised hardware/software
• Allocation to teams/multi-tenancy
• Regulatory requirements
• Application architecture requires components to be co-located or separated

Approaches to influencing pod scheduling
• Node selector
• Node affinity/anti-affinity
• Node taints/tolerations
• Pod affinity/anti-affinity
• Custom scheduler

Node selector
• Add a nodeSelector to the PodSpec

Node selector
• If you need a custom node label, add a custom key-value pair label to the node

Node selector issues
• Basic matching on an exact key-value pair
• Pods will stay Pending (unschedulable) if no node has a matching label
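To make the exact-match behaviour concrete, here is a minimal Python sketch (not the scheduler's real code) of how a nodeSelector filters nodes. The node names and the `disktype` label are hypothetical examples.

```python
# Minimal model of nodeSelector filtering: every key/value pair must match exactly.
# Node names and the "disktype" label are hypothetical examples.

def matches_node_selector(node_labels: dict, node_selector: dict) -> bool:
    """True only if every selector pair appears verbatim in the node's labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def eligible_nodes(nodes: dict, node_selector: dict) -> list:
    """Names of the nodes whose labels satisfy the selector."""
    return [name for name, labels in nodes.items()
            if matches_node_selector(labels, node_selector)]


nodes = {
    "node-1": {"kubernetes.io/hostname": "node-1", "disktype": "ssd"},
    "node-2": {"kubernetes.io/hostname": "node-2", "disktype": "hdd"},
}

print(eligible_nodes(nodes, {"disktype": "ssd"}))   # ['node-1']
print(eligible_nodes(nodes, {"disktype": "nvme"}))  # [] -> the pod would stay Pending
```

There is no notion of "close enough" here: a selector value that no node carries simply leaves the pod with nowhere to go.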

Node affinity/anti-affinity
• Allows pods to decide which node to use based on node labels
• Matches on conditions rather than exactly on a key-value pair: “In”, “NotIn”, “Exists”, “DoesNotExist”, “Gt”, “Lt”
• Can be selective about how “demanding” you are
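A rough Python sketch of how each matchExpressions operator behaves against a node's labels, following the semantics described in the Kubernetes docs (for example, NotIn also matches nodes that lack the key, and Gt/Lt compare the label value as an integer). The function names and example labels are hypothetical; this is an illustration, not the scheduler's implementation.

```python
# Sketch of node affinity matchExpressions evaluation (hypothetical helper names).

def match_expression(labels: dict, key: str, operator: str, values=()) -> bool:
    present = key in labels
    if operator == "In":
        return present and labels[key] in values
    if operator == "NotIn":
        # A node without the key also satisfies NotIn.
        return not present or labels[key] not in values
    if operator == "Exists":
        return present
    if operator == "DoesNotExist":
        return not present
    if operator in ("Gt", "Lt"):
        # Gt/Lt take a single value and compare it with the label as integers.
        if not present:
            return False
        try:
            node_value, target = int(labels[key]), int(values[0])
        except (ValueError, IndexError):
            return False
        return node_value > target if operator == "Gt" else node_value < target
    raise ValueError(f"unknown operator: {operator}")


def match_term(labels: dict, expressions: list) -> bool:
    """All expressions within one term must match (logical AND)."""
    return all(match_expression(labels, **expr) for expr in expressions)


labels = {"disktype": "ssd", "gpu-count": "2"}
print(match_term(labels, [{"key": "gpu-count", "operator": "Gt", "values": ["1"]}]))  # True
print(match_term(labels, [{"key": "disktype", "operator": "NotIn", "values": ["ssd"]}]))  # False
print(match_term(labels, [{"key": "spot", "operator": "DoesNotExist"}]))  # True
```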

Specifying demand for a node
• requiredDuringSchedulingIgnoredDuringExecution: hard requirement → the pod is not scheduled if no node matches
• preferredDuringSchedulingIgnoredDuringExecution: soft requirement → the pod is still scheduled
• requiredDuringSchedulingRequiredDuringExecution: not implemented yet
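The difference between the two can be pictured as filter-then-score: required rules remove nodes, preferred rules only add weight. This is a simplified Python model under our own assumptions (rules as plain predicates over node labels, weights simply summed, first node wins ties); the real scheduler combines many more scoring plugins.

```python
# Simplified model: required rules filter nodes, preferred rules only score them.

def pick_node(nodes, required, preferred):
    """
    nodes:     {node name: labels}
    required:  predicates that must all hold (hard filter)
    preferred: (weight, predicate) pairs (soft preference, weight 1-100)
    Returns a node name, or None if the pod would stay Pending.
    """
    feasible = [name for name, labels in nodes.items()
                if all(rule(labels) for rule in required)]
    if not feasible:
        return None

    def score(name):
        return sum(weight for weight, rule in preferred if rule(nodes[name]))

    # max() returns the first node among equal scores.
    return max(feasible, key=score)


def wants(disktype):
    """Hypothetical rule: node has the given disktype label."""
    return lambda labels: labels.get("disktype") == disktype


nodes = {"node-1": {"disktype": "hdd"}, "node-2": {"disktype": "ssd"}}
print(pick_node(nodes, required=[wants("nvme")], preferred=[]))        # None -> Pending
print(pick_node(nodes, required=[], preferred=[(50, wants("nvme"))]))  # node-1: still scheduled
print(pick_node(nodes, required=[], preferred=[(50, wants("ssd"))]))   # node-2
```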

Node affinity

Node selectors/affinity issues
• If we want to prevent certain nodes from being used, we cannot do this easily: we would need to ensure EVERY deployment had a node anti-affinity rule.

Taints & tolerations
• Allow nodes to repel pods based on taints
• Nodes are tainted and pods tolerate taints
• A taint comprises a key, a value and an effect: NoSchedule, PreferNoSchedule or NoExecute

Taints & tolerations
• Existing running pods can be evicted if a node is tainted with the NoExecute effect and a pod doesn’t tolerate the taint
• K8s adds taints to nodes in certain circumstances (e.g. node problems such as a node becoming not ready or unreachable)
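A minimal sketch of the toleration-matching rules in Python. It assumes the documented semantics (an empty toleration effect matches all effects, `Exists` ignores the value, an empty key with `Exists` tolerates every taint) and deliberately ignores `tolerationSeconds`; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass
class Toleration:
    key: str = ""            # empty key + operator "Exists" tolerates every taint
    operator: str = "Equal"  # "Equal" (default) or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True  # value is ignored
    return tol.value == taint.value


def untolerated(taints, tolerations, effects):
    """Taints with one of the given effects that no toleration matches."""
    return [t for t in taints
            if t.effect in effects and not any(tolerates(tol, t) for tol in tolerations)]


def can_schedule(taints, tolerations) -> bool:
    # PreferNoSchedule is only a soft penalty, so it never blocks scheduling here.
    return not untolerated(taints, tolerations, {"NoSchedule", "NoExecute"})


def should_evict(taints, tolerations) -> bool:
    # Only NoExecute affects pods already running (tolerationSeconds is ignored).
    return bool(untolerated(taints, tolerations, {"NoExecute"}))


gpu = Taint("sku", "gpu", "NoSchedule")
print(can_schedule([gpu], []))  # False
print(can_schedule([gpu], [Toleration(key="sku", value="gpu", effect="NoSchedule")]))  # True
print(should_evict([gpu], []))  # False: NoSchedule never evicts running pods
```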

Taints & tolerations: taint a node

Taints & tolerations: schedule without a toleration

Taints & tolerations: add a toleration to the pod spec

Taints & tolerations: schedule with a toleration

Taints vs node affinity
• With node affinity, any pod can be scheduled on a node unless the pod explicitly sets node anti-affinity; the node has no say in the matter
• Taints allow nodes to repel all pods unless they tolerate the taint, including currently running pods (with the NoExecute effect)!
• Use taints to prevent “casual scheduling” on certain nodes, e.g. where you have a limited/expensive resource

Inter-pod affinity/anti-affinity
• Select the nodes used for a pod based on what other pods are already running on them
• Ensure certain components run on the same node, e.g. a cache alongside the app
• Ensure certain components don’t run on the same node or in the same zone, e.g. so the app can tolerate the loss of a node

Inter-pod affinity/anti-affinity
• Same constructs for indicating strictness: requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution
• topologyKey: references a node label that defines the “level” of infrastructure at which the rules apply, e.g. hostname, failure domain or availability zone

TopologyKey
Label key                                 Label value (Node 1 / Node 2 / Node 3 / Node 4)
kubernetes.io/hostname                    Node-1 / Node-2 / Node-3 / Node-4
failure-domain.beta.kubernetes.io/zone    1 / 1 / 2 / 2
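The sketch below models how topologyKey turns node labels into topology domains for a required pod anti-affinity check. The node layout follows the TopologyKey slide (two nodes per zone); the lowercase node names, the pod placements and the helper names are hypothetical, and it assumes every node carries the topology label.

```python
# Sketch: topologyKey groups nodes into domains; anti-affinity is checked per domain.
HOSTNAME = "kubernetes.io/hostname"
ZONE = "failure-domain.beta.kubernetes.io/zone"

NODES = {
    "node-1": {HOSTNAME: "node-1", ZONE: "1"},
    "node-2": {HOSTNAME: "node-2", ZONE: "1"},
    "node-3": {HOSTNAME: "node-3", ZONE: "2"},
    "node-4": {HOSTNAME: "node-4", ZONE: "2"},
}


def same_domain(node_a, node_b, topology_key):
    """Nodes share a domain when they have the same value for the topology label."""
    return NODES[node_a][topology_key] == NODES[node_b][topology_key]


def violates_anti_affinity(candidate, running_pods, app, topology_key):
    """running_pods: list of (node, app label). True if a pod with the given app
    label already runs anywhere in the candidate node's topology domain."""
    return any(pod_app == app and same_domain(candidate, node, topology_key)
               for node, pod_app in running_pods)


running = [("node-1", "web")]
print(violates_anti_affinity("node-2", running, "web", HOSTNAME))  # False: different host
print(violates_anti_affinity("node-2", running, "web", ZONE))      # True: node-2 is also in zone 1
print(violates_anti_affinity("node-3", running, "web", ZONE))      # False: zone 2 has no web pod
```

Changing only the topologyKey changes how coarse the "don't co-locate" rule is, which is exactly the hostname vs zone distinction in the examples that follow.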

Inter-pod affinity/anti-affinity

Pod affinity: hostname topology (diagram, Nodes 1 to 4)
• web: Replicas: 4, PodAffinity: cache (preferred), PodAntiAffinity: web, topologyKey: kubernetes.io/hostname
• cache: PodAffinity: web (preferred), PodAntiAffinity: cache, topologyKey: kubernetes.io/hostname
• Result: each of the four nodes runs one web pod and one cache pod

Pod affinity: zone topology (diagram, Nodes 1 and 2 in zone 1, Nodes 3 and 4 in zone 2)
• web: Replicas: 2, PodAffinity: cache (preferred), PodAntiAffinity: web, topologyKey: zone
• cache: PodAffinity: web (preferred), PodAntiAffinity: cache, topologyKey: zone
• Result: each zone runs one web pod and one cache pod

Web front end

Cache

Pod distribution

Custom scheduler
• A scheduler can be written in any language
• It needs access to the API server
• To use it, set the scheduler name (schedulerName) in the pod spec
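To illustrate the "set the scheduler name in the pod spec" part, here is a sketch of a custom scheduler's core loop with the API server replaced by in-memory dicts. The scheduler name `my-scheduler` and the random node choice are assumptions; a real scheduler would watch pods through the API and create a Binding rather than setting `nodeName` on a local dict.

```python
import random

SCHEDULER_NAME = "my-scheduler"  # hypothetical; pods opt in via spec.schedulerName


def pods_to_schedule(pods):
    """Pods that name this scheduler and have not been assigned a node yet."""
    return [p for p in pods
            if p["spec"].get("schedulerName", "default-scheduler") == SCHEDULER_NAME
            and not p["spec"].get("nodeName")]


def schedule_once(pods, nodes, choose=random.choice):
    """One pass of the scheduling loop; returns {pod name: chosen node}."""
    bindings = {}
    for pod in pods_to_schedule(pods):
        node = choose(nodes)
        # Stand-in for creating a Binding object via the API server.
        pod["spec"]["nodeName"] = node
        bindings[pod["metadata"]["name"]] = node
    return bindings


pods = [
    {"metadata": {"name": "a"}, "spec": {"schedulerName": "my-scheduler"}},
    {"metadata": {"name": "b"}, "spec": {}},  # left for the default scheduler
    {"metadata": {"name": "c"}, "spec": {"schedulerName": "my-scheduler", "nodeName": "node-2"}},
]
print(schedule_once(pods, ["node-1", "node-2"], choose=lambda ns: ns[0]))  # {'a': 'node-1'}
```

Pods without a `schedulerName` fall to `default-scheduler`, so the two schedulers never compete for the same pod.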

Controlling/Extending K8s

Taking more control…
• Want more control over the resources that are created
• Apply custom policies to resources (e.g. must have certain labels)
• Prevent certain resources from being created
• Inject additional logic transparently into resources

Admission controllers
• Code that intercepts API server requests before the objects are persisted
• Controllers can be validating (can inspect objects but not modify them), mutating (can modify objects) or both
• Enabled/disabled via kube-apiserver flags (--enable-admission-plugins, --disable-admission-plugins)
• Limited options on managed K8s providers
• Compiled into the API server binary

Admission controllers
DefaultTolerationSeconds, MutatingAdmissionWebhook, ValidatingAdmissionWebhook, ResourceQuota, Priority, NamespaceLifecycle, LimitRanger, ServiceAccount, PersistentVolumeClaimResize, DefaultStorageClass

API request lifecycle
HTTP handler → AuthN/AuthZ → mutating admission controllers (including the mutating admission webhooks) → object schema validation → validating admission controllers (including the validating admission webhooks) → persistence (etcd)
Adapted from: https://banzaicloud.com/blog/k8s-admission-webhooks/

Admission webhooks
• Implemented by two “special” admission controllers:
  • MutatingAdmissionWebhook: modifies resources/creates new resources
  • ValidatingAdmissionWebhook: used to block resource creation
• The controllers invoke an HTTP callback (the webhook)
• The logic doesn’t need to be compiled into the API server
• The logic can be hosted inside or outside the cluster
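The core of a webhook is a function from an AdmissionReview request to an AdmissionReview response. The Python sketch below shows one validating and one mutating handler using the `admission.k8s.io/v1` response shape (`uid`, `allowed`, `status`, and a base64-encoded JSON Patch). The HTTPS server, TLS setup and webhook registration are omitted, and the "must have a `team` label" policy is a hypothetical example.

```python
import base64
import json

REQUIRED_LABEL = "team"  # hypothetical policy: objects must carry a "team" label


def review_response(uid, allowed, message=None, patch=None):
    """Build an admission.k8s.io/v1 AdmissionReview response."""
    response = {"uid": uid, "allowed": allowed}
    if message:
        response["status"] = {"message": message}
    if patch is not None:
        # Mutations are returned as a base64-encoded JSON Patch document.
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview",
            "response": response}


def validate(review):
    """Validating webhook: reject objects that lack the required label."""
    request = review["request"]
    labels = request["object"].get("metadata", {}).get("labels") or {}
    if REQUIRED_LABEL in labels:
        return review_response(request["uid"], True)
    return review_response(request["uid"], False,
                           message=f"missing required label '{REQUIRED_LABEL}'")


def mutate(review):
    """Mutating webhook: add a default label value instead of rejecting."""
    request = review["request"]
    labels = request["object"].get("metadata", {}).get("labels")
    if labels and REQUIRED_LABEL in labels:
        return review_response(request["uid"], True)  # nothing to change
    if labels is None:
        # No labels map yet: add the whole map in one operation.
        patch = [{"op": "add", "path": "/metadata/labels",
                  "value": {REQUIRED_LABEL: "unassigned"}}]
    else:
        # Label keys containing "/" would need escaping as "~1" in the path.
        patch = [{"op": "add", "path": f"/metadata/labels/{REQUIRED_LABEL}",
                  "value": "unassigned"}]
    return review_response(request["uid"], True, patch=patch)


review = {"request": {"uid": "abc", "object": {"metadata": {"name": "web"}}}}
print(validate(review)["response"]["allowed"])   # False
print(mutate(review)["response"]["patchType"])   # JSONPatch
```

Because mutating webhooks run before validating ones in the request lifecycle, pairing the two handlers this way would mean the validating check only ever rejects objects that the mutating webhook was not configured to see.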

QUICK DEMO: ADMISSION WEBHOOKS

Open Policy Agent (OPA)
• Admission controllers let you tightly control what can run in your cluster
• The OPA framework uses admission control but abstracts away the lower-level details
• https://www.openpolicyagent.org

Extending the Kubernetes API
• Build abstractions on top of K8s resources
• Create entirely new resources within K8s
• Use kubectl to manage custom resources

Options for extending the Kubernetes API
• Extension API servers
• Custom resource definitions
• Custom controllers

Custom Resource Definitions (CRDs)
• A new resource type alongside the built-in types
• Can use kubectl to create and delete them
• Stored in etcd
• Useless without a controller to act on the resource

Custom Resource Definition

Creating a Foo resource

Custom controllers
• Can be used to customise the behaviour of existing resources
• Often paired with CRDs to add behaviour to custom resources
• Often implemented in Go
• Operator ~= CRDs + custom controllers
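The pattern behind a custom controller is a reconcile function that compares desired state with actual state and acts on the difference. This Python sketch is loosely modelled on the `Foo` type from the Kubernetes sample-controller project (`spec.deploymentName`, `spec.replicas`); those field names and the in-memory "cluster" dict are assumptions for illustration, not the demo's actual code.

```python
# Sketch of a level-based reconcile loop for a hypothetical Foo custom resource.
# Assumed Foo shape: {"metadata": {"name"}, "spec": {"deploymentName", "replicas"}}.

def reconcile(foo, deployments):
    """Drive actual state (deployments dict) towards the state declared in foo.
    Returns a short description of the action taken."""
    name = foo["spec"]["deploymentName"]
    desired = foo["spec"]["replicas"]
    current = deployments.get(name)
    if current is None:
        # Owned object is missing: create it, recording which Foo owns it.
        deployments[name] = {"replicas": desired, "owner": foo["metadata"]["name"]}
        return f"created {name}"
    if current["replicas"] != desired:
        current["replicas"] = desired
        return f"scaled {name} to {desired}"
    return "no change"


deployments = {}
foo = {"metadata": {"name": "example-foo"},
       "spec": {"deploymentName": "example-foo", "replicas": 1}}
print(reconcile(foo, deployments))  # created example-foo
print(reconcile(foo, deployments))  # no change
foo["spec"]["replicas"] = 3
print(reconcile(foo, deployments))  # scaled example-foo to 3
```

Making the function idempotent is the key design choice: a real controller may call it many times for the same event (or after a restart), and repeated calls must converge rather than duplicate work.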

Well-known operators: https://github.com/operator-framework/awesome-operators

Writing your own operator? https://github.com/operator-framework

Scaling applications & clusters

Autoscaling
• Horizontal Pod Autoscaler (HPA): scales the number of pods based on metrics; the v2 HPA can use external metrics
• Vertical Pod Autoscaler (VPA): increases the resources for a given pod based on metrics (scale up)
• Cluster Autoscaler (CA): scales the cluster if pods are waiting to be scheduled; relies on the cloud provider to increase the node count
• Virtual kubelet/node: OSS project to connect external compute resources to a K8s cluster; you interact with them via the familiar K8s API

Autoscaling triggers
• Horizontal scaling can be based on metrics from pods: the v1 HPA uses CPU only; the v2 HPA (beta) can scale on almost any metric, including memory and external metrics (e.g. queue depth)
• VPA: CPU/memory usage of the pod
• Cluster autoscaler: pods waiting to be scheduled due to insufficient cluster resources
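The HPA's core calculation is documented as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default). A small Python sketch of that rule; the `min_replicas`/`max_replicas` defaults are hypothetical, and real HPAs add stabilisation windows and readiness handling on top.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """HPA rule: ceil(current_replicas * current_value / target_value), clamped.

    min_replicas/max_replicas defaults are hypothetical; 0.1 is the default tolerance.
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: don't scale
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 2 pods averaging 90% CPU against a 50% target -> ceil(2 * 1.8) = 4
print(desired_replicas(2, 90, 50))  # 4
# 4 pods at 52% vs 50%: ratio 1.04 is within tolerance -> stays at 4
print(desired_replicas(4, 52, 50))  # 4
```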

Scale to zero
• Out of the box, Kubernetes is unable to autoscale pods to zero instances*
• It is desirable to scale certain microservices to zero instances, e.g. message handlers and “functions”-style applications
* K8s 1.15 adds support for this via a feature gate

KEDA: Kubernetes Event Driven Autoscaler
• Open-source project led by Microsoft and Red Hat
• Allows Kubernetes deployments to be autoscaled based on events
• Scale up from zero → n instances
• Scale down from n → zero instances
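A rough Python model of the scale-from/to-zero behaviour, assuming a queue-based trigger: while messages are waiting, KEDA keeps at least one replica and the HPA it manages sizes the deployment by queue length per replica; once the trigger is inactive the HPA can only go down to one replica, and after a full cooldown period KEDA scales back to the minimum (zero by default). Parameter names loosely mirror ScaledObject settings (`cooldownPeriod`, `minReplicaCount`, `maxReplicaCount`); the per-replica target and the exact formula are simplifying assumptions, not KEDA's code.

```python
import math


def keda_replicas(queue_length, current_replicas, seconds_since_last_active,
                  target_per_replica=5, min_replica_count=0,
                  max_replica_count=100, cooldown_period=300):
    """Simplified model of KEDA + HPA for a queue trigger (assumption, not KEDA's code)."""
    if queue_length > 0:
        # Trigger active: KEDA activates the deployment (0 -> 1) and the HPA
        # it manages sizes it by queue length per replica (1 -> n).
        wanted = math.ceil(queue_length / target_per_replica)
        return max(1, min_replica_count, min(max_replica_count, wanted))
    if seconds_since_last_active >= cooldown_period:
        # Inactive for a whole cooldown period: back to the minimum (0 by default).
        return min_replica_count
    # Inactive but still cooling down: the HPA cannot go below one replica.
    return min(current_replicas, max(1, min_replica_count))


print(keda_replicas(12, 0, 0))    # 3: woken from zero by 12 queued messages
print(keda_replicas(0, 3, 60))    # 1: queue empty, still inside the cooldown
print(keda_replicas(0, 1, 400))   # 0: cooldown elapsed, scaled to zero
```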

How KEDA works

KEDA scalers/event sources
• AWS CloudWatch
• AWS Simple Queue Service
• Azure Event Hub†
• Azure Service Bus Queues and Topics
• Azure Storage Queues
• GCP PubSub
• Kafka
• Liiklus
• Prometheus
• RabbitMQ
• Redis Lists
• Others in development

Virtual Kubelet/Node

Virtual Kubelet implementations
• Azure Container Instances
• AWS Fargate
• HashiCorp Nomad
• Service Fabric Mesh
• Azure IoT Edge
• …others

Azure Container Instances (ACI)
• “Serverless” containers
• No infrastructure required
• Per-second billing for running containers
• Good for testing images, short-lived containers and bursting for sudden spikes

Bursting load using virtual node
• Burst to ACI to continue scaling beyond the cluster’s capacity

Virtual nodes option in AKS

DEMO: KEDA & VIRTUAL NODE SCALING

Virtual node recap
• The virtual node was tainted to prevent pods being scheduled on it “accidentally”
• The e-commerce shop deployment that bursts was configured with:
  • A toleration for the virtual node taint, which allows its pods to be scheduled on the virtual node
  • A soft node anti-affinity to the virtual node, which avoids using the virtual node unless there is no other choice
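The recap's pod configuration could look roughly like the Python dict below (a JSON-compatible PodSpec fragment). The taint key `virtual-kubelet.io/provider` and the node label `type: virtual-kubelet` are assumptions based on AKS virtual node conventions, and the container name and image are hypothetical; check the labels and taints on your own virtual node before relying on them.

```python
import json

# Hypothetical PodSpec fragment for the bursting deployment.
# ASSUMPTIONS: taint key "virtual-kubelet.io/provider" and node label
# "type: virtual-kubelet" follow AKS virtual node conventions; image is a placeholder.
burst_pod_spec = {
    "containers": [{"name": "shop", "image": "example/shop:latest"}],
    "tolerations": [
        # Tolerate the virtual node taint so pods MAY be placed there.
        {"key": "virtual-kubelet.io/provider", "operator": "Exists"},
    ],
    "affinity": {
        "nodeAffinity": {
            # Soft anti-affinity: prefer any node that is NOT the virtual node.
            "preferredDuringSchedulingIgnoredDuringExecution": [{
                "weight": 100,
                "preference": {
                    "matchExpressions": [
                        {"key": "type", "operator": "NotIn", "values": ["virtual-kubelet"]},
                    ],
                },
            }],
        },
    },
}

print(json.dumps(burst_pod_spec, indent=2))
```

The toleration alone would let the scheduler use the virtual node freely; the preferred NotIn rule is what pushes it to the last resort once the regular nodes are full.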

Wrapping it up
• Many powerful constructs are available in Kubernetes to control pod scheduling
• Admission webhooks allow customisation of resources with minimal code
• Custom resources with controllers give you ultimate extensibility
• Virtual node may allow for “serverless” K8s clusters in the future

Where can I go to learn more?
• http://www.katacoda.com
• https://www.katacoda.com/openshift/courses/operatorframework
• https://github.com/Azure-Samples/virtual-node-autoscale
• http://bit.ly/k8s-microservices-video

Thank you! Questions?
Shahid Iqbal | Freelance consultant
@shahiddev on Twitter
https://www.linkedin.com/in/shahiddev/
https://blog.headforcloud.com
