ChaDevOps: Deploying Self Healing Services with Kubernetes

In this talk I tell the story of how Kubernetes kept Spire's systems running through an AWS service disruption. From there I walk through a demo showing just how simple it is to deploy self-healing services with Kubernetes. You'll learn how to avoid some common problems, and you should leave this talk with everything you need to get started with Kubernetes.

Related to this Presentation:

GitHub (https://github.com/robscott/self-healing-k8s-demo)

Further Reading:

The Children's Illustrated Guide to Kubernetes (https://deis.com/blog/2016/kubernetes-illustrated-guide/)

Quickstart for Google Container Engine (https://cloud.google.com/container-engine/docs/quickstart)

Setting up an HA Kubernetes Cluster in AWS with private topology with Kops 1.5.1 (https://www.nivenly.com/kops-1-5-1/)

KubeCon Videos (https://www.youtube.com/playlist?list=PLj6h78yzYM2PAavlbv0iZkod4IVh_iGqV)

Rob Scott

June 21, 2017

Transcript

  1. Deploying Self Healing Services with Kubernetes Rob Scott | @robertjscott

    ChaDevOps, June 20, 2017
  2. @spire spire.me

  3. Remember This? February 28, 2017

  4. None
  5. All Spire systems were still up

  6. It’s never that simple

  7. All Spire systems were still up

  8. Our Systems Before Kubernetes

  9. The core services powering Spire

    Website, API, Scheduler, Background Processing, Notifications, and Management Portal, grouped on the slide into HTTP Services and Background Services.
  10. What it all looks like in Kubernetes

    [Diagram: Nodes 1 to 4, each running a mix of Website, API, Scheduler, Notifications, Background Processing, and Management Portal pods from the QA, STAGING, and DEMO environments.]
  11. What if a Node dies?

    [Diagram: Nodes 1 to 4, each running a mix of Website, API, Scheduler, Notifications, Background Processing, and Management Portal pods from the QA, STAGING, and DEMO environments.]
  12. After redistribution

    [Diagram: the same Website, API, Scheduler, Notifications, Background Processing, and Management Portal pods from the QA, STAGING, and DEMO environments, now shown with only Nodes 1 to 3 labeled.]
  13. Kuberwhat?

  14. Initial Release: July 21, 2015

    Google partnered with the Linux Foundation to form the Cloud Native Computing Foundation (CNCF) to govern Kubernetes.
  15. Container Orchestration Tools SWARM

  16. Container Orchestration Trends

  17. Container Orchestration Trends

  18. Demo

    Everything you'll need to deploy your own self-healing services with Kubernetes.
  19. Namespace Kubernetes Foundation

  20. apiVersion: v1
      kind: Namespace
      metadata:
        name: self-healing-k8s-demo

  21. None
  22. Service Kubernetes Foundation

  23. apiVersion: v1
      kind: Service
      metadata:
        name: self-healing-k8s-demo
      spec:
        type: LoadBalancer
        selector:
          app: self-healing-k8s-demo
        ports:
        - protocol: TCP
          port: 80
          targetPort: 3000
  24. None
  25. Pod Kubernetes Foundation

  26. apiVersion: v1
      kind: Pod
      metadata:
        name: demo-pod
        labels:
          app: self-healing-k8s-demo
      spec:
        containers:
        - name: demo-http-server
          image: quay.io/robertjscott/demo-http-server:0.1.1
  27. None
  28. None
  29. None
  30. Deployment Kubernetes Foundation

  31. apiVersion: extensions/v1beta1
      kind: Deployment
      metadata:
        name: demo-deployment
      spec:
        replicas: 3
        template:
          metadata:
            labels:
              app: self-healing-k8s-demo
          spec:
            containers:
            - name: demo-http-server
              image: quay.io/robertjscott/demo-http-server:0.1.1
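    The Deployment on slide 31 uses the `extensions/v1beta1` API group, which was current at the time of the talk but stopped being served for Deployments in Kubernetes 1.16. A minimal sketch of the same manifest for a newer cluster, assuming `apps/v1` is available; the `selector` block is an addition that `apps/v1` requires and is not on the original slide:

    ```yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: demo-deployment
    spec:
      replicas: 3
      # apps/v1 requires an explicit selector matching the pod template labels.
      selector:
        matchLabels:
          app: self-healing-k8s-demo
      template:
        metadata:
          labels:
            app: self-healing-k8s-demo
        spec:
          containers:
          - name: demo-http-server
            image: quay.io/robertjscott/demo-http-server:0.1.1
    ```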
  32. None
  33. None
  34. Example: Bad Code Example

  35. None
  36. None
  37. Liveness Probes Key Concept

  38. spec:
        containers:
        - name: demo-http-server
          image: quay.io/robertjscott/demo-http-server:0.1.1
          livenessProbe:
            httpGet:
              path: /alive
              port: 3000
            periodSeconds: 5
            timeoutSeconds: 1
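    As a rough guide to how quickly the probe on slide 38 triggers a restart: with the Kubernetes default `failureThreshold` of 3, which the slide does not set, the kubelet restarts the container after three consecutive failed checks, so roughly 15 seconds at a 5 second period. A sketch that spells those defaults out; the `failureThreshold` and `successThreshold` lines are additions for illustration only:

    ```yaml
    spec:
      containers:
      - name: demo-http-server
        image: quay.io/robertjscott/demo-http-server:0.1.1
        livenessProbe:
          httpGet:
            path: /alive
            port: 3000
          periodSeconds: 5      # check every 5 seconds
          timeoutSeconds: 1     # a check with no response within 1 second counts as failed
          failureThreshold: 3   # Kubernetes default: restart after 3 consecutive failures
          successThreshold: 1   # Kubernetes default; must be 1 for liveness probes
    ```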
  39. None
  40. Example: Slow Server Example

  41. containers:
      - name: demo-http-server
        image: quay.io/robertjscott/demo-http-server:0.1.1
        livenessProbe:
          httpGet:
            path: /alive
            port: 3000
          periodSeconds: 5
          timeoutSeconds: 1
          initialDelaySeconds: 45
        env:
        - name: STARTUP_DELAY_SECONDS
          value: '40'
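    Slide 41 handles the 40 second startup by delaying the liveness probe with `initialDelaySeconds`. Newer Kubernetes versions (stable since 1.20) also offer a `startupProbe`, which holds off liveness checks until the container first responds. A sketch of that alternative, which is not part of the original demo; it reuses the `/alive` endpoint, and the threshold values are assumptions:

    ```yaml
    containers:
    - name: demo-http-server
      image: quay.io/robertjscott/demo-http-server:0.1.1
      # Allows up to 12 * 5 = 60 seconds for startup before the container is restarted.
      startupProbe:
        httpGet:
          path: /alive
          port: 3000
        periodSeconds: 5
        failureThreshold: 12
      # Liveness checks begin only after the startup probe has succeeded once.
      livenessProbe:
        httpGet:
          path: /alive
          port: 3000
        periodSeconds: 5
        timeoutSeconds: 1
      env:
      - name: STARTUP_DELAY_SECONDS
        value: '40'
    ```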
  42. None
  43. Readiness Probes Key Concept

  44. containers:
      - name: demo-http-server
        image: quay.io/robertjscott/demo-http-server:0.1.1
        livenessProbe:
          httpGet:
            path: /alive
            port: 3000
          periodSeconds: 5
          timeoutSeconds: 1
          initialDelaySeconds: 45
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          periodSeconds: 5
          timeoutSeconds: 1
  45. None
  46. Example: Mayhem Example

  47. apiVersion: extensions/v1beta1
      kind: Deployment
      metadata:
        name: mayhem-deployment
      spec:
        replicas: 5
        template:
          metadata:
            labels:
              app: mayhem
          spec:
            containers:
            - name: mayhem
              image: quay.io/robertjscott/mayhem:0.1.0
  48. None
  49. Resource Limits Key Concept

  50. spec:
        containers:
        - name: mayhem
          image: quay.io/robertjscott/mayhem:0.1.0
          resources:
            requests:
              memory: 64Mi
              cpu: 125m
            limits:
              memory: 64Mi
              cpu: 125m
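    The limits on slide 50 are set per container. One way to apply the same values as namespace-wide defaults, so that containers without a `resources` block still get them, is a `LimitRange`. This is a sketch and not part of the demo; the object name is hypothetical, and it assumes the namespace from slide 20:

    ```yaml
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: default-container-limits     # hypothetical name
      namespace: self-healing-k8s-demo   # assumes the demo namespace from slide 20
    spec:
      limits:
      - type: Container
        # Applied as limits when a container specifies none.
        default:
          memory: 64Mi
          cpu: 125m
        # Applied as requests when a container specifies none.
        defaultRequest:
          memory: 64Mi
          cpu: 125m
    ```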
  51. None
  52. Recap

  53. Liveness Probes: When these probes fail, Kubernetes attempts to restart the container.
  54. Readiness Probes: Kubernetes does not send traffic to the container until these probes succeed.
  55. Resource Limits: Without enforcing proper resource limits, a single rogue container can take down a node.
  56. Bonus

  57. Affinity and Anti-Affinity: Proper configuration can ensure your pods are deployed across availability zones or regions.
  58. nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: failure-domain.beta.kubernetes.io/zone
              operator: In
              values:
              - us-east-1c
              - us-east-1d
  59. preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: beta.kubernetes.io/instance-type
            operator: In
            values:
            - m4.large
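    The fragments on slides 58 and 59 sit under `spec.affinity` in a pod spec (or in a Deployment's pod template). A sketch of how they might fit together, with a `podAntiAffinity` rule added as an assumption to illustrate the anti-affinity half of slide 57. The label keys are the ones from the slides; newer clusters use `topology.kubernetes.io/zone` and `node.kubernetes.io/instance-type` instead:

    ```yaml
    spec:
      affinity:
        nodeAffinity:
          # Hard rule: only schedule onto nodes in these zones.
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                - us-east-1c
                - us-east-1d
          # Soft rule: prefer this instance type when available.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: beta.kubernetes.io/instance-type
                operator: In
                values:
                - m4.large
        # Assumption, not from the slides: prefer spreading replicas across zones.
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: self-healing-k8s-demo
              topologyKey: failure-domain.beta.kubernetes.io/zone
      containers:
      - name: demo-http-server
        image: quay.io/robertjscott/demo-http-server:0.1.1
    ```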
  60. Where to go from here

    • The Children's Illustrated Guide to Kubernetes
    • Quickstart for Google Container Engine
    • Setting up an HA Kubernetes Cluster in AWS with private topology with Kops 1.5.1
    • KubeCon Videos
  61. With proper configuration, Kubernetes services can heal themselves

    @robertjscott | robertjscott.ca