Site Reliability Engineering For Kubernetes

In this presentation, Tammy shares important failure modes to consider when you are responsible for the reliability of Kubernetes in your organisation.

This tech talk is based on an article Tammy wrote: https://tammybutow.medium.com/site-reliability-engineering-for-kubernetes-b52877c70fb7

Tammy Bryant Butow

May 12, 2021

Transcript

  1. @tambryantbutow Tammy Bryant Butow: Site Reliability Engineering For Kubernetes

  2. @tambryantbutow WHAT CHANGES WHEN YOU ARE RESPONSIBLE FOR KUBERNETES CLUSTERS? WHAT METRICS MATTER FOR K8s? WHAT KIND OF INCIDENTS ARE COMMON WITH K8s? WHAT SHOULD I AVOID RUNNING ON K8s?

  3. @tambryantbutow codeberg.org/hjacobs/kubernetes-failure-stories

  4. @tambryantbutow COMMON FAILURE MODES FOR KUBERNETES IN PRODUCTION

  5. @tambryantbutow

  6. @tambryantbutow

  7. @tambryantbutow COMMON FAILURE MODES FOR KUBERNETES ON AWS

  8. @tambryantbutow COMMON CPU FAILURE MODES FOR KUBERNETES ON AWS

  9. @tambryantbutow COMMON FAILURE MODES FOR KUBERNETES ON GCP

  10. @tambryantbutow COMMON FAILURE MODES FOR KUBERNETES ON AZURE

  11. @tambryantbutow

  12. @tambryantbutow CREATING A RELIABILITY PLAN

  13. @tambryantbutow tammybutow.medium.com/site-reliability-engineering-for-kubernetes-b52877c70fb7 TS;WR

  14. @tambryantbutow gremlin.com/bootcamp

  15. @tambryantbutow KUBERNETES DASHBOARD - RESPONSE TIMES

  16. @tambryantbutow gremlin.com/community/tutorials/how-to-create-a-kubernetes-cluster-on-ubuntu-16-04-with-kubeadm-and-weave-net/

  17. GOT STICKERS? GREMLIN.COM/TALK/SRE @tambryantbutow

  18. THANK YOU! @tambryantbutow