Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

Build, Break, and Survive! ~ A Kawaii Manga Journey Through the Ups and Downs of Production Apps ~ AOI TAKAHASHI The Grand Adventure Of Production Apps

Slide 3

Slide 3 text

About me ● Aoi Takahashi ● Working as a Site Reliability Engineer at an IT company ● I live with two dogs (Golden Retrievers) 🐶 ● Love reading and writing manga! ● 𝕏: @_a0i ● GitHub: github.com/aoi1 ● LinkedIn

Slide 4

Slide 4 text

● I wrote a book, 「つくって、壊して、直して学ぶKubernetes入門」, which means “Building, Breaking, Fixing: A Playful Way to Learn Kubernetes”. ● In this session, I’ll guide you through the world of building, breaking, and fixing, the very heart of the book this talk is based on.

Slide 5

Slide 5 text

The Target of This Session ● People who know the basics of Kubernetes. ○ I will not explain what a Pod is, what a Service is, etc. ● People who have little experience running applications in production with Kubernetes. I hope to help these people run their applications with a little more confidence.

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Table of Contents Chapter 1. - The Beginning - Before the Hero Draws the Sword (kubectl) Chapter 2. - Level Up by Battling Monsters - Learn Troubleshooting Techniques for Real-World Production Issues Chapter 3. - And so the journey continues... - Recommended Books and Sessions for the Journey Ahead

Slide 8

Slide 8 text

- The Beginning - The hero embarked on a quest to slay the monsters wreaking havoc on production applications. This is the tale of a lone hero who rose to save the production environment. Monsters known as incidents strike at applications, even today… Can the hero protect the service and emerge victorious...?

Slide 9

Slide 9 text

Hi, my name is KubeKnight! I’m on an adventure to save the world!

Slide 10

Slide 10 text

⭐ Read left to right, not the Japanese way!

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

Hero's Sword: kubectl What you first need to do is become proficient with kubectl, the most fundamental and versatile weapon.

Slide 13

Slide 13 text

kubectl basics in troubleshooting Level 1. Basic Practice kubectl get pods

Slide 14

Slide 14 text

kubectl basics in troubleshooting kubectl get pods -o wide

Slide 15

Slide 15 text

kubectl basics in troubleshooting kubectl get pods -o yaml

Slide 16

Slide 16 text

kubectl basics in troubleshooting kubectl get pods -o jsonpath=<template>
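The jsonpath output format only becomes useful once it is given a template. A minimal sketch, assuming you want each pod's name and phase; the helper name pod_phases is hypothetical and the namespace is passed as an argument:

```shell
# List every pod in a namespace with its phase, one pod per line.
# pod_phases is a hypothetical helper name; $1 is the namespace.
pod_phases() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
}

# Usage: pod_phases mynamespace
```

The range/end pair loops over the returned items, so the same pattern can print other fields by swapping the paths inside the braces.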

Slide 17

Slide 17 text

kubectl basics in troubleshooting kubectl describe pods

Slide 18

Slide 18 text

Don’t forget! You should specify the namespace; otherwise, the default namespace will be used. 🙁 kubectl get pods ✅ kubectl get pods --namespace mynamespace (short form: -n mynamespace) ✅ kubectl config set-context --current --namespace=mynamespace, then kubectl get pods ✅ kubens mynamespace, then kubectl get pods
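Before running a command without -n, it can help to confirm which namespace the current context points at. A small sketch under that assumption; current_namespace is a hypothetical helper name:

```shell
# Print the namespace configured for the current kubectl context.
# An empty value means none is set, in which case kubectl uses "default".
current_namespace() {
  ns=$(kubectl config view --minify -o jsonpath='{..namespace}')
  echo "${ns:-default}"
}

# Usage: current_namespace
```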

Slide 19

Slide 19 text

Other Tools kubectx kubectx is a CLI tool that simplifies switching between multiple Kubernetes contexts. A context in Kubernetes defines the cluster, user, and namespace to use. kubens kubens complements kubectx by simplifying namespace switching, so you don't have to specify --namespace repeatedly.

Slide 20

Slide 20 text

Other Tools k9s k9s is a powerful terminal-based UI that lets you interactively manage your Kubernetes cluster.

Slide 21

Slide 21 text

Hero’s Learnings: Mastering the Basics of kubectl ● kubectl is your primary weapon ○ Get comfortable with the basics: kubectl get and kubectl describe ● Always specify the correct namespace; don't get lost in the default realm!

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

First monster: ImagePullBackOff Here, we’ll start by learning how to fight our first monster. We’ll do this by deliberately reproducing a common error — the ImagePullBackOff state — and then fixing it.

Slide 24

Slide 24 text

Build it: NGINX Pod Apply the manifest to your Kubernetes cluster.
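The manifest itself is not included in this text. As a stand-in, here is a minimal sketch of a single-container NGINX Pod applied from an inline manifest. The Pod name myapp, the label, the image tag, and the helper name build_nginx_pod are all assumptions for illustration, not the manifest from the slides:

```shell
# Create a single-container NGINX Pod from an inline manifest.
# Pod name "myapp" and the image tag are illustrative assumptions.
# $1 is the namespace to create the Pod in.
build_nginx_pod() {
  kubectl apply -n "$1" -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  containers:
  - name: nginx
    image: nginx:1.25.3
    ports:
    - containerPort: 80
EOF
}

# Usage: build_nginx_pod mynamespace && kubectl get pod myapp -n mynamespace
```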

Slide 25

Slide 25 text

Build it: NGINX Pod

Slide 26

Slide 26 text

Break it: ImagePullBackOff Apply the manifest to your Kubernetes cluster.

Slide 27

Slide 27 text

Break it: ImagePullBackOff
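One way to reproduce this state is to point the Pod at an image tag that cannot be pulled, then look at the status and events. This is a sketch, not the manifest from the slides: the Pod name myapp, the tag no-such-tag (assumed not to exist), and the helper name break_nginx_pod are hypothetical.

```shell
# Apply a Pod whose image tag is assumed not to exist, then show the
# Pod status and the events that explain the failed pull.
# $1 is the namespace.
break_nginx_pod() {
  kubectl apply -n "$1" -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: nginx
    image: nginx:no-such-tag
EOF
  kubectl get pod myapp -n "$1"
  kubectl describe pod myapp -n "$1"
}

# Usage: break_nginx_pod mynamespace
```

The STATUS column typically moves from ErrImagePull to ImagePullBackOff as the kubelet backs off between retries, and the Events section at the end of kubectl describe names the image that failed.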

Slide 28

Slide 28 text

Fix it: kubectl edit Level 2. kubectl edit

Slide 29

Slide 29 text

About kubectl edit Here, we used kubectl edit to fix the issue right from the command line. But a word of caution: kubectl edit makes direct changes to your live environment. In real-world operations, changes should flow through your deployment workflow, where proper manifests are applied safely — according to your team’s practices.
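As one possible shape for a manifest-based fix (real teams will usually have a CI/CD or GitOps pipeline doing this), the corrected manifest file can be previewed against the live cluster before it is applied. A sketch; the helper name review_and_apply and the file name in the usage line are hypothetical:

```shell
# Preview a manifest change against the live cluster, then apply it.
# kubectl diff exits 0 when nothing differs, 1 when differences exist,
# and with a value greater than 1 on error.
review_and_apply() {
  rc=0
  kubectl diff -f "$1" || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "no changes to apply"
  elif [ "$rc" -eq 1 ]; then
    kubectl apply -f "$1"
  else
    echo "kubectl diff failed (exit $rc)" >&2
    return "$rc"
  fi
}

# Usage: review_and_apply pod.yaml
```

Unlike kubectl edit, the change lives in a file that can be reviewed and versioned before it touches the cluster.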

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

- Level Up by Battling Monsters - What’s important in troubleshooting is being able to effectively repeat cycles of forming hypotheses and testing them. But in order to form good hypotheses, you also need to understand the big picture. For example, having a “map” of a typical setup for deploying a stateless application can help smooth your journey.

Slide 32

Slide 32 text

Get a map

Slide 33

Slide 33 text

Build and Break the Pod: Running…? Apply the manifest to your Kubernetes cluster.

Slide 34

Slide 34 text

Build and Break the Pod: Running…? ←The Container is not Ready
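A Pod can report Running while its containers are not Ready; the READY column (ready/total) is where that shows up. A small sketch that filters kubectl get pods output for such pods; the helper name not_ready is hypothetical:

```shell
# Read `kubectl get pods` output on stdin and print name, READY, and STATUS
# for pods whose READY column shows fewer ready containers than total.
not_ready() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1, $2, $3 }'
}

# Usage: kubectl get pods -n mynamespace | not_ready
```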

Slide 35

Slide 35 text

Troubleshooting using kubectl Level 3. ✅ kubectl logs ❌ kubectl get logs (logs is its own subcommand, not a resource type)
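A couple of kubectl logs variations tend to be useful when a container is not Ready or keeps restarting. A sketch; the helper names and the default of 50 lines are assumptions:

```shell
# Show the last lines of a pod's logs.
# $1 = pod name, $2 = namespace, $3 = number of lines (defaults to 50).
pod_logs() {
  kubectl logs "$1" -n "$2" --tail="${3:-50}"
}

# Show logs from the previous container instance, useful after a restart.
previous_pod_logs() {
  kubectl logs "$1" -n "$2" --previous --tail="${3:-50}"
}

# Usage: pod_logs myapp mynamespace
#        previous_pod_logs myapp mynamespace 100
```

For Pods with more than one container, adding -c with the container name selects which container's logs to read.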

Slide 36

Slide 36 text

Let’s make a guess — what’s going wrong here? Log ✅ ✅ ✅

Slide 37

Slide 37 text

Thinking… The manifest The code The log

Slide 38

Slide 38 text

Fix it

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Miniboss: Access to the forbidden chronicle

Slide 41

Slide 41 text

Build it Apply the manifest to your Kubernetes cluster.

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

I WILL BREAK THE CHRONICLE TO SAVE MY BOSS!!!

Slide 44

Slide 44 text

Break it Apply the manifest to your Kubernetes cluster.

Slide 45

Slide 45 text

Again, use the map ← Investigate the Pod first

Slide 46

Slide 46 text

Check the pod ✅ kubectl describe pod ✅ kubectl logs

Slide 47

Slide 47 text

Again… use the map Log ✅ Events ✅

Slide 48

Slide 48 text

Tips: Accessing pods for debugging Level 4. kubectl port-forward
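kubectl port-forward opens a tunnel from your machine to a port on the Pod, which lets you test the Pod directly without going through a Service. A rough sketch, assuming the container listens on port 80 and local port 8080 is free; the helper name and the 2-second wait are illustrative:

```shell
# Forward local port 8080 to port 80 of a Pod, send one request,
# print the HTTP status code, then stop the tunnel.
# $1 = pod name, $2 = namespace.
probe_pod_via_port_forward() {
  kubectl port-forward "pod/$1" -n "$2" 8080:80 >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2   # crude wait for the tunnel to come up
  curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080/
  kill "$pf_pid" 2>/dev/null || true
}

# Usage: probe_pod_via_port_forward myapp mynamespace
```

If this succeeds while access through the Service fails, the Pod itself is probably healthy and the problem likely sits somewhere between the client and the Pod.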

Slide 49

Slide 49 text

Troubleshooting with kubectl Level 4. kubectl exec

Slide 50

Slide 50 text

Again… use the map ✅ ✅ ✅

Slide 51

Slide 51 text

Again… use the map ← TRY ACCESS

Slide 52

Slide 52 text

Troubleshooting with kubectl Level 4. kubectl run You cannot get a response. ACCESS
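To try access from inside the cluster, kubectl run can start a throwaway Pod that makes the request and is then removed. A sketch; the Pod name tmp-curl, the curlimages/curl image, the 5-second timeout, the helper name try_access, and the Service URL in the usage line are all assumptions:

```shell
# Start a temporary Pod in the cluster and request a URL from it.
# --rm deletes the Pod when the command finishes; --restart=Never creates
# a bare Pod rather than a managed workload.
# $1 = namespace, $2 = URL to request.
try_access() {
  kubectl run tmp-curl -n "$1" --rm -i --restart=Never \
    --image=curlimages/curl -- curl -sS -m 5 "$2"
}

# Usage: try_access mynamespace http://myapp-svc:80/
```

A timeout or connection error here, combined with a healthy Pod, is a hint to look at what sits in front of the Pod, for example with kubectl describe on the Service.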

Slide 53

Slide 53 text

Thinking… ✅ ✅ ✅ ❌

Slide 54

Slide 54 text

Thinking…

Slide 55

Slide 55 text

Fix it: kubectl edit

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

Hero’s Learnings: Battling Common Incidents ● Troubleshooting = Hypothesize → Test → Observe → Repeat ● Know what to look for: ○ Pod status ○ Container states ○ Logs & events ● Misconfigured probes can hurt your app — use them wisely!

Slide 58

Slide 58 text

- Upgrade Your Gear - Improving observability enables faster troubleshooting. Examples: the three pillars ● Logs - Storing logs for a long period helps with historical investigations. ● Metrics - Making metrics accessible helps with setting up and responding to alerts. ● Traces - Making traces accessible allows you to identify which requests had issues.

Slide 59

Slide 59 text

- And so the journey continues... - The journey is far from over. The troubleshooting methods introduced in this session are just a small part—out in the real world, you'll face many more challenges. To overcome them, it's important to draw on the wisdom of those who came before you.

Slide 60

Slide 60 text

- Recommended Books, Communities - Talks: ● 10 Weird Ways to Blow Up Your Kubernetes - Melanie Cebula & Bruce Sherrod, Airbnb https://www.youtube.com/watch?v=FrQ8Lwm9_j8 ● 10 More Weird Ways to Blow Up Your Kubernetes - Jian Cheung & Joseph Kim, Airbnb https://www.youtube.com/watch?v=4CT0cI62YHk Books in English: ● Cloud Native DevOps with Kubernetes: Building, Deploying, and Scaling Modern Applications in the Cloud by John Arundel, Justin Domingus ● Kubernetes Patterns: Reusable Elements for Designing Cloud-Native Applications by Bilgin Ibryam, Roland Huß Books in Japanese: ● Kubernetes完全ガイド by Aoyama Masaya Or earn a certification: CKAD, CKA

Slide 61

Slide 61 text

END

Slide 62

Slide 62 text

Appendix

Slide 63

Slide 63 text

Hint: Sample Statuses ● ErrImagePull: There was a problem pulling an image. ● Error: The container exited with an error. ● OOMKilled: The container was killed because it ran out of memory (OOM).

Slide 64

Slide 64 text

Hint: Setting the probes ● Liveness probe: Liveness probes determine when to restart a container. For example, a liveness probe can catch a deadlock, where an application is running but unable to make progress. ● Readiness probe: Readiness probes determine when a container is ready to start accepting traffic. ● Startup probe: A startup probe verifies whether the application within a container has started. Probes are intended to make applications more robust, but a misconfigured probe can prevent the application from starting or receiving traffic. Of course, there are also cases where a probe holds an application back by design, in order to protect it properly.
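To make the three probe types concrete, here is a sketch of a Pod that sets all of them on an NGINX container. The Pod name myapp-probes, the image tag, the probed path and port, the timing values, and the helper name apply_probed_pod are assumptions chosen for illustration, not recommended settings:

```shell
# Create an NGINX Pod with startup, readiness, and liveness probes.
# All names, paths, ports, and timings below are illustrative assumptions.
# $1 is the namespace.
apply_probed_pod() {
  kubectl apply -n "$1" -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: myapp-probes
spec:
  containers:
  - name: nginx
    image: nginx:1.25.3
    ports:
    - containerPort: 80
    startupProbe:        # gives the app up to 12 x 5s to start
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
      failureThreshold: 12
    readinessProbe:      # controls whether the Pod receives traffic
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
    livenessProbe:       # restarts the container after 3 failed checks
      httpGet:
        path: /
        port: 80
      periodSeconds: 10
      failureThreshold: 3
EOF
}

# Usage: apply_probed_pod mynamespace
```

Pointing a probe at the wrong path or port is a typical misconfiguration: the container runs, but it is never marked Ready or gets restarted repeatedly.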

Slide 65

Slide 65 text

Hint: Pod Phases & Container States ● Pending: The Pod has been accepted by the Kubernetes cluster, but one or more of the containers has not been set up and made ready to run. This includes time a Pod spends waiting to be scheduled as well as the time spent downloading container images over the network. ● Running: The Pod has been bound to a node, and all of the containers have been created. At least one container is still running, or is in the process of starting or restarting. ● Succeeded: All containers in the Pod have terminated in success, and will not be restarted. ● Failed: All containers in the Pod have terminated, and at least one container has terminated in failure. That is, the container either exited with non-zero status or was terminated by the system, and is not set for automatic restarting. ● Unknown: For some reason the state of the Pod could not be obtained. This phase typically occurs due to an error in communicating with the node where the Pod should be running.
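The phase can also be used as a filter. A minimal sketch that lists the pods in a namespace whose phase is anything other than Running; the helper name pods_not_running is hypothetical:

```shell
# List pods whose phase is not Running (Pending, Succeeded, Failed, Unknown).
# $1 is the namespace.
pods_not_running() {
  kubectl get pods -n "$1" --field-selector='status.phase!=Running'
}

# Usage: pods_not_running mynamespace
```

Keep in mind that the phase is coarser than the STATUS column: a Pod stuck in ImagePullBackOff is still in the Pending phase, and a Pod whose container is not Ready can still be in the Running phase.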

Slide 66

Slide 66 text

Hint: Container States ● Waiting: If a container is not in either the Running or Terminated state, it is Waiting. The container is still running the operations it requires in order to complete start up. ● Running: The Running status indicates that a container is executing without issues. ● Terminated: A container in the Terminated state began execution and then either ran to completion or failed for some reason.
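Container states are recorded per container in the Pod's status. A sketch that prints each container's name next to its current state object; the helper name container_states is hypothetical, and the exact rendering of the state object may vary between kubectl versions:

```shell
# Print each container's name and its current state (waiting, running,
# or terminated) as recorded in the Pod status.
# $1 = pod name, $2 = namespace.
container_states() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state}{"\n"}{end}'
}

# Usage: container_states myapp mynamespace
```

For a Waiting or Terminated container, the state object usually carries a reason field, which is where values such as ImagePullBackOff or OOMKilled appear.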