Debug application inside Kubernetes using Linux Kernel tools

Kenta Tada

October 24, 2019

Transcript

  1. Debug application inside Kubernetes using Linux Kernel tools
    Kenta Tada, R&D Center, Sony Corporation
    (R&D Center Base System Development Department, Copyright 2019 Sony Corporation)
  2. Agenda
    ⚫Introduction to oci-ftrace-syscall-analyzer, our system call analyzer for Kubernetes
    ⚫How to get process core dumps on Kubernetes
  3. Kubernetes
    (diagram: the Master runs kubectl, etcd, kube-apiserver, kube-scheduler, and kube-controller-manager; the Node runs kubelet, the high level runtime (containerd), the low level runtime (runC), and the Pod (sample application))
  4. Kubernetes and kernel tools
    (diagram: the same architecture as the previous slide, with kernel tools sitting at the user/kernel boundary on the Node, below the runtimes)
  5. Background
    ⚫We developed a lightweight and secure runC-based container platform for embedded systems
    ⚫That platform needs to launch secure (restricted and rootless) containers for third parties
    ⚫We also developed an ftrace-based system call analyzer to generate secure configs
    ⚫Currently, we are porting those tools to our Kubernetes environments
  6. Use kernel tools to trace applications transparently
    ⚫Existing methods are very useful, but they require additional packages and additional capabilities to debug:
      • Needed capabilities
      • Correct file permissions
      • seccomp settings for security
    ⚫On the other hand, sometimes we just want to investigate system calls
    ⚫Let’s use kernel tools to trace applications transparently
  7. Kernel technologies our syscall analyzer uses
    ⚫ftrace
      • Tracing framework for the Linux kernel
      • ftrace can collect various kinds of information, although it is typically thought of as just a function tracer
      • Easy to set up (just write settings to tracefs); no eBPF compiler (no LLVM) required
    ⚫Tracepoints
      • Static trace points inside the kernel
  8. What is needed to integrate
    1. Divide the ftrace ring buffer using ftrace instances, one for each container
      • https://speakerdeck.com/kentatada/container-debug-using-ftrace
    2. Set up ftrace during container startup (today’s topic)
  9. Set up ftrace during container startup
    ⚫How to insert the ftrace setup tool before container startup
    ⚫How to get the PID1 process inside the container
    ⚫What ftrace settings are needed to trace the container’s processes
  10. How to insert the ftrace setup tool before container startup
    ⚫Container lifecycle and related hooks
    ⚫Our ftrace-based tracer should be executed at prestart because we want to trace from process start, as strace does
    (diagram: process lifetime from process start to process stop; prestart ("Setup ftrace") precedes process start, poststart follows it, and poststop ("Collect logs") follows process stop)
  11. How to get the PID1 process inside the container
    ⚫Per the OCI runtime spec, the container state, which includes the container’s initial PID, must be passed to hooks over stdin
      • https://github.com/opencontainers/runtime-spec/blob/master/config.md
    ⚫So we get the info about the container’s PID1 process from stdin
    ⚫This approach works with any low level runtime that complies with the OCI runtime spec
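A minimal sketch of the stdin handling a prestart hook needs (the helper name and the sed-based parsing are our own; a real hook would likely use a proper JSON parser):

```shell
# Per the OCI runtime spec, the runtime writes the container state JSON,
# including the "pid" field (PID1 as seen from the host), to the hook's stdin.
# get_container_pid is a hypothetical helper that extracts that field.
get_container_pid() {
  sed -n 's/.*"pid"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Inside a real prestart hook you would then do:
#   pid=$(get_container_pid)   # the state JSON arrives on stdin from the runtime
```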
  12. What ftrace settings are needed to trace the container’s processes
    ⚫Enable the system call events you want to trace (e.g. from /sys/kernel/debug/tracing/events/syscalls)
    ⚫Only trace the specified PID (e.g. # echo [PID] > /sys/kernel/debug/tracing/set_event_pid)
    ⚫Also trace processes forked by the PIDs in set_event_pid (e.g. # echo 1 > /sys/kernel/debug/tracing/options/event-fork)
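The three settings above can be sketched as a small shell helper. The function name and the TRACEFS override are our own additions; the override exists only so the script can be dry-run against a scratch directory without root:

```shell
# Default to the usual tracefs mount point; override TRACEFS for a dry run.
TRACEFS=${TRACEFS:-/sys/kernel/debug/tracing}

setup_ftrace_for_pid() {
  pid=$1
  # 1. Enable the system call events to trace (here: all syscall events).
  echo 1 > "$TRACEFS/events/syscalls/enable"
  # 2. Only trace the specified PID.
  echo "$pid" > "$TRACEFS/set_event_pid"
  # 3. Also trace children forked by the PIDs in set_event_pid.
  echo 1 > "$TRACEFS/options/event-fork"
}
```

In a prestart hook, the PID argument would be the container PID1 taken from the state JSON on stdin.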
  13. We could integrate runC with our ftrace-based syscall analyzer
    (diagram: the same Kubernetes architecture as before, with the integration point between runC and the kernel tools highlighted)
  14. How to set up a prestart hook in Kubernetes
    ⚫Kubernetes Pod lifecycle and related hooks
    ⚫Kubernetes does not provide a prestart hook
      • https://github.com/kubernetes/kubernetes/issues/140
    ⚫Next, we investigate prestart hooks at the high level runtime layer
    (diagram: process lifetime from process start to process stop; Kubernetes offers only poststart and prestop hooks)
  15. How to set up a prestart hook in containerd
    ⚫In the first place, CRI does not currently provide a way to specify hooks in the container’s config.json
    ⚫Each high level runtime has its own implementation
    ⚫Below is containerd’s ongoing work:
      • https://github.com/containerd/cri/pull/1248
      • https://github.com/containerd/cri/issues/405
  16. How to set up a prestart hook in CRI-O
    ⚫CRI-O already provides its own solution, "oci-hooks"
      • podman has the same feature
    ⚫"oci-hooks provides a way for users to configure the intended hooks for Open Container Initiative containers so they will only be executed for containers that need their functionality, and then only for the stages where they're needed"
      https://github.com/containers/libpod/blob/master/pkg/hooks/docs/oci-hooks.5.md
  17. CRI-O oci-hooks prestart example
    {
      "version": "1.0.0",
      "hook": {
        "path": "/usr/local/bin/oci-ftrace-syscall-analyzer",
        "args": ["oci-ftrace-syscall-analyzer"]
      },
      "when": { "always": true },
      "stages": [ "prestart" ]
    }
  18. Our integration is done!!
    (diagram: the same Kubernetes architecture as before; kubelet, containerd, runC, and the kernel tools are now integrated end to end)
  19. oci-seccomp-bpf-hook
    ⚫oci-seccomp-bpf-hook generates seccomp profiles by tracing the syscalls made by the container using eBPF
    ⚫perf is used to log syscalls
    ⚫This tool has a few limitations:
      • Needs CAP_SYS_ADMIN to run
      • Compiles C code on the fly using LLVM
      • Cannot be used together with podman run --rm
  20. When oci-ftrace-syscall-analyzer is used
    ⚫Your production system doesn’t want to grant privileges to users
    ⚫Your production kernel is not configured for eBPF
    ⚫Your production system doesn’t want to use LLVM
      • Will GCC support the BPF backend? (Compiling to BPF with GCC: https://lwn.net/Articles/800606/)
  21. What is the problem?
    ⚫Process core dumps are recorded according to the pattern in /proc/sys/kernel/core_pattern
    ⚫But containers have their own Linux namespaces, and the kernel’s core dump handling is not namespace-aware
    ⚫So Kubernetes users cannot get their process core dumps
    ⚫Kubernetes issue:
      • https://github.com/kubernetes/kubernetes/issues/48787
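The global, non-namespaced nature of core_pattern is easy to observe (standard procfs path; no assumptions beyond a Linux host):

```shell
# core_pattern is a single kernel-wide setting; reading it needs no privileges.
cat /proc/sys/kernel/core_pattern
# Reading the same file from inside a pod, e.g.
#   kubectl exec mypod -- cat /proc/sys/kernel/core_pattern
# returns the identical host value, so a container cannot redirect its own
# core dumps to a path that exists in its own mount namespace.
```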
  22. Community approach 1: Modify kernel code
    ⚫Modify the core dump code inside the kernel to support Linux namespaces
    ⚫Patch
      • https://lkml.org/lkml/2017/8/2/77
    ⚫Not merged
  23. Community approach 2: Implement an add-on for Kubernetes
    ⚫Also not merged
      • https://github.com/kubernetes/kubernetes/issues/48787
  24. Wrap up
    ⚫oci-ftrace-syscall-analyzer is seamlessly integrated with Kubernetes using CRI-O
      • https://github.com/KentaTada/oci-ftrace-syscall-analyzer
    ⚫We should consider how to get process core dumps on Kubernetes
  25. Challenges for oci-ftrace-syscall-analyzer
    ⚫Integrate our tool with containerd
    ⚫Implement the user space logging facility originating from our internal container tools
    ⚫Use kprobes to hook system calls and investigate syscall args
    ⚫Implement a seccomp generator
    ⚫Get rid of unnecessary syscall logs recorded between prestart and runC’s actual exec
      • oci-seccomp-bpf-hook used prctl(2) as the starting point. Is that actually standard?