Debug application inside Kubernetes using Linux Kernel tools

Kenta Tada
October 24, 2019

Transcript

  1. R&D Center Base System Development Department
    Copyright 2019 Sony Corporation
    Debug application inside Kubernetes using Linux Kernel tools
    Kenta Tada
    R&D Center
    Sony Corporation


  2. About me
    ⚫System Software Engineer, Sony
    ⚫OSS Contributor
    • runC
    • Docker
    • containerd
    and so on


  3. Agenda
    ⚫Introduction to oci-ftrace-syscall-analyzer, our system call analyzer for Kubernetes
    ⚫How to get process coredumps on Kubernetes


  4. Overview of Kubernetes


  5. Kubernetes
    [Architecture diagram: kubectl → Master (etcd, kube-apiserver, kube-scheduler, kube-controller-manager) → Node: kubelet → High Level Runtime (containerd) → Low Level Runtime (runC) → Pod (Sample Application)]


  6. Kubernetes and kernel tools
    [Same architecture diagram as slide 5, with the user/kernel boundary marked on the Node: kubelet, containerd, runC and the Pod run in user space, while the kernel tools sit on the kernel side]


  7. Introduction of oci-ftrace-syscall-analyzer


  8. Background
    ⚫We developed a lightweight and secure runC-based container platform for embedded systems
    ⚫That platform needs to launch secure (restricted and rootless) containers for third parties
    ⚫We also developed an ftrace-based system call analyzer to generate secure configs
    ⚫Currently, we are porting those tools to our Kubernetes environments


  9. Existing debug methods on Kubernetes
    ⚫Install debug tools
    ⚫Create a debug image
    ⚫Prepare a debug sidecar


  10. Use kernel tools to trace applications transparently
    ⚫Existing methods are very useful, but they need additional packages and additional capabilities just for debugging
    ⚫On the other hand, sometimes we just want to investigate system calls, for example to determine:
    • Needed capabilities
    • Correct file permissions
    • seccomp settings for security
    ⚫Let's use kernel tools to trace applications transparently


  11. Kernel technologies for tracing
    http://mmi.hatenablog.com/entry/2018/03/04/052249


  12. Kernel technologies our syscall analyzer used
    http://mmi.hatenablog.com/entry/2018/03/04/052249
    [Same diagram as slide 11, with the technologies used by our syscall analyzer highlighted]


  13. Kernel technologies our syscall analyzer used
    ⚫ftrace
    • Tracing framework for the Linux kernel
    • ftrace can collect various kinds of information, although it is typically thought of as a function tracer
    • Easy to set up (just write settings to tracefs)
    –No eBPF compiler (no LLVM)
    ⚫Tracepoints
    • Static trace points inside the kernel


  14. What is needed to integrate
    1. Divide the ftrace ring buffer using ftrace instances, one for each container (see the sketch after this list)
    • https://speakerdeck.com/kentatada/container-debug-using-ftrace
    2. Set up ftrace inside container startup ← today's topic
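    A rough sketch of step 1 (in Go; this is an illustration, not the analyzer's actual code, and the instance name is a placeholder): creating a directory under tracing/instances gives each container its own ring buffer and its own copy of the trace settings.

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    // newFtraceInstance creates a per-container ftrace instance.
    // Making a directory under tracing/instances tells the kernel to
    // allocate a separate ring buffer and settings tree for it.
    func newFtraceInstance(containerID string) (string, error) {
        inst := filepath.Join("/sys/kernel/debug/tracing/instances", containerID)
        if err := os.Mkdir(inst, 0o755); err != nil && !os.IsExist(err) {
            return "", fmt.Errorf("create ftrace instance: %w", err)
        }
        return inst, nil
    }

    func main() {
        // "demo-container" is a placeholder for the real container ID.
        inst, err := newFtraceInstance("demo-container")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        fmt.Println("per-container trace settings live under:", inst)
    }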


  15. Set up ftrace inside container startup
    ⚫How to insert the ftrace setting tool before container
    startup
    ⚫How to get PID1’s process inside the container
    ⚫What ftrace settings are needed to trace the container's processes


  16. How to insert the ftrace setting tool before container startup
    ⚫Container lifecycle and the related hooks
    ⚫Our ftrace-based tracer should be executed at prestart because we want to trace from process start, as strace does
    [Container lifecycle diagram: prestart → process start → poststart → (process lifetime) → process stop → poststop; ftrace is set up at prestart and logs are collected at poststop]


  17. How to get PID1’s process inside the container
    ⚫According to the OCI runtime spec, the state of the container, which includes the container's initial PID, must be passed to hooks over stdin (see the sketch below)
    • https://github.com/opencontainers/runtime-spec/blob/master/config.md
    ⚫So we get the info about the PID 1 process inside the container from stdin
    ⚫This approach works with any low level runtime that complies with the OCI runtime spec
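    For illustration, a minimal Go sketch (not the analyzer's actual implementation) of a prestart hook that reads the OCI state JSON from stdin and pulls out the container's initial PID; the struct below only covers the fields used here.

    package main

    import (
        "encoding/json"
        "fmt"
        "os"
    )

    // ociState mirrors the subset of the OCI runtime "state" JSON that a
    // hook receives on stdin (see the runtime spec linked above).
    type ociState struct {
        ID     string `json:"id"`
        Pid    int    `json:"pid"`
        Bundle string `json:"bundle"`
    }

    func main() {
        var state ociState
        // An OCI-compliant runtime writes the container state to the
        // prestart hook's stdin before the container process starts.
        if err := json.NewDecoder(os.Stdin).Decode(&state); err != nil {
            fmt.Fprintln(os.Stderr, "decode OCI state:", err)
            os.Exit(1)
        }
        // state.Pid is the container's PID 1 as seen from the host.
        fmt.Printf("container %s: initial PID = %d\n", state.ID, state.Pid)
    }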


  18. What ftrace settings are needed to trace the container's processes
    ⚫Enable the system call events you want to trace
    (e.g. under /sys/kernel/debug/tracing/events/syscalls)
    ⚫Only trace the specified PID
    (e.g. # echo [PID] > /sys/kernel/debug/tracing/set_event_pid)
    ⚫Also trace processes forked by the PIDs in "set_event_pid"
    (e.g. # echo 1 > /sys/kernel/debug/tracing/options/event-fork)
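    The three settings above, combined into one hedged Go sketch (paths assume tracefs is mounted at /sys/kernel/debug/tracing; a real hook would write into the per-container instance from slide 14 rather than the global trace directory):

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
        "strconv"
    )

    const tracefs = "/sys/kernel/debug/tracing"

    // write is a small helper for "echo value > file" on tracefs.
    func write(rel, value string) error {
        return os.WriteFile(filepath.Join(tracefs, rel), []byte(value+"\n"), 0)
    }

    // setupSyscallTracing applies the three settings listed above.
    func setupSyscallTracing(pid int) error {
        // 1. Enable the syscall events to trace (here: all of them).
        if err := write("events/syscalls/enable", "1"); err != nil {
            return err
        }
        // 2. Only trace the specified PID.
        if err := write("set_event_pid", strconv.Itoa(pid)); err != nil {
            return err
        }
        // 3. Also follow processes forked from that PID.
        return write("options/event-fork", "1")
    }

    func main() {
        // 12345 is a placeholder for the PID taken from the OCI state.
        if err := setupSyscallTracing(12345); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
    }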


  19. Let's trace the ls command inside a container


  20. We could integrate runC with the ftrace-based syscall analyzer
    [Same architecture diagram as slide 6, highlighting the integration between the Low Level Runtime (runC) and the kernel tools]


  21. How to set up a prestart hook in Kubernetes
    ⚫Kubernetes Pod lifecycle and related hooks
    ⚫Kubernetes does not provide a prestart hook
    • https://github.com/kubernetes/kubernetes/issues/140
    ⚫Next, we investigate prestart hooks at the high level runtime layer
    [Pod lifecycle diagram: process start → poststart → (process lifetime) → prestop → process stop; there is no hook before process start]


  22. How to set up a prestart hook in containerd
    ⚫ In the first place, CRI does not currently provide a way to specify hooks in the container's config.json
    ⚫ Each high level runtime has its own implementation
    ⚫ Below is containerd's ongoing work
    • https://github.com/containerd/cri/pull/1248
    • https://github.com/containerd/cri/issues/405


  23. How to set up a prestart hook in CRI-O
    ⚫ CRI-O already provides its own solution, "oci-hooks"
    • podman has the same feature
    ⚫ oci-hooks provides a way for users to configure the intended hooks for Open Container Initiative containers so they will only be executed for containers that need their functionality, and then only for the stages where they're needed
    https://github.com/containers/libpod/blob/master/pkg/hooks/docs/oci-hooks.5.md


  24. https://github.com/containers/libpod/blob/master/pkg/hooks/docs/oci-hooks.5.md


  25. CRI-O oci-hooks prestart example
    {
      "version": "1.0.0",
      "hook": {
        "path": "/usr/local/bin/oci-ftrace-syscall-analyzer",
        "args": ["oci-ftrace-syscall-analyzer"]
      },
      "when": {
        "always": true
      },
      "stages": [
        "prestart"
      ]
    }
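    With CRI-O (and podman), a JSON file like this is typically placed in one of the hook directories described in the oci-hooks docs linked above, such as /etc/containers/oci/hooks.d/, so the runtime injects the prestart hook into each container's config.json.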


  26. Demo: Get syscall logs of a redis Pod


  27. Our integration is done!!
    [Same architecture diagram as slide 6, with the runC-to-kernel-tools integration highlighted]


  28. Related tools
    ⚫oci-seccomp-bpf-hook
    • https://github.com/containers/oci-seccomp-bpf-hook
    • eBPF-based seccomp generator
    ⚫kubectl-trace
    • https://github.com/iovisor/kubectl-trace
    (today's topic)


  29. oci-seccomp-bpf-hook
    ⚫oci-seccomp-bpf-hook generates seccomp profiles by tracing
    the syscalls made by the container using eBPF
    ⚫perf is used to log syscalls
    ⚫This tool has a few limitations
    • Needs CAP_SYS_ADMIN to run
    • Compiles C code on the fly using LLVM
    • Cannot use podman run --rm along with this ability


  30. Kernel technologies oci-seccomp-bpf-hook used
    http://mmi.hatenablog.com/entry/2018/03/04/052249
    [Same diagram as slide 11, with the technologies used by oci-seccomp-bpf-hook highlighted]


  31. When oci-ftrace-syscall-analyzer is used
    ⚫Your production system doesn't want to grant privileges to users
    ⚫Your production kernel is not configured for eBPF
    ⚫Your production system doesn't want to use LLVM
    • Will GCC support a BPF backend?
    –Compiling to BPF with GCC: https://lwn.net/Articles/800606/


  32. Process coredump on Kubernetes


  33. What is the problem?
    ⚫Process core dumps are written according to the pattern in /proc/sys/kernel/core_pattern
    ⚫But containers have their own Linux namespaces, and the kernel's core dump handling is not namespace-aware (core_pattern is a single node-wide setting)
    ⚫So Kubernetes users cannot get core dumps of their processes
    ⚫Kubernetes issue
    • https://github.com/kubernetes/kubernetes/issues/48787
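    To see the problem concretely, a minimal Go sketch (an illustration, not from the talk): core_pattern is a single node-wide value, so reading it from inside any Pod returns the host's setting, and it cannot be set per container.

    package main

    import (
        "fmt"
        "os"
        "strings"
    )

    func main() {
        // core_pattern is not namespaced: every container on the node
        // sees (and would modify) the same value as the host.
        data, err := os.ReadFile("/proc/sys/kernel/core_pattern")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        pattern := strings.TrimSpace(string(data))
        fmt.Println("node-wide core_pattern:", pattern)
        if strings.HasPrefix(pattern, "|") {
            // A leading '|' pipes core dumps to a helper program that
            // runs on the host side, outside the container's namespaces.
            fmt.Println("core dumps are piped to a host-side helper")
        }
    }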


  34. Community approach 1: Modify kernel code
    ⚫Modify the core dump code inside the kernel to support Linux namespaces
    ⚫Patch
    • https://lkml.org/lkml/2017/8/2/77
    ⚫Not merged


  35. Community approach 2: Implement an add-on for Kubernetes
    Not merged either
    https://github.com/kubernetes/kubernetes/issues/48787


  36. Wrap up
    ⚫oci-ftrace-syscall-analyzer is seamlessly integrated with
    Kubernetes using CRI-O
    • https://github.com/KentaTada/oci-ftrace-syscall-analyzer
    ⚫We should consider how to get process coredumps on Kubernetes


  37. Challenges for oci-ftrace-syscall-analyzer
    ⚫Integrate our tool with containerd
    ⚫Implement the user space logging facility originating from our internal container tools
    ⚫Use kprobes to hook system calls and investigate syscall arguments
    ⚫Implement a seccomp generator
    ⚫Get rid of unnecessary syscall logs recorded between prestart and runC's actual exec
    • oci-seccomp-bpf-hook uses prctl(2) as the starting point. Is that actually standard?


  38. We Are Hiring!!
    https://www.sony.co.jp/SonyInfo/Jobs/careers/


    SONY is a registered trademark or trademark of Sony Corporation.
    The product and service names of Sony products are registered trademarks or trademarks of Sony Corporation or its group companies. All other product and company names are trade names, registered trademarks, or trademarks of their respective companies.
