Whatever You Want to Trace, Just Use bpftrace (KaLUG session)

shunghsiyu

July 12, 2025

Transcript

  1. About me
    - Shung-Hsi Yu, @shunghsiyu (@fosstodon.org)
    - Based in Taitung, Taiwan
    - Kernel Engineer at SUSE
    - Maintains the (e)BPF stack in SLES and openSUSE
  2. $ systemctl restart your-own.service
    Job for your-own.service failed because the control process exited with error code.
    See "systemctl status your-own.service" and "journalctl -xe" for details.
  3. $ systemctl status your-own.service
    • your-own.service
         Loaded: loaded (/etc/systemd/system/your-own.service; disabled; vendo…
         Active: failed (Result: exit-code) since Thu 2025-07-12 00:00:01 UTC
        Process: 80054 ExecStart=/usr/sbin/your-own (code=exited, status=1/FAIL…
    Jul 12 00:00:01 system systemd[1]: Starting Your Own Service...
    Jul 12 00:00:01 system your-own[80054]: Configuration file not found!
  5. $ man your-own
    No manual entry for your-own
    Possibly, man page is not installed, try online at: https://manpages.opensuse.org/something
  10. $ bpftrace -e '
    tracepoint:syscalls:sys_enter_open
    /comm == "your-own"/
    {
      print(str(args->filename))
    }'
    Attaching 1 probes…
    /etc/your-own-special.conf
  11. Tracing
    - Is this function called? Which function is called?
    - What are the arguments?
    - What is the return value?
    - How long does something take?
  12. bpftrace Language
    - Event-driven
    - Awk-like language, inspired by DTrace
    - C-like data structure definition & usage
  13. Probe
    - An event (usually with wildcard (*) support)
    - A specific function being called or returning
    - A "tracepoint" (declared by the developer)
    - bpftrace starting, a timer firing, etc.
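    A minimal sketch of how the wildcard support mentioned above might be used: a single probe definition attaches to every syscall tracepoint whose name starts with sys_enter_open and counts how often each one fires. The map name @calls is made up for illustration, and which tracepoints actually match depends on the running kernel.

    ```bpftrace
    // Attach to every matching tracepoint, e.g. sys_enter_open and sys_enter_openat.
    tracepoint:syscalls:sys_enter_open*
    {
        // The built-in `probe` holds the full name of the probe that fired.
        @calls[probe] = count();
    }
    ```

    Saved to a file and run as root with bpftrace, this should print the per-probe counts when interrupted with Ctrl-C.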
  15. System Call (Syscall)
    - Interface for userspace programs (e.g. Python)
    - Requests the kernel to do something
    - e.g. open a file
  16. Built-ins
    - Special variables
    - args: arguments associated with the probe
    - comm: program name of the current process
    - pid: ID of the current process
  17. Functions
    - Helpers provided by bpftrace
    - print(): simple printing of a given value
    - str(): convert a string pointer to an actual string
  19. Built-ins
    - Special variables
    - $1, $2, …: nth positional parameter passed to the bpftrace program
  20. Functions
    - Helpers provided by bpftrace
    - print(): simple printing of a given value
    - printf(): advanced printing with formatting
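    The built-ins and helpers introduced so far can be combined into a small script. This is a sketch under a few assumptions, not the script used in the talk: the file name open-by-name.bt is hypothetical, the args.filename spelling assumes a bpftrace release recent enough to accept it (older releases use args->filename), and programs that open files through openat would need the sys_enter_openat tracepoint instead.

    ```bpftrace
    #!/usr/bin/env bpftrace
    // Usage (hypothetical file name): ./open-by-name.bt 'your-own'

    tracepoint:syscalls:sys_enter_open
    /comm == str($1)/       // $1 is the first positional parameter
    {
        // str() reads the NUL-terminated string behind the pointer.
        printf("%s (pid %d) opens %s\n", comm, pid, str(args.filename));
    }
    ```

    Taking the program name from $1 rather than hard-coding it lets the same script be reused for any process.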
  21. $ ./opensnoop3.bt 'your-own'
    Attaching 6 probes…
    /lib64/libc.so.6: /etc/your-own-special.conf: 3
    -2
    /etc/your-own-special.conf: -2
    /etc/your-own-special.conf: /lib64/libc.so.6: -2
  23. Overlapping open()
    - More than one thread may be calling open()
    - The durations of their execution may overlap
    - Diagram: Thread 1 and Thread 2, each running sys_enter_open followed by sys_exit_open
  24. Built-ins
    - Special variables
    - tid: Thread ID of the current thread
    - Uniquely identifies an execution
  25. Map Variable
    - A BPF memory object, e.g. @files
    - A storage area
    - (Usually) a key-value map
    - i.e. similar to a global variable
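    One way to untangle overlapping open() calls is to key a map by tid, so that each thread's exit event is paired with that same thread's enter event. The sketch below assumes the @files map name from the slide, the sys_enter_open/sys_exit_open tracepoint pair, and the single-argument delete(@files[tid]) form (newer bpftrace releases also accept delete(@files, tid)). It is not the opensnoop3.bt script from the talk.

    ```bpftrace
    tracepoint:syscalls:sys_enter_open
    /comm == str($1)/
    {
        // Remember the filename pointer for this thread only.
        @files[tid] = args.filename;
    }

    tracepoint:syscalls:sys_exit_open
    /@files[tid]/           // only threads that were seen entering
    {
        // Filename and return value are now printed together on one line.
        printf("%s: %d\n", str(@files[tid]), args.ret);
        delete(@files[tid]);
    }
    ```

    Deleting the entry on exit keeps the map from growing and stops a stale filename from being matched to a later call.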
  26. Operators and Expressions
    - Supports arithmetic operators: + - * /
    - Logical (&&), bitwise (^ |), and relational (<= !=) operators work, too
  27. Latency
    - How long does it take?
    - Get a timestamp at the start
    - Get a timestamp at the end
    - Calculate the difference between the timestamps
  28. Functions
    - Helpers provided by bpftrace
    - nsecs(): returns a timestamp in nanoseconds
    - The parentheses can be dropped when there is no argument
  29. Scratch Variable
    - Lexically scoped variable, e.g. $duration
    - Stores simple values: numbers (int, long, …), strings
    - i.e. similar to a local variable
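    The latency recipe (timestamp at start, timestamp at end, subtract) might look like the following sketch. The map name @start and the choice of the open syscall tracepoints are assumptions for illustration. $duration is the scratch variable from the slide and only exists inside the probe body that assigns it.

    ```bpftrace
    tracepoint:syscalls:sys_enter_open
    /comm == str($1)/
    {
        // Timestamp at the start, stored per thread in a map.
        @start[tid] = nsecs;
    }

    tracepoint:syscalls:sys_exit_open
    /@start[tid]/
    {
        // Timestamp at the end minus timestamp at the start.
        $duration = nsecs - @start[tid];
        printf("open() took %d ns\n", $duration);
        delete(@start[tid]);
    }
    ```

    The start time has to live in a map because scratch variables do not survive from one probe to the next.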
  30. Dealing with Lots of Data
    - What if open() is called 1k times per second?
    - Statistics to the rescue:
    - minimum, maximum, average, sum
    - quantile, histogram, time-series
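    When there are too many events to print one by one, bpftrace's aggregation functions can summarize them in the kernel. The sketch below feeds each measured latency into hist(), which builds a power-of-two histogram. The map names and the open syscall tracepoints are assumptions, and this is a guess at the general shape of such a script, not the sys_latency_hist.bt used in the talk.

    ```bpftrace
    tracepoint:syscalls:sys_enter_open
    /comm == str($1)/
    {
        @start[tid] = nsecs;
    }

    tracepoint:syscalls:sys_exit_open
    /@start[tid]/
    {
        // Aggregate instead of printing every single event.
        @latency = hist(nsecs - @start[tid]);
        delete(@start[tid]);
    }

    END
    {
        // Drop leftover bookkeeping so only @latency is printed on exit.
        clear(@start);
    }
    ```

    Replacing hist() with min(), max(), avg() or sum() should give the other statistics listed above.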
  31. $ ./sys_latency_hist.bt 'your-own'
    @latency:
    [256, 512)            64 |                                                    |
    [512, 1K)         216818 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [1K, 2K)          160007 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@              |
    [2K, 4K)           30255 |@@@@@@@                                             |
    [4K, 8K)            7114 |@                                                   |
    [8K, 16K)            318 |                                                    |
    [16K, 32K)           150 |                                                    |
  32. bpftrace Strengths
    1. Safe
    2. Dynamic Tracing
    3. Low Overhead
    4. Easy
    eBPF: Make kernel programming possible for everyone
  33. bpftrace Strengths
    1. Safe
    2. Dynamic Tracing
    3. Low Overhead
    4. Easy
    eBPF (image from https://ebpf.io/what-is-ebpf/)
  34. Adding printf()
    - Needs to recompile and restart the process
    - More than once (?)
    - No Heisenbug
  35. Similar Tools
    - strace/ltrace: high overhead, as high as 100x
    - ftrace: less dynamic, restricted process
    - LTTng: requires an out-of-tree kernel module
    - bcc: less up-to-date in terms of features
  36. Resources
    Videos
    - An introduction to bpftrace tracing language
    - bpftrace: a path to the ultimate Linux tracing…
    Texts
    - A thorough introduction to bpftrace
    - bpftrace(8) Manual Page
  37. /* In net/ipv4/tcp_output.c, build a SYN and send it off. */
    int tcp_connect(struct sock *sk)
    {
        struct tcp_sock *tp = tcp_sk(sk);
        struct sk_buff *buff;
        …
  39. struct sock { /* In include/net/sock.h */
      struct sock_common {
        struct {
          __be32 skc_daddr; /* Destination address */
          __be32 skc_rcv_saddr;
        };
        struct {
          __be16 skc_dport; /* Destination port, in network byte order */
  42. fentry:tcp_connect
    /args.sk->__sk_common.skc_family == 2/  // 2 is AF_INET (IPv4)
    {
      $daddr = ntop(args.sk->__sk_common.skc_daddr);
      $dport = bswap(args.sk->__sk_common.skc_dport);
      printf("%s:%d\n", $daddr, $dport);
    }
  48. Functions
    - Helpers provided by bpftrace
    - ntop(): convert IP address data to text
    - bswap(): reverse byte order
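    To see what these two helpers do in isolation, they can be tried on constants. This sketch assumes a little-endian machine such as x86_64. The literal values are chosen for illustration and do not come from the talk.

    ```bpftrace
    BEGIN
    {
        // On little-endian, 0x0100007f is stored as the bytes 7f 00 00 01,
        // which is 127.0.0.1 in network byte order.
        printf("%s\n", ntop(0x0100007f));

        // Port 80 in network byte order reads as 0x5000 on little-endian.
        // bswap() on a 16-bit value swaps the two bytes, giving 0x0050.
        printf("%d\n", bswap((uint16)0x5000));

        exit();
    }
    ```

    On such a machine this should print 127.0.0.1 followed by 80, which is the same conversion the tcp_connect script applies to skc_daddr and skc_dport.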
  52. Choosing a Probe
    - kprobe/kretprobe for kernel functions
    - uprobe/uretprobe for userspace functions
    - Tracepoint and USDT for predefined points
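    As a rough illustration of the first option, a kprobe/kretprobe pair on the tcp_connect() kernel function from the earlier slides could look like the sketch below. The map names are made up for illustration, and it assumes tcp_connect is available as a kprobe target on the running kernel.

    ```bpftrace
    // Fires when the kernel function tcp_connect() is entered.
    kprobe:tcp_connect
    {
        @calls = count();
    }

    // Fires when tcp_connect() returns; `retval` holds its return value.
    kretprobe:tcp_connect
    {
        // 0 means success, negative values are error codes.
        @retvals[retval] = count();
    }
    ```

    uprobe and uretprobe follow the same pattern but also name the binary, in the form uprobe:/path/to/binary:function.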