Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Debugging – Learning Linux @ Facebook

Debugging – Learning Linux @ Facebook

This talk is part of the internal Learning Linux at Facebook learning sessions focused on debugging.

While this was an internal talk, and the recordings and speaker notes won't be made public, as they contained internal information, my employer was kind enough to allow me to openly publish them.

Javier Honduvilla Coto

April 01, 2022
Tweet

More Decks by Javier Honduvilla Coto

Other Decks in Technology

Transcript

  1. - Traditional tools - The Kernel as an API -

    Userspace debugging - Kernel debugging - BPF - Performance troubleshooting - Additional resources Agenda
  2. [h]top [javierhonduco@󰞵 ~] top Tasks: 540 total, 2 running, 530

    sleeping, 1 stopped, 7 zombie %Cpu(s): 2.6 us, 3.9 sy, 0.0 ni, 93.4 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st MiB Mem : 57283.7 total, 4258.3 free, 22155.9 used, 30869.5 buff/cache MiB Swap: 25600.0 total, 16267.6 free, 9332.4 used. 34453.5 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 708530 root 20 0 456212 0 0 R 100.0 0.0 1540:20 oid.real 8419 root 20 0 1153836 46536 8620 S 7.6 0.1 447:36.76 dynolog 3842 smcproxy 20 0 3097676 402876 0 S 3.6 0.7 370:25.26 smc_proxy 3301357 scribe 20 0 2193228 123288 46192 S 2.6 0.2 0:37.18 scribed 43976 confige+ 20 0 2152296 152300 5876 S 2.3 0.3 118:53.75 configerator_pr 3246204 root 20 0 2314612 129148 20424 S 2.0 0.2 167:15.07 cppfbagentd 3263093 apache 20 0 10.5g 2.7g 355796 S 1.7 4.8 1:50.58 hphpi 942140 javierh+ 20 0 1546636 21188 3788 S 0.7 0.0 24:59.85 srclient 1793689 root 20 0 418848 5456 2364 S 0.7 0.0 16:31.23 rsyslogd 3444668 javierh+ 20 0 278308 5360 4132 R 0.7 0.0 0:00.05 top 3716658 root 20 0 1073152 13584 0 S 0.7 0.0 175:14.63 below 10 root 20 0 0 0 0 I 0.3 0.0 21:19.66 rcu_sched 4051 root 20 0 1131728 9444 0 S 0.3 0.0 7:30.31 fb_netnorad_res
  3. iostat [javierhonduco@󰞵 ~] iostat 1 Linux 5.2.9-229_fbk15_hardened_4185_g357f49b36602 (devvm3554.lla0.facebook.com) 14/06/21 _x86_64_

    (24 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 3.96 0.33 2.93 0.32 0.00 92.46 Device tps kB_read/s kB_wrtn/s kB_read kB_wrtn vda 214.23 8945.63 9785.19 4749732297 5195506674
  4. vmstat [javierhonduco@󰞵 ~] vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system--

    ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 4 0 9554280 924204 3152 34698948 1 2 375 410 1 2 4 3 92 0 0
  5. lsof [javierhonduco@󰞵 ~] sudo lsof | head COMMAND PID TID

    TASKCMD USER FD TYPE DEVICE SIZE/OFF NODE NAME systemd 1 root cwd DIR 0,28 414 256 / systemd 1 root rtd DIR 0,28 414 256 / systemd 1 root txt REG 0,28 1877208 76395834 /usr/lib/systemd/systemd systemd 1 root mem REG 0,27 76395834 /usr/lib/systemd/systemd (path dev=0,28) systemd 1 root mem REG 0,27 257637 /usr/lib64/libnl-3.so.200.26.0 (path dev=0,28) systemd 1 root mem REG 0,27 257645 /usr/lib64/libnl-route-3.so.200.26.0 (path dev=0,28) systemd 1 root mem REG 0,27 6307144 /usr/lib64/libibverbs.so.1.11.32.0 (path dev=0,28)
  6. Procfs [javierhonduco@󰞵 ~] ls /proc/$$/ attr cwd limits mountstats personality

    smaps_rollup uid_map auxv environ loginuid net projid_map stack wchan cgroup exe make-it-fail ns root stat clear_refs fail-nth map_files numa_maps sched statm cmdline fd maps oom_adj schedstat status comm fdinfo mem oom_score sessionid syscall coredump_filter gid_map mountinfo oom_score_adj setgroups task cpuset io mounts pagemap smaps timerslack_ns
  7. Procfs [javierhonduco@󰞵 ~] ls -lah /proc/$$/exe lrwxrwxrwx 1 javierhonduco users

    0 May 26 03:35 /proc/3355607/exe -> /usr/bin/bash [javierhonduco@󰞵 ~] ls -l /proc/self/exe lrwxrwxrwx 1 javierhonduco users 0 May 26 03:52 /proc/self/exe -> /usr/bin/ls
  8. Procfs [javierhonduco@󰞵 ~] cat /proc/self/cmdline -u | xxd 00000000: 6361

    7400 2f70 726f 632f 7365 6c66 2f63 cat./proc/self/c 00000010: 6d64 6c69 6e65 002d 7500 mdline.-u.
  9. Procfs [javierhonduco@󰞵 ~] ls /proc/self/fd 0 1 2 3 [javierhonduco@󰞵

    ~] ls -l /proc/self/fd/0 lrwx------ 1 javierhonduco users 64 May 26 03:51 /proc/self/fd/0 -> /dev/pts/16
  10. close(10) Kernel Performs actions Through libc* Schedules process Next instruction

    Kernel space User space System calls (Simplified diagram 😅)
  11. Userspace – System calls [javierhonduco@󰞵 ~/learning_linux] strace ./app execve("./app", ["./app"],

    0x7ffd7e8a8600 /* 53 vars */) = 0 brk(NULL) = 0x602000 [...] openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\2405\2\0\0\0\0\0"..., [...] openat(AT_FDCWD, "/tmp/not_present_file", O_RDONLY) = -1 ENOENT (No such file or directory) [...] openat(AT_FDCWD, "/tmp/create_file", O_RDONLY|O_CREAT, 014140) = 3 write(1, "Got fd=3 around line=11\n", 24Got fd=3 around line=11 ) = 24 exit_group(24) = ? +++ exited with 24 +++
  12. Userspace – System calls [javierhonduco@󰞵 ~/learning_linux] strace -e trace=openat ./app

    | tail -2 openat(AT_FDCWD, "/tmp/not_present_file", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/tmp/create_file", O_RDONLY|O_CREAT|O_TRUNC, 064140) = 3
  13. Userspace – System calls [javierhonduco@󰞵 ~/learning_linux] strace -c ./app %

    time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 92.14 0.000129 32 4 1 openat 7.86 0.000011 5 2 write 0.00 0.000000 0 2 read 0.00 0.000000 0 2 close 0.00 0.000000 0 3 fstat 0.00 0.000000 0 1 lseek 0.00 0.000000 0 5 mmap 0.00 0.000000 0 4 mprotect 0.00 0.000000 0 1 munmap 0.00 0.000000 0 4 brk 0.00 0.000000 0 1 1 access 0.00 0.000000 0 1 execve 0.00 0.000000 0 2 1 arch_prctl ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000140 4 32 3 total
  14. Userspace – System calls [javierhonduco@󰞵 ~/learning_linux] strace -e trace=/open.* -k

    ./app [...] openat(AT_FDCWD, "/tmp/not_present_file", O_RDONLY) = -1 ENOENT (No such file or directory) > /usr/lib64/libc-2.28.so(open+0x52) [0xed2e2] > /home/javierhonduco/learning_linux/app(main+0x1b) [0x4005f1] > /usr/lib64/libc-2.28.so(__libc_start_main+0xf2) [0x23492] > /home/javierhonduco/learning_linux/app(_start+0x2d) [0x40051d]
  15. Perf trace [javierhonduco@󰞵 ~] perf trace ./app |& grep open

    0.096 ( 0.011 ms): app/3620882 openat(dfd: CWD, filename: 0xb1ffce40, flags: RDONLY|CLOEXEC) = 3 0.223 ( 0.007 ms): app/3620882 openat(dfd: CWD, filename: 0xb1df28e3, flags: RDONLY|CLOEXEC) = 3 0.253 ( 0.007 ms): app/3620882 openat(dfd: CWD, filename: 0xb1ff84e0, flags: RDONLY|CLOEXEC) = 3 0.349 ( 0.007 ms): app/3620882 openat(dfd: CWD, filename: 0xb1ff89d0, flags: RDONLY|CLOEXEC) = 3 0.435 ( 0.006 ms): app/3620882 openat(dfd: CWD, filename: 0xb1ff8ec0, flags: RDONLY|CLOEXEC) = 3 0.775 ( 0.006 ms): app/3620882 openat(dfd: CWD, filename: 0x4006e8) = -1 ENOENT (No such file or directory) 0.869 ( 0.063 ms): app/3620882 openat(dfd: CWD, filename: 0x400718, flags: RDONLY|CREAT|TRUNC, mode: IFLNK|ISUID|IXUSR|IRGRP|0xb1dc0000) = 3
  16. GDB [javierhonduco@󰞵 ~] gdb --args ./app [...] (gdb) b main

    Breakpoint 1 at 0x4005de: file app.c, line 7. (gdb) run Starting program: /home/javierhonduco/learning_linux/app [...] (gdb) bt #0 main () at app.c:7 (gdb) info locals fail_fd = 0 fd = 0
  17. GDB – dump all stacks [javierhonduco@󰞵 ~] gdb --args ./app

    [...] (gdb) thread apply all bt Thread 1 (process 267065): #0 main () at app.c:7
  18. GDB – printing [javierhonduco@󰞵 ~] gdb --args ./app 1 2

    3 4 5 6 [...] (gdb) p $rdi $1 = 6 (gdb) p *((char **)$rdx) $2 = 0x7fffffffdd95 "SHELL=/bin/bash" (gdb) p *(char **)$rdx@6 $4 = { 0x7fffffffdd95 "SHELL=/bin/bash", 0x7fffffffdda5 "LESS=-n -i", 0x7fffffffddb0 "HISTCONTROL=ignoredups:erasedups", 0x7fffffffddd1 "PHPSH_CEILING_DIRECTORIES=/home", 0x7fffffffddf1 "TMUX=/tmp/tmux-134475/default,708016,0", 0x7fffffffde18 "HISTSIZE=" }
  19. dmesg [javierhonduco@󰞵 ~] dmesg --human --color=always | tail [ +0.000003]

    BTRFS error (device loop0): bad tree block start, want 134725632 have 0 [ +0.004627] BTRFS error (device loop0): bad tree block start, want 134725632 have 0 [ +0.002381] BTRFS error (device loop0): bad tree block start, want 134725632 have 0 [ +0.002302] BTRFS error (device loop0): bad tree block start, want 134725632 have 0 [ +0.002299] BTRFS error (device loop0): bad tree block start, want 134725632 have 0 [ +5.034803] systemd[1]: Created slice system-timers-service_cert_strobelight_server.slice.
  20. drgn [javierhonduco@󰞵 ~] sudo drgn [...] >>> prog['jiffies'] (volatile unsigned

    long)4892481323 >>> prog['jiffies'] (volatile unsigned long)4892483099
  21. drgn [javierhonduco@󰞵 ~] sudo drgn [...] >>> find_task(prog, 1) *(struct

    task_struct *)0xffff888e3ffe0000 = { .thread_info = (struct thread_info){ .flags = (unsigned long)0, .status = (u32)0, }, .state = (volatile long)1, .stack = (void *)0xffffc90006244000, .usage = (refcount_t){ .refs = (atomic_t){ .counter = (int)2, }, }, .flags = (unsigned int)4194560, .ptrace = (unsigned int)0,
  22. BPF

  23. Event - Function* called - Function* returned - ... BPF

    code Triggers BPF Userspace Loads Userspace’ Sends data
  24. Bpf/trace – probe types • Userspace ◦ Uprobe ◦ Uretprobe

    ◦ USDTs • Kernel ◦ Kprobe ◦ Kretprobe ◦ Tracepoint ◦ Timers (interval) ◦ ...
  25. Bpftrace – open tracing [javierhonduco@󰞵 ~] sudo bpftrace -e 'tracepoint:syscalls:sys_enter_open

    { printf("%s %s\n", comm, str(args->filename)); }' Attaching 1 probe... fbagentc /sys/kernel/mm/transparent_hugepage/enabled smcc /proc/sys/vm/overcommit_memory smcc /sys/kernel/mm/transparent_hugepage/enabled smcc /proc/sys/vm/overcommit_memory smcc /sys/kernel/mm/transparent_hugepage/enabled smcc /proc/sys/vm/overcommit_memory smcc /sys/kernel/mm/transparent_hugepage/enabled ^C
  26. • strace -fc/-t • bpftrace • Tools that use procfs

    and other VFS • perf Performance debugging
  27. Perf – CPU profiling (record) [javierhonduco@󰞵 ~] perf record -g

    /bin/timeout 1 yes > /dev/null Using perf wrapper that supports hot-text. Try perf.real if you encounter any issues. [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.103 MB perf.data (1470 samples) ]
  28. Perf – stat [javierhonduco@󰞵 ~] sudo perf stat /bin/timeout 1

    yes > /dev/null Using perf wrapper that supports hot-text. Try perf.real if you encounter any issues. Performance counter stats for '/bin/timeout 1 yes': 1,001.64 msec task-clock # 0.999 CPUs utilized 9 context-switches # 0.009 K/sec 0 cpu-migrations # 0.000 K/sec 184 page-faults # 0.184 K/sec <not supported> cycles <not supported> instructions <not supported> branches <not supported> branch-misses 1.002260238 seconds time elapsed 0.369696000 seconds user 0.632553000 seconds sys
  29. Top-down analysis - Start with the “classic” tools when there’s

    an issue - Go deeper with the more advanced tools
  30. - Linux APIs with loads of examples: The Linux Programming

    Interface - Drgn: https://drgn.readthedocs.io/ - Bpftrace reference guide: https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md - Linux debugging zines: https://wizardzines.com/zines/debugging/ - Performance: all of http://www.brendangregg.com/ - Some man pages are very good, such as proc(5)’s - BCC tools / bpftrace tools Additional resources