Ruby x BPF in Action / RubyKaigi 2022

KONDO Uchio
September 09, 2022

Transcript

  1. Uchio Kondo • Infra & Streaming team @ Mirrativ, Inc • Speaker

    @ RubyKaigi 2016, 2018, 2019, 2021 • “Hacker” Supporter @ Engineer’s Café Fukuoka • Ruby & Rust enthusiast, Linux freak • Lives in Fukuoka
  2. e.g. Visualize “SYN queue” From Cloudflare’s blog “SYN packet handling

    in the wild”: https://blog.cloudflare.com/syn-packet-handling-in-the-wild/
  3. 2 servers in different config • Same server (WEBrick 1.7.0,

    ruby 3.1.2) • Same bench parameters • Different values of net.core.somaxconn ◦ 4,096 vs 500: what effect does this have?
  4. #2 How it works ref. https://speakerdeck.com/chikuwait/learn-ebpf?slide=17 by Yuki Nakata, 2020

    emoji from https://github.com/twitter/twemoji/tree/master/assets • [Diagram (*) very simplified. Labels: Scripting, Bytecode, BPF VM, BPF Map (… or perf buffer, etc.), User Interface, Collecting Kernel Data… Regions: The Userland, The Kingdom of Kernel]
  5. #2 How it works BTF requires: Kernel version >= 5.6

    && CONFIG_DEBUG_INFO_BTF should be enabled
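
The two requirements above can be checked from Ruby before loading a BPF program. This is a minimal sketch, not part of RbBCC: the helper names are hypothetical, and it assumes that a kernel built with CONFIG_DEBUG_INFO_BTF exposes its type information at /sys/kernel/btf/vmlinux.

```ruby
# Hypothetical helpers; only the 5.6 threshold and the config option
# name come from the slide.
BTF_MIN_VERSION = [5, 6].freeze
BTF_VMLINUX_PATH = "/sys/kernel/btf/vmlinux" # assumed to exist when CONFIG_DEBUG_INFO_BTF=y

# True when a `uname -r` style string reports kernel 5.6 or newer.
def kernel_new_enough?(release)
  major_minor = release.scan(/\d+/).first(2).map(&:to_i)
  (major_minor <=> BTF_MIN_VERSION) >= 0
end

# True when the running kernel exports its BTF type information.
def btf_exported?(path = BTF_VMLINUX_PATH)
  File.exist?(path)
end

puts kernel_new_enough?("5.8.0-63-generic")  # kernel used for this deck
puts kernel_new_enough?("5.4.0-100-generic") # too old for the stated requirement
puts btf_exported?
```

On a real host the release string would come from `uname -r`; it is passed in as an argument here so the comparison can be tried on any machine.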
  6. #4 (Expanding) Use cases… • BPF-based network & security for

    containers • the de facto Kubernetes threat detection engine using BPF
  7. What is RbBCC? • A: BCC for Ruby (libbcc FFI

    binding for Ruby) • WHAT is BCC? ◦ BPF Compiler Collection: ◦ An SDK to make BPF tools, using scripting languages (Python/Lua supported officially) ◦ But Ruby is not on its support list, so I’m developing RbBCC • I’m going to show how to use it: how to write BPF code in Ruby.
  8. #1 kprobe • kprobe - (mainly) function trace in Linux

    Kernel • e.g. __ARCH_sys_execve() ◦ The actual function called when execve(2) is invoked
  9. Example RbBCC Program: Ruby Part: Load the C part above

    and Get feedback and data from BPF program inside kernel
  10. Trace the connect • Also has 2 parts: ◦ BPF

    DSL in C ◦ Load program & handle data in Ruby
  11. #2 tracepoint (for kernel) • Different stuff from Ruby’s TracePoint

    class • A static entry point to trace kernel events • It won’t change in future versions of Linux ◦ kprobe traces an exported kernel symbol, which may change between versions and so can be unstable.
  12. • Example: tracing WEBrick (again): ◦ ruby: ◦ ab: •

    Tracing command: ◦ ruby: ◦ strace: FYI: Performance side effect
  13. #3 uprobe • To use uprobe (and USDT afterwards) with ease,

    build a special Ruby binary with a specific option:
  14. #3 uprobe • Collecting rb_str_new()’s: (function return timestamp - function

    entry timestamp) • This represents the latency of a function call • function entry = uprobe, function return = uretprobe
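
The entry/return pairing can be illustrated in plain Ruby. This sketch is a userland simulation only, with no BPF involved: it assumes each event arrives as a hypothetical `[kind, thread_id, timestamp_ns]` triple, and it keeps the pending entry timestamp per thread, much as a BPF program would keep it in a map between the uprobe and the uretprobe.

```ruby
# Userland simulation of uprobe/uretprobe pairing; event layout and
# method name are hypothetical.
def call_latencies(events)
  entry_ts = {}  # thread id => timestamp of the pending function entry
  latencies = []
  events.each do |kind, tid, ts|
    case kind
    when :entry
      entry_ts[tid] = ts
    when :return
      started = entry_ts.delete(tid)
      latencies << ts - started if started # skip returns with no recorded entry
    end
  end
  latencies
end

events = [
  [:entry, 1, 1_000], [:entry, 2, 1_200],
  [:return, 1, 1_450], [:return, 2, 2_000],
  [:return, 3, 2_100] # entry was never seen, so this one is ignored
]
p call_latencies(events) # => [450, 800]
```

Keying by thread id keeps concurrent calls from different threads apart; recursive calls on one thread would need a stack per thread instead of a single slot.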
  15. #3 uprobe • Example of rb_str_new()’s latency histogram: ◦ ruby -e

    'p "Hello"' ◦ ruby --disable gems -e 'p "Hello"'
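
A latency histogram of this kind is typically bucketed by powers of two. The sketch below shows that bucketing in plain Ruby under stated assumptions: the sample latencies are made up, the method names are hypothetical, and the output layout only imitates the usual text histogram rather than reproducing the one on the slide.

```ruby
# Bucket k covers values in [2**k, 2**(k + 1) - 1]; values below 1 go to bucket 0.
def log2_bucket(value)
  value < 1 ? 0 : value.bit_length - 1
end

# Count how many values fall into each power-of-two bucket.
def log2_histogram(values)
  values.each_with_object(Hash.new(0)) { |v, hist| hist[log2_bucket(v)] += 1 }
end

# Print one row per bucket with a bar scaled to the largest count.
def print_histogram(hist, width: 20)
  max = hist.values.max
  hist.keys.sort.each do |k|
    bar = "*" * (hist[k] * width / max)
    puts format("%8d -> %-8d : %-5d |%s", 2**k, 2**(k + 1) - 1, hist[k], bar)
  end
end

latencies_ns = [450, 800, 510, 1_030, 2_900, 700] # made-up sample data
print_histogram(log2_histogram(latencies_ns))
```

Power-of-two buckets keep the table short even when latencies span several orders of magnitude, which is why two runs such as the ones above can be compared at a glance.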
  16. #4 USDT • USDT: Userspace Statically Defined Tracepoint ◦ Probe

    points that a program’s author embedded in advance ◦ cf. uprobe traces real function calls dynamically • USDT is to uprobe what tracepoint is to kprobe

    |              | Dynamic | Static     |
    |--------------|---------|------------|
    | Kernel space | kprobe  | tracepoint |
    | User space   | uprobe  | USDT       |
  17. #4 USDT • Ruby’s USDT (first for DTrace, but available

    via BPF in Linux) Japanese article: https://magazine.rubyist.net/articles/0041/0041-200Special-dtrace.html https://rubyreferences.github.io/rubyref/advanced/dtrace.html
  18. #4 USDT • Example: USDTs about GC: ◦ usdt:./bin/ruby:ruby:gc__mark__begin ◦

    usdt:./bin/ruby:ruby:gc__mark__end ◦ usdt:./bin/ruby:ruby:gc__sweep__begin ◦ usdt:./bin/ruby:ruby:gc__sweep__end • They can be used to trace GC latency: ◦ (mark_end_time - mark_begin_time)
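
The subtraction above generalizes to both GC phases. In this sketch the probe names are the ones listed on the slide, while the timestamps, the `[probe_spec, timestamp_ns]` event layout and the method name are assumptions made for illustration; real events would be delivered by the BPF program.

```ruby
# Pair each gc__<phase>__begin with the following gc__<phase>__end and
# collect the elapsed time per phase. Event layout is hypothetical.
def gc_phase_durations(events)
  started = {}
  durations = Hash.new { |h, phase| h[phase] = [] }
  events.each do |spec, ts|
    probe = spec.split(":").last          # e.g. "gc__mark__begin"
    _gc, phase, edge = probe.split("__")  # => ["gc", "mark", "begin"]
    if edge == "begin"
      started[phase] = ts
    elsif edge == "end" && started.key?(phase)
      durations[phase] << ts - started.delete(phase)
    end
  end
  durations
end

events = [ # made-up timestamps in nanoseconds
  ["usdt:./bin/ruby:ruby:gc__mark__begin", 10_000],
  ["usdt:./bin/ruby:ruby:gc__mark__end", 13_500],
  ["usdt:./bin/ruby:ruby:gc__sweep__begin", 13_600],
  ["usdt:./bin/ruby:ruby:gc__sweep__end", 14_100]
]
gc_phase_durations(events).each do |phase, ds|
  puts "#{phase}: #{ds.sum} ns over #{ds.size} run(s)"
end
```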
  19. #4 USDT • Example: Real-time tracing of RSS, GC mark

    and sweep statistics ◦ Plumping up the Sinatra app process and visualizing it
  20. Summary: • BPF observability has 4 key tracing sources:

    • RbBCC can access all four. Just use Ruby (and a little C). • Use Ruby to trace Ruby.

    |              | Dynamic | Static     |
    |--------------|---------|------------|
    | Kernel space | kprobe  | tracepoint |
    | User space   | uprobe  | USDT       |
  21. Real World Tuning • Well-Done Speedup Contest in RubyKaigi •

    Theme: JSON parser Ruston ◦ mainly implemented in … Rust (it’s a native gem) ◦ Somewhat slow compared to the de facto json.rb
  22. run perf • perf is useful to grasp the overall

    bottleneck • json’s flamegraph
  23. Let’s start tracing with BPF • Functions to trace this time: malloc/free

    (*) N is limited to 10,000 in solo measurement
  24. Point 1: Reduce iter()/String • Reduce iterator methods on Lex#peek()

    ◦ peek() is called many times on lexing process…
  25. Point 1: Reduce iter()/String • This leads to reduced

    usage of String ◦ Use &[u8] instead
  26. Point 1: Reduce iter()/String • Then measure!

    |               | malloc | calloc | free   |
    |---------------|--------|--------|--------|
    | Ruston Before | 750197 | 22     | 753491 |
    | Ruston After  | 110197 | 22     | 113596 |
    | cf. C JSON    | 20206  | 10022  | 34142  |

    (*) N = 10,000
  27. Point 2: Reduce realloc • Try to reduce realloc by

    allocating in advance ◦ Specify capacity via Vec::with_capacity()
  28. Point 2: Reduce realloc • Measure! … The effect seems

    limited. - To be continued -

    |                         | realloc | elapsed (s), longer case |
    |-------------------------|---------|--------------------------|
    | Ruston w/o vec capacity | 90002   | 0.080172                 |
    | Ruston w/ vec capacity  | 40002   | 0.071188                 |
    | cf. C JSON              | 2       | 0.052459                 |
  29. The result #2 • Comparison before / after all; for

    case N = 50,000

    |                  | user     | system   | total    |
    |------------------|----------|----------|----------|
    | Ruston Before    | 0.277292 | 0.000000 | 0.277292 |
    | Ruston After All | 0.051765 | 0.000000 | 0.051765 |
    | cf. C JSON       | 0.054263 | 0.000000 | 0.054263 |
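
To put the totals above into ratios, here is a small worked calculation. The timings are copied from the slide; the constant and method names are hypothetical.

```ruby
# Total times in seconds for N = 50,000, copied from the slide.
TOTALS = {
  "Ruston Before"    => 0.277292,
  "Ruston After All" => 0.051765,
  "C JSON"           => 0.054263
}.freeze

# How many times faster `candidate` is than `baseline`.
def speedup(baseline, candidate)
  baseline / candidate
end

after = TOTALS["Ruston After All"]
puts format("vs. Ruston Before: %.2fx", speedup(TOTALS["Ruston Before"], after)) # prints 5.36x
puts format("vs. C JSON:        %.2fx", speedup(TOTALS["C JSON"], after))        # prints 1.05x
```

By these figures the tuned Ruston is roughly 5.4 times faster than before and marginally ahead of the C implementation on this single benchmark case.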
  30. Lessons learned • Existing tools are useful (e.g. perf, strace,

    gdb…) • To grasp a bottleneck in detail, making a simple BPF tool is effective. • uprobe is an entry point to x-ray the performance of native programs, e.g. C, C++ and Rust (also … Zig?) • Just keep this in mind: measure, reproduce, measure.
  31. Acknowledgements: • The Book “Linux Observability with BPF” ◦ by

    David Calavera, Lorenzo Fontana ◦ https://www.oreilly.com/library/view/linux-observability-with/9781492050193/ • Brendan Gregg for his superb articles: ◦ https://www.brendangregg.com/bpf-performance-tools-book.html • Masashi Misono for his Japanese introduction to BPF ◦ https://atmarkit.itmedia.co.jp/ait/articles/2004/09/news006.html
  32. Acknowledgements: • RbBCC received a Ruby Association Grant in 2019 ◦

    report: https://www.ruby.or.jp/ja/news/20200508 ◦ Mentored by Koichi “ko1” Sasada (Cookpad, Inc.) ◦ With advice from Ryosuke Matsumoto (Sakura Internet), Takao Shimayoshi and Yoshiaki Kasahara (Kyushu Univ.)
  33. Environment of this slide: • Ruby: ◦ 3.1.2 with dtrace

    enabled • Linux: ◦ CPU: aarch64 ◦ Ubuntu 20.04.1 with kernel 5.8.0-63-generic • Other libraries and software: ◦ BCC(libbcc): 0.18.0 built with LLVM 9 ◦ strace: 5.5 (from package manager) ◦ perf: 5.8.18 (from package manager) • Code Examples