Making the RBS Parser Faster

RubyKaigi 2026, Hakodate, Hokkaido

Soutaro Matsumoto

April 23, 2026

Transcript

  1. Soutaro Matsumoto

    • Senior Software Engineer at Shopify Ruby DX team
    • A Ruby core committer
    • RBS designer
    • Steep developer
  5. RBS 4.0 & Steep 2.0

    • steep server
    • Inline RBS declaration support 🎉
    • singleton(T)[S] type
    • Generics lower bounds: T > S
    • ruby-rbs and ruby-rbs-sys crates
    • steep check -e "1 + true"
    • Several type narrowing updates
    • steep query
  6. Inline RBS Type Declaration (Experimental)

    • RBS and Steep directly support the feature, without using the rbs-inline gem
    • Enable it by adding inline: true to the check calls in your Steepfile
  7. steep query

    • steep query lets you navigate your codebase from the command line
    • Output is in the same JSON format as LSP

    $ steep query hover signature_service:3:15
    $ steep query definition Steep::Typing
  13. The RBS Parser

    • The RBS parser translates source text into an AST
    • Using RBS always starts from parsing
  14. The New Parser Architecture

    • The new parser constructs a pure C AST and translates it into Ruby objects
    • Diagram: the source code → C struct AST → Ruby object AST (labels: the pure C parser, RBS::Parser in RBS gem)
  17. 🤔

    • We can ignore the issue because parsing accounts for only a small portion of total tool execution time
    • The new parser scans the input twice. It is inevitable.
    • We could parallelize parsing for better performance if needed.
  22. Small Inputs

    • Small inputs are more common with inline RBS declarations, as used in both RBS and Sorbet
    • We have many small RBS inputs in one file for annotations
  24. Start Profiling

    • Best practice: profile the code before you start
    • I started with some random optimizations (didn't work)
    • We use Instruments, a profiler bundled in Xcode on macOS
  29. Lexer

    • Lexer -- lexical analyzer or tokenizer -- groups characters into tokens
    • Parser operates on tokens rather than individual characters

    Characters: class Integer < Numeric
    Tokens: kCLASS tUIDENT pLT tUIDENT
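
The grouping on this slide can be sketched in C as below. This is a minimal hand-written illustration, not the RBS lexer (the talk says RBS uses a lexer generator). The names `next_token`, `token`, `SKETCH_OTHER`, and `SKETCH_EOF` are hypothetical; only the kinds kCLASS, tUIDENT, and pLT come from the slide.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* kCLASS, tUIDENT and pLT are the kinds named on the slide.
   SKETCH_OTHER and SKETCH_EOF are additions for this sketch only. */
typedef enum { kCLASS, tUIDENT, pLT, SKETCH_OTHER, SKETCH_EOF } token_kind;

typedef struct {
    token_kind kind;
    const char *start; /* first byte of the token in the source buffer */
    size_t length;     /* token length in bytes */
} token;

/* Reads one token starting at *cursor and advances the cursor past it. */
token next_token(const char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    const char *p = *cursor;
    while (*p == ' ' || *p == '\t' || *p == '\n') p++; /* skip whitespace */

    token tok = { SKETCH_EOF, p, 0 };
    if (*p == '\0') {
        /* end of input: kind stays SKETCH_EOF, length stays 0 */
    } else if (*p == '<') {
        tok.kind = pLT;
        p++;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        size_t len = (size_t)(p - tok.start);
        if (len == 5 && memcmp(tok.start, "class", 5) == 0) {
            tok.kind = kCLASS;       /* keyword */
        } else if (isupper((unsigned char)*tok.start)) {
            tok.kind = tUIDENT;      /* identifier starting with an uppercase letter */
        } else {
            tok.kind = SKETCH_OTHER; /* anything this sketch does not model */
        }
    } else {
        tok.kind = SKETCH_OTHER;
        p++;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}
```

Calling `next_token` repeatedly on `"class Integer < Numeric"` yields the four kinds listed on the slide, followed by `SKETCH_EOF`. A parser built on top of this would only look at `kind` and fetch the text from `start`/`length` when it needs it.
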
  33. type t = "函館"

    • t (116): Moves 1 byte
    • 函 (20989): Moves 3 bytes
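
The codepoints and byte counts on this slide match what a UTF-8 decoder reports. The following is a minimal sketch of such a decoder, assuming UTF-8 input; `decode_utf8` is a hypothetical name and is not taken from the RBS source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes the UTF-8 character at p, stores its codepoint in *out, and
   returns the number of bytes it occupies. Returns 0 if the bytes do not
   look like UTF-8. The sketch checks lead and continuation byte patterns
   only; it does not reject overlong forms or surrogates. */
size_t decode_utf8(const unsigned char *p, size_t remaining, uint32_t *out) {
    assert(p != NULL && out != NULL);
    if (remaining == 0) return 0;

    size_t width;
    uint32_t cp;
    if (p[0] < 0x80)                { width = 1; cp = p[0]; }        /* 0xxxxxxx */
    else if ((p[0] & 0xE0) == 0xC0) { width = 2; cp = p[0] & 0x1F; } /* 110xxxxx */
    else if ((p[0] & 0xF0) == 0xE0) { width = 3; cp = p[0] & 0x0F; } /* 1110xxxx */
    else if ((p[0] & 0xF8) == 0xF0) { width = 4; cp = p[0] & 0x07; } /* 11110xxx */
    else return 0;
    if (width > remaining) return 0;

    /* Each continuation byte (10xxxxxx) contributes six more bits. */
    for (size_t i = 1; i < width; i++) {
        if ((p[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (uint32_t)(p[i] & 0x3F);
    }
    *out = cp;
    return width;
}
```

For the byte `t` this returns width 1 and codepoint 116. For the bytes E5 87 BD (the UTF-8 encoding of 函) it returns width 3 and codepoint 20989.
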
  34. Encoding of RBS Source Text

    • There was no specification of encoding of RBS source text
    • Define an encoding spec 💪
    • Follow Ruby's spec
    • It supports multi-byte encodings, but they must be ASCII compatible
    • UTF-8, SJIS, EUC-JP are fine, UTF-16 and UTF-32 are not
  35. The Problem

    • Our lexer generator doesn't support encodings other than Unicode
    • Do we need to implement the conversion from supported encodings to Unicode? 😫
  38. We don't need actual codepoints for parsing

    • Only comments and string literal types allow non-ASCII characters

    type language = "ルビー"
    type language = "ラスト"

    # 集合を表現するクラス
    class Set[T]
    end

    # ルルルルルルルルルル
    class Set[T]
    end
  39. Skip Codepoint Calculation

    • If the next character is multi-byte, returning any Unicode codepoint works
    • If it's a single-byte character, it's an ASCII character
    • The actual comment/string content can be fetched from the buffer
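
The idea on this slide can be sketched as a peek function that asks only for the byte width of the next character and never decodes it. This is an illustration under assumptions, not the RBS implementation: `scanner`, `scanner_peek`, `scanner_advance`, `utf8_width`, and `PLACEHOLDER_CODEPOINT` are hypothetical names, and the `char_width` callback stands in for whatever the real encoding layer provides.

```c
#include <assert.h>
#include <stddef.h>

/* Returned for every multi-byte character. The value is arbitrary:
   the lexer only needs to know "this is not ASCII". */
#define PLACEHOLDER_CODEPOINT 0x3042u

/* Hypothetical stand-in for an encoding's width callback. */
typedef size_t (*char_width_fn)(const unsigned char *p, size_t remaining);

typedef struct {
    const unsigned char *buffer;
    size_t length;
    size_t position;
    size_t peeked_width; /* byte count of the character seen by the last peek */
    char_width_fn char_width;
} scanner;

/* Width of a UTF-8 character judged from its lead byte, clipped to the input. */
size_t utf8_width(const unsigned char *p, size_t remaining) {
    size_t width = 1;
    if ((p[0] & 0xE0) == 0xC0) width = 2;
    else if ((p[0] & 0xF0) == 0xE0) width = 3;
    else if ((p[0] & 0xF8) == 0xF0) width = 4;
    return width <= remaining ? width : remaining;
}

/* Returns the next character without consuming it: the byte itself when the
   character is one byte wide, a placeholder otherwise, and 0 at end of input. */
unsigned int scanner_peek(scanner *s) {
    assert(s != NULL && s->char_width != NULL);
    if (s->position >= s->length) {
        s->peeked_width = 0;
        return 0;
    }
    const unsigned char *p = s->buffer + s->position;
    s->peeked_width = s->char_width(p, s->length - s->position);
    return s->peeked_width == 1 ? p[0] : PLACEHOLDER_CODEPOINT;
}

/* Consumes the character seen by the last peek, reusing the stored width
   instead of asking the encoding for it a second time. */
void scanner_advance(scanner *s) {
    assert(s != NULL);
    s->position += s->peeked_width;
}
```

Because the scanner keeps byte positions, the text of a comment or string literal can still be copied out of `buffer` unchanged. Supporting another ASCII-compatible encoding would, in this sketch, mean supplying a different `char_width` function.
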
  41. Lexer Updated

    • Correctly handles non-UTF-8 input
    • Add character byte count to the lexer struct, and use it in rbs_peek to skip the second encoding->char_width call
    • 16% improvement with core RBS benchmark

    Benchmarking RBS(4.0.0.dev.4 in Gemfile-unicode) parsing with 86 files (2148952 bytes)...
    ✅ 30.108 i/s (33.254 ms/i) (±3.321%)
    Benchmarking RBS(4.0.0.dev.4 in Gemfile-base) parsing with 86 files (2148952 bytes)...
    ✅ 25.854 i/s (38.759 ms/i) (±3.868%)
  43. You should try a malloc-based allocator

    • Sure! Thanks! 🙊
    • ... Why though? The profiler doesn't show anything related to the allocator problem. 🤔
    • It solved the problem! 😳
  44. Custom Allocator

    1. Allocate large memory for arena at the start of parsing
    2. Allocate memory from arena during parsing
    3. Free the entire arena when parsing is complete
  52. mmap(2)

    • mmap(2) is a system call that maps files directly into a process's address space
    • We use mmap simply to allocate a large contiguous block of memory for our custom allocator
    • munmap to return the memory to the OS
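
The three steps of the custom allocator, backed by mmap as described here, can be sketched as follows. This is a minimal fixed-capacity illustration for POSIX systems, not the RBS allocator: `arena`, `arena_init`, `arena_alloc`, and `arena_free` are hypothetical names, and a real arena would likely grow instead of returning NULL when full.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* asks glibc to expose MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON /* BSD-style spelling */
#endif

typedef struct {
    unsigned char *base; /* start of the mapped block */
    size_t capacity;     /* size of the mapped block in bytes */
    size_t used;         /* bytes handed out so far */
} arena;

/* Step 1: map one large anonymous block. Returns 0 on success, -1 on failure. */
int arena_init(arena *a, size_t capacity) {
    assert(a != NULL && capacity > 0);
    void *mem = mmap(NULL, capacity, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    a->base = mem;
    a->capacity = capacity;
    a->used = 0;
    return 0;
}

/* Step 2: hand out memory by bumping an offset. Returns NULL when full. */
void *arena_alloc(arena *a, size_t size) {
    assert(a != NULL && a->base != NULL);
    size_t rounded = (size + 15) & ~(size_t)15; /* keep blocks 16-byte aligned */
    if (rounded < size || rounded > a->capacity - a->used) return NULL;
    void *block = a->base + a->used;
    a->used += rounded;
    return block;
}

/* Step 3: release everything at once by unmapping the block. */
void arena_free(arena *a) {
    assert(a != NULL);
    if (a->base != NULL) munmap(a->base, a->capacity);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

Individual allocations are never freed one by one; the whole block goes back to the OS in `arena_free`. In this sketch, a malloc-based variant would replace the `mmap` call with `malloc(capacity)` and the `munmap` call with `free(a->base)`, leaving `arena_alloc` unchanged.
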
  53. malloc-Based Allocator

    • Use malloc to allocate memory and free to release the memory
    • malloc/free actually use mmap/munmap internally, but the C library runtime usually reuses the memory blocks for better performance
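  60. Reusing Memory Blocks

    • Diagram: page-faults (⚡) during Parsing #1 and Parsing #2, for the mmap-based allocator and the malloc-based allocator
    • mmap-based allocator: page-faults in both Parsing #1 and Parsing #2
    • malloc-based allocator: page-faults in Parsing #1, no page-fault in Parsing #2! The C library runtime keeps the blocks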
  61. malloc Is Faster!

    • It triggers fewer page-faults than the mmap-based allocator
    • Page-fault cost is relatively big with small files
  62. Final Results

    • 1.67x faster
    • Chart: benchmarks core, Steep, Timee, Big 300, Small 300; series Baseline, Lexer, Allocator, Head
    • For smaller RBS files, baseline is slower than 3.9
  63. Conclusion

    • The new RBS parser is up to 1.6x faster than RBS 3.9
    • It correctly handles UTF-8 and other encoding inputs
    • Profilers are useful, but they are not a silver bullet
    • For this parser optimization, the biggest improvement didn't come from profiling 🤷
    • RBS 4 and Steep 2 are out 📣