Understanding Ruby Grammar Through Conflicts

yui-knk
August 09, 2025

RubyConf Taiwan 2025
https://2025.rubyconf.tw/

Transcript

  1. Design By COSCUP 2025 Marketing Team CC-BY 4.0 Understanding Ruby

    Grammar Through Conflicts August 9, 2025 RubyConf Taiwan 2025 @yui-knk Yuichiro Kaneko
  2. • Yuichiro Kaneko • I'm from Japan • This is

    my second time in Taiwan • yui-knk (GitHub) / spikeolaf (Twitter) • Treasure Data • Engineering Manager of Applications Backend About me
  3. • CRuby committer, mainly developing the parser generator and parser •

    Lrama, an LALR(1) parser generator (2023, Ruby 3.3) • “parse.y” in Ruby About me
  4. • Have you ever created a parser? • Have you

    ever created a parser generator? Questions ✋
  5. Key Concepts

    in Parsing Language, Grammar and so on
  6. • A (formal) language is a subset of all possible words •

    Some words belong to the Ruby language • Others don’t What is Language?
  7. • Even though these are transcendental, imbroglio programs

    (TRICK entries), they belong to the Ruby language Ruby https://github.com/tric/trick2022/blob/master/01-tompng/entry.rb https://github.com/tric/trick2022/blob/master/06-mame/entry.rb
  8. • At a glance, this code looks like Ruby code; however,

    it doesn’t belong to the Ruby language Not Ruby
  9. • The (Ruby) language is an infinite set of

    words • A grammar is a finite set of rules which define the language What is Grammar? Grammar Language …
  10. • Grammar provides structure to the language • This structure

    is often represented as a tree structure (syntax tree) Grammar and structure + 1 2 3 * * + 1 2 3 Correct Wrong
  11. • The role of a parser is to analyze whether

    a given string of text conforms to a defined grammar • In programming languages, its major roles are: • Checking for syntax errors • Building the abstract syntax tree What is Parser? Grammar + 1 2 3 * AST Input Check Build
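
As a rough illustration of the two roles above (checking and building), Ruby's bundled Ripper library can be asked to parse a string. This is only a sketch: Ripper is not mentioned on the slides, and both sample inputs are my own.

```ruby
require "ripper"

# Ripper.sexp returns a nested array (an S-expression) for valid Ruby
# and nil when the input contains a syntax error.
tree   = Ripper.sexp("1 + 2 * 3")
broken = Ripper.sexp("(1 + 2")   # unbalanced parenthesis

# tree has the shape [:program, [statement, ...]].
expr = tree[1][0]

puts expr[0].inspect      # node type of the whole expression
puts expr[2].inspect      # operator at the root of the tree
puts expr[3][2].inspect   # operator of the nested right-hand node
puts broken.inspect       # result for the invalid input
```

If the root operator is `:+` and the nested one is `:*`, the tree encodes `1 + (2 * 3)`, which is the structure the grammar assigns to this input.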
  12. • A Parser Generator is a tool that generates a

    parser from a grammar definition • Lrama is the tool that fulfills this role What is Parser Generator? Grammar file Generate Parser Generator Parser
  13. Structure of

    Ruby Grammar Some important grammatical rules in Ruby
  14. • Ruby's grammar can be divided into these four rules:

    • stmt (statement) • expr (expression) • arg (argument) • primary Four key grammar rules
  15. • “stmt” includes these rules, for example • alias, undef

    • modifier if, unless, while, until and rescue • expr 1. stmt
  16. • “expr” includes these rules, for example • and, or

    • not • method call w/o parentheses • pattern matching • arg 2. expr
  17. • “arg” includes these rules, for example • unary operators

    • binary operators • ternary operator • primary • “arg” can be an element of arguments 3. arg
  18. • There is an inclusive relationship among these four rules

    • For example, since a “primary” is also an “arg”, a “primary” can be written wherever an “arg” can Inclusive relation of rules primary arg expr stmt primary primary
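
The inclusion relation can be probed from the outside by asking which kinds of constructs are accepted in an argument position. The following is a minimal sketch, assuming Ripper (which is built from parse.y) as the syntax checker; the helper `valid_syntax?` and the sample strings are hypothetical names and inputs of my own, not taken from the slides.

```ruby
require "ripper"

# Hypothetical helper: true when Ripper parses the source without a syntax error.
def valid_syntax?(source)
  !Ripper.sexp(source).nil?
end

samples = {
  "m(1)"         => "primary as an argument",
  "m(1 + 2)"     => "arg (binary operator) as an argument",
  "m(a and b)"   => "expr (and) as an argument",
  "m(a if b)"    => "stmt (modifier if) as an argument",
  "m((a and b))" => "expr wrapped in parentheses, which makes it a primary",
}

samples.each do |source, note|
  puts format("%-14s %-5s %s", source, valid_syntax?(source), note)
end
```

Wrapping an “expr” or “stmt” in parentheses turns it into a “primary”, which is why the last sample is accepted where the bare forms are not.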
  19. #1 Arguments

    and parameters Prioritize consistency
  20. • “arg” is a rule for actual arguments • While

    it's technically “arg_value”, I'll be using “arg” in this explanation to keep things simple • All four of these arguments are grammatically considered “arg” Actual arguments arg arg arg arg
  21. • “arg” is also a rule for the default value

    of a formal parameter • All four of these formal parameters' default values are grammatically considered “arg” Default value of formal parameter arg arg arg arg
  22. • Actually, in the grammar file, both the actual

    arguments and the default values for formal parameters use “arg_value” • This means that the same expressions can be used for both. Both use “arg_value”
  23. • For block arguments, however, the default value

    is a “primary”, not an “arg” • Because of this, you can't write an expression containing a binary operator, like in the code below Default value of block arguments
  24. • In the grammar file, it specifies

    “primary_value” instead of “arg_value” “primary_value” vs. “arg_value”
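
The difference between the two rules can be observed without reading the grammar file. This is a small sketch under my own assumptions: the slide's original code is not in the transcript, so the samples below are hypothetical, and `valid_syntax?` is a helper I introduce on top of Ripper.

```ruby
require "ripper"

# Hypothetical helper: true when Ripper parses the source without a syntax error.
def valid_syntax?(source)
  !Ripper.sexp(source).nil?
end

# Method parameter default: an "arg", so a binary operator is fine.
method_default = valid_syntax?("def m(a = 1 + 2); a; end")

# Block parameter default: a "primary", so a bare binary operator is not.
block_default_bare = valid_syntax?("proc { |a = 1 + 2| a }")

# Parentheses turn the expression into a primary again.
block_default_paren = valid_syntax?("proc { |a = (1 + 2)| a }")

# Inside the parentheses "|" is the binary operator;
# the outer pipes still delimit the block parameters.
default_value = proc { |a = (1 | 2)| a }.call

puts method_default
puts block_default_bare
puts block_default_paren
puts default_value
```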
  25. • So, what would happen if we changed this “primary”

    to “arg”? Why is it “primary”?
  26. • The parser has processed up to the “1”, and

    the next token is the “|” (pipe) • At this point, the parser has two options for the “|” • as a binary operator • as the delimiter for a block's arguments • This state, where there are multiple options, is called a conflict What is conflict?
  27. • The conflict can be resolved by setting

    a precedence for the rules • This change treats the “|” as the token that encloses the block's arguments Resolve the conflict
  28. • In short, the modified grammar would allow any binary operator

    except for the “|” • However, Ruby does not make that choice. Instead, it adopts a grammar that uniformly disallows binary operators in that position Design choice
  29. • You can write the same content for both actual

    arguments and the default values of formal parameters in Ruby • Unlike those, the default values for block arguments cannot contain binary expressions • This could be enabled by adding constraints, but that hasn't been done • Ruby seems to prioritize consistency over allowing certain binary expressions, avoiding an exception for one specific operator Summary
  30. #2 command

    call A distinctive part of Ruby's grammar
  31. • In Ruby's grammar, method calls without parentheses are called

    “command” or “command_call” What is command call?
  32. • Since “command_call” belongs to “expr” and not to “arg”,

    it cannot be used as an argument in principle “expr” vs. “arg”
  33. • For instance, when we try to change “command_call” to

    belong to “arg”, a conflict arises Why is it “expr”?
  34. • The most obvious conflict is the ambiguity

    of where the arguments for method `m` and the command (`cmd`) end • Two arguments for `m` • Three arguments for `m` • Four arguments for `m` How many args are passed?
  35. • There's an exception to this: a command can be

    written as an actual argument when there is only one argument • This is because if there's only one argument, the parser doesn't need to determine where the arguments end • I personally think this is a pretty interesting design choice Exception Only one argument Multiple arguments
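
The single-argument exception can be checked with a few strings. This is a sketch only: the slide's own code is not in the transcript, so the samples reuse the names `m` and `cmd` from the earlier slide in inputs of my own, and `valid_syntax?` is a hypothetical helper built on Ripper.

```ruby
require "ripper"

# Hypothetical helper: true when Ripper parses the source without a syntax error.
def valid_syntax?(source)
  !Ripper.sexp(source).nil?
end

only_argument     = valid_syntax?("m(cmd 1)")      # the command is the sole argument of m
still_one         = valid_syntax?("m(cmd 1, 2)")   # still one argument for m: cmd takes 1 and 2
among_several     = valid_syntax?("m(1, cmd 2)")   # a command next to another argument
with_parentheses  = valid_syntax?("m(1, cmd(2))")  # cmd(2) is a primary, so it is fine anywhere

puts only_argument
puts still_one
puts among_several
puts with_parentheses
```

Note how `m(cmd 1, 2)` avoids the ambiguity by definition: once a command is the only argument, every following comma-separated value belongs to the command.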
  36. • Even with just one argument, you can't pass a

    block • I think this design choice likely stems from an intent to keep the content you can write in an argument consistent, regardless of whether parentheses are used or not How blocks are handled
  37. • It is a distinctive feature of Ruby that parentheses

    can be omitted • It sometimes makes code more pleasant to write • It has the side effect of making the end of a grammatical construct unclear • This may be increasing the complexity of the grammar • There are even interesting design choices, such as allowing a “command” to be written as an argument if it is the only argument Summary
  38. #3 Endless

    method definitions Why are symbols omitted…?
  39. • There are two distinct grammar rules for endless method

    definitions, depending on what's written in the method's body • One belongs to “stmt” • The other belongs to “arg” Two endless method definitions
  40. • If a method's body is a method call with

    parentheses, it belongs to “arg” and can therefore be used as an argument • If it's a command, it belongs to “stmt” and cannot be used as an argument “arg” vs. “stmt”
  41. • This is for the same reason as before with

    command: it becomes ambiguous where the argument list ends • One argument for `m` • Two arguments for `m` • Three arguments for `m` How many args are passed? (again)
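
The two kinds of endless method definition can be compared in the same way. This is a sketch under assumptions: it needs Ruby 3.1 or later for the command-bodied form, the names `f`, `g` and `m` are placeholders of my own, and `valid_syntax?` is a hypothetical helper built on Ripper. The last result depends on the grammar of the Ruby version in use, so no expected value is stated for it.

```ruby
require "ripper"

# Hypothetical helper: true when Ripper parses the source without a syntax error.
def valid_syntax?(source)
  !Ripper.sexp(source).nil?
end

paren_body      = valid_syntax?("def f = g(1)")      # body is a call with parentheses
paren_body_arg  = valid_syntax?("m(def f = g(1))")   # that form is an "arg", usable as an argument
command_body    = valid_syntax?("def f = g 1")       # body is a command
command_body_arg = valid_syntax?("m(def f = g 1)")   # command-bodied definition as an argument

puts paren_body
puts paren_body_arg
puts command_body
puts command_body_arg   # version-dependent
```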
  42. • With a standard method definition, the body

    is clearly defined until the `end` keyword is reached, so this kind of ambiguity doesn't arise Standard method definition case
  43. • [Feature #17398] is where it became a topic of

    discussion that a certain kind of endless method definition becomes a “stmt” • https://bugs.ruby-lang.org/issues/17398 • It includes a note of caution that endless method definitions cannot be passed as arguments to a private method in some cases Feature #17398
  44. • “call_args” allows command calls only when there is a

    single argument • So, by the same logic, wouldn't it be possible to allow an endless method definition whose body is a command only when it's the sole argument? Modify “call_args”
  45. • When I actually added the rule to call_args and compiled

    it, no conflict occurred • The resulting Ruby accepts the script that had previously caused a syntax error • This solves the problem noted in Feature #17398 No conflict on new grammar
  46. • With a regular method definition, the end

    of the method is clearly marked by the `end` keyword • Since endless method definitions don't have an `end` keyword, they face the same constraint as command calls: they cannot be used as arguments • However, it's possible to allow them to be written as an argument by limiting the use case Summary
  47. #4 Pattern

    matching A very rich set of grammar rules
  48. • Pattern matching, like endless method definitions, is

    a "recent" syntax in Ruby's long history, so there have been multiple tickets filed about it • Pattern matching was introduced as an experimental feature in Ruby 2.7 (2019-12-25) • Pattern matching with `case/in` was promoted to a stable feature in Ruby 3.0 (2020-12-25) • One-line pattern matching was promoted to a stable feature in Ruby 3.1 (2021-12-25) Pattern matching
  49. • [Bug #17925] Pattern matching syntax using semicolon one-line •

    https://bugs.ruby-lang.org/issues/17925 • [Bug #18080] Syntax error on one-line pattern matching • https://bugs.ruby-lang.org/issues/18080 • [Bug #21378] variable pinning does not look for method arguments • https://bugs.ruby-lang.org/issues/21378 • [Bug #21097] `x = a rescue b in c` and `def f = a rescue b in c` parsed differently between parse.y and prism • https://bugs.ruby-lang.org/issues/21097 Tickets related to pattern matching
  50. • [Bug #17925] Pattern matching syntax using semicolon one-line •

    https://bugs.ruby-lang.org/issues/17925 • [Bug #18080] Syntax error on one-line pattern matching • https://bugs.ruby-lang.org/issues/18080 • I've written a pretty detailed explanation, so feel free to check out the ticket if you'd like • Today, we're going to look at the third ticket, [Bug #21378] Tickets related to pattern matching
  51. • The topic of discussion in this ticket is a

    script where an endless method definition and pattern matching are combined, like this • The reporter expected that the code from `y` to the end of the braces would be interpreted as pattern matching • However, the parser interprets the part from `def` to `y` as the left-hand side of the pattern matching [Bug #21378]
  52. • Endless method definitions can have a body

    that is either a “command” or “arg” “command” or “arg”
  53. • The body of an endless method definition should

    be “arg” or “command” • Pattern matching is “expr” • “expr” is not “arg” Organize the issue Pattern matching is “expr” Body should be “arg” or “command”
  54. • The left side of pattern matching should be “arg” •

    An endless method definition with an arg body is “arg” • Then this script is interpreted as “endless method definition IN pattern” Organize the issue Left side of pattern matching should be “arg” This is “arg”
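
How a given Ruby groups such a line can be inspected with Ripper. This is a hedged sketch: the source string below is a hypothetical sample of my own and not the script from the ticket, and the node names (`:case`, `:def`) are my understanding of Ripper's output rather than something stated on the slides. The result depends on the grammar of the Ruby version in use, so no expected output is given.

```ruby
require "ripper"

# Hypothetical sample, not the script from the ticket.
source = "def f(x) = x in Integer"

tree = Ripper.sexp(source)

# tree has the shape [:program, [statement, ...]];
# the first element of a statement node is its type.
top_level_type = tree && tree[1][0][0]

# :case would mean the line is grouped as "(def f(x) = x) in Integer";
# :def would mean the whole line is a single method definition.
puts top_level_type.inspect
```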
  55. • It's best to start by trying a general approach

    that isn’t specialized for pattern matching • Change pattern matching to be “arg” • Then pattern matching can be the body of endless method definitions General approach “arg” can be a body of endless method definition If pattern matching were an “arg”
  56. Group the conflicts • These conflicts

    can be broadly categorized into six groups • Conflicts on “=>” • Conflicts on “,” • Conflicts on “|” • Conflicts on “^” • (Conflicts on “..” and “…”) • (Conflicts on “**”) • I will take a closer look at the first four
  57. • Simply, the fat arrow (`=>`) is already used in

    the argument syntax • Therefore, if pattern matching is made an “arg”, this fat arrow causes conflicts Conflicts on “=>” key value Pattern matching ???
  58. • Brackets or braces can be omitted for array and

    hash patterns in pattern matching • This leads to the familiar conflict of ambiguity about where arguments are separated Conflicts on “,”
  59. • In pattern matching, “|” (pipe) is used to enumerate

    multiple patterns • Since this token is also used for binary operators, which are included in “arg”, a conflict arises Conflicts on “|” Pattern 1 Pattern 2 Pattern matching arg ???
  60. • The fact that the “^” (caret) causes a conflict

    might be a bit surprising • In pattern matching, “^” is used for variable pinning, which is a unary operator • At first glance, it might seem that it wouldn't conflict with the “^” used for binary operators Conflicts on “^”
  61. • You can use a trailing comma in pattern matching

    • Brackets or braces can be omitted for array and hash patterns in pattern matching The key is the trailing comma
  62. • This makes it impossible to distinguish whether it's simply

    variable pinning or a combination of a trailing comma and a binary operator The key is the trailing comma array pattern w/ variable pinning pattern matching w/ trailing comma arg Binary Operator ???
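
The four tokens discussed above each play two roles in today's Ruby, which is what makes them collide once pattern matching is moved into “arg”. The following runnable sketch (Ruby 3.0 or later) shows both roles side by side; all variable names and values are my own examples, not taken from the slides.

```ruby
# "=>" : key/value association vs. rightward pattern matching
pair = { :key => 1 }
pair => { key: }              # hash pattern, binds the local variable key

# "," : element separator vs. array pattern written without brackets
[1, 2] => first, second       # binds first and second

# "|" : binary operator vs. alternative patterns
bits   = 1 | 2
either = (1 in Integer | String)

# "^" : binary XOR vs. variable pinning
xor    = 6 ^ 3
pinned = 1
same   = (1 in ^pinned)

puts key, first, second, bits, either, xor, same
```

Each pair works today because pattern matching sits in “expr”, so the parser can tell from context which role a token plays.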
  63. • Including pattern matching in `arg` by adding constraints •

    Enforcing brackets or braces when it’s used as an argument • Disallowing trailing commas • Treating “|” as a binary operator, so that multiple patterns can’t be specified, or the other way around • Allowing pattern matching to be written in specific locations, like a command Possible solutions
  64. • Another approach is specialized for pattern matching • Instead

    of including pattern matching into “arg”, allowing pattern matching on the body of endless method definition Another approach Pattern matching is “expr” Body should be “arg” or “command” Pattern matching is “expr” Body should be “arg” or “command” or “pattern matching”
  65. • Expand “command_asgn” to include endless method definitions

    for one-line pattern matchings • Adjust precedence definitions accordingly Change “command_asgn”
  66. • By this change, the parser recognizes pattern matching as

    a body of the method definition • It works as expected Change “command_asgn” stmt = command_asgn pattern_matching in body
  67. • Ruby's grammar manages to use a limited set of

    symbols • It’s difficult to manage a grammar which allows some tokens to be omitted • However, it's possible to make them grammatically acceptable by limiting where they can be written Summary
  68. • I tried making changes to Ruby's grammar based on

    discussion in tickets and insights I've gained from looking at grammar files • By investigating the resulting conflicts, I came to understand some of the characteristics of Ruby's grammar Summary of this session
  69. • Consistency is important in Ruby's grammar • You can

    write the same content for both actual arguments and the default values of formal parameters in Ruby • Ruby avoids a grammar in which only a limited set of binary operators can be written • On the other hand, there are also areas where consistency is not kept • call_args allows “command” to be written only when it is a single argument Learnings through grammar modifications
  70. • In Ruby's grammar, some tokens can be omitted •

    “command” is a method call w/o parentheses • Endless method definition has no “end” token • In pattern matching, brackets and braces of patterns can be omitted • Symbols are very limited resources • “|” is used as a binary operator and as a separator of block arguments/patterns • “,” is used as a separator in many grammar rules • When these specifications are combined, it sometimes causes conflicts Learnings through grammar modifications
  71. Conflict is not a bug, but rather an

    insight into the ambiguity of the grammar Conclusion
  72. Let's get comfortable with conflicts. By doing so,

    you'll also be able to get along with grammar. Conclusion