state's strategy of how means (military and nonmilitary) can be used to advance and achieve national interests in the long-term. https://en.wikipedia.org/wiki/Grand_strategy
large range of languages
Major parser algorithm
To be precise, LR-attributed grammar
I believe a grammar that is easy for humans is close to an LR grammar
LL parser
  Has less power than an LR parser
PEG
  It’s difficult to create an error-tolerant parser
  A rule failure doesn’t imply a parsing failure, as it would in context-free grammars
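The PEG remark about rule failures can be illustrated with a minimal sketch. The combinators `lit`, `seq` and `choice` below are hypothetical helpers written for this example, not part of any real PEG library. With ordered choice, a failing alternative is silently abandoned and the next one is tried, so a single rule failure carries no information about whether the input is actually wrong.

```ruby
# Each parser is a lambda: (input, pos) -> new position, or nil on failure.
# All names here (lit, seq, choice) are illustrative assumptions.

# Match a literal string at the current position.
def lit(str)
  ->(input, pos) { input[pos, str.size] == str ? pos + str.size : nil }
end

# Run parsers one after another; fail as soon as one of them fails.
def seq(*parsers)
  ->(input, pos) { parsers.reduce(pos) { |p, parser| p && parser.call(input, p) } }
end

# PEG ordered choice: try alternatives in order, commit to the first success.
def choice(*parsers)
  lambda do |input, pos|
    result = nil
    parsers.each do |parser|
      result = parser.call(input, pos)
      break if result
    end
    result
  end
end

ab_or_a = choice(seq(lit("a"), lit("b")), lit("a"))

# The first alternative fails at "c", but that failure is not an error:
# the parser backtracks and the second alternative succeeds.
ab_or_a.call("ac", 0)

# Order matters: here the second alternative can never match "ab" fully.
a_or_ab = choice(lit("a"), seq(lit("a"), lit("b")))
a_or_ab.call("ab", 0)
```

Because failures are a normal part of control flow here, an error-tolerant parser cannot simply treat "this rule failed" as "the syntax error is here".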
gives accurate feedback for grammar
BNF is very declarative
No gap between grammar and parser implementation
LR parser is based on theory of computer science
Argument is optional
Parentheses around arguments are optional
Block is optional
(The symbol of pattern matching, `in` or `=>`)
Need to discuss grammar rules as a group
E.g. “a == b”, “1 + 2” and “1..2” are in the same “arg” group
If we change the “arg” rules, we need to consider the impact on “expr” and “stmt” too
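The optional pieces listed above can be seen in ordinary Ruby code. This is a small sketch assuming Ruby 3.1 or later; the method `greet` and the `config` hash are made up for illustration.

```ruby
# Hypothetical method used only to show the different call forms.
def greet(name = "world")
  message = "hello, #{name}"
  message = yield(message) if block_given?
  message
end

a = greet                            # no argument, no parentheses
b = greet "ruby"                     # argument without parentheses
c = greet("ruby")                    # argument with parentheses
d = greet("ruby") { |m| m.upcase }   # argument and block
e = greet { |m| m + "!" }            # block only

# Pattern matching has two symbols:
# `in` returns true/false, `=>` raises NoMatchingPatternError on mismatch.
config = { name: "lrama", version: 1 }
matched = (config in { name: String })
config => { name: }                  # binds the local variable `name`
```

Every one of these forms has to coexist in one grammar, which is why a change to one group of rules can ripple into the others.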
Parser implementation is a combination of parts
Parser generator: combination of rules
Recursive descent parser: combination of functions, e.g. “parse_pattern_matching”, “parse_arguments”
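To make "combination of functions" concrete, here is a toy recursive descent parser for integer arithmetic. It is only a sketch: the class `TinyParser`, its grammar and its method names are assumptions for this example and are unrelated to any real Ruby parser.

```ruby
# Toy grammar (hypothetical):
#   expr    : term (("+" | "-") term)*
#   term    : primary (("*" | "/") primary)*
#   primary : NUMBER | "(" expr ")"
# Each grammar rule becomes one method, and the methods call each other.
class TinyParser
  ParseError = Class.new(StandardError)

  def initialize(source)
    # Simplistic tokenizer: characters outside the token set are ignored.
    @tokens = source.scan(/\d+|[-+*\/()]/)
    @pos = 0
  end

  def parse
    value = parse_expr
    raise ParseError, "unexpected token #{peek.inspect}" unless peek.nil?
    value
  end

  private

  def peek
    @tokens[@pos]
  end

  def advance
    token = @tokens[@pos]
    @pos += 1
    token
  end

  def parse_expr
    value = parse_term
    while ["+", "-"].include?(peek)
      op = advance
      rhs = parse_term
      value = op == "+" ? value + rhs : value - rhs
    end
    value
  end

  def parse_term
    value = parse_primary
    while ["*", "/"].include?(peek)
      op = advance
      rhs = parse_primary
      value = op == "*" ? value * rhs : value / rhs
    end
    value
  end

  def parse_primary
    token = advance
    if token == "("
      value = parse_expr
      raise ParseError, "expected )" unless advance == ")"
      value
    elsif token&.match?(/\A\d+\z/)
      token.to_i
    else
      raise ParseError, "unexpected token #{token.inspect}"
    end
  end
end

TinyParser.new("1 + 2 * (3 - 1)").parse
```

Note that nothing in this code checks whether two methods could compete for the same input. That consistency is left entirely to the person writing the functions.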
practice
Divide the difficulties
However, it requires a mechanism to integrate these parts
An LR parser generator has such a mechanism: conflict detection
A handwritten parser doesn’t have such a mechanism
be added
Existing syntax will change based on feedback
Parser generator works as a checker/linter for the grammar
Cannot keep the grammar sound without help from computer science
LR(1) for the Deterministic Parsing of Composite Languages”, May 2010. https://tigerprints.clemson.edu/cgi/viewcontent.cgi?article=1519&context=all_dissertations
Lukas Diekmann and Laurence Tratt. “Don’t Panic! Better, Fewer, Syntax Errors for LR Parsers”, July 2020. https://arxiv.org/pdf/1804.07133.pdf
Joe Zimmerman. “Practical LR Parser Generation”, September 2022. https://arxiv.org/pdf/2209.08383.pdf
parsing. “On the translation of languages from left to right”
1975: Yacc is published
1985: GNU Bison initial release
1989: Berkeley Yacc initial release
2006: GCC migrates its parser from Bison to a handwritten recursive-descent parser (C++ was 2004)
2015: Go migrates its parser from Bison to a handwritten recursive-descent parser
Errors for LR Parsers”
2022: “Practical LR Parser Generation”
2023: “The future vision of Ruby Parser” in RubyKaigi 2023
2023: Lrama replaces Bison in CRuby
2023: New era of LR parser!!
2024: “The grand strategy of Ruby Parser” in RubyKaigi 2024
Ruby
Lrama (Llama) inherits its name from Yacc (Yak) and Bison
It’s an LR parser generator, so it’s not “LL”ama but “LR”ama
Ruby has used Lrama to generate its parser since 3.3
hack parse.y
We need more and more features
It is not easy to add new features to Bison
Ruby build system depends on the Bison installed on your machine
Lrama is installed into the ruby/ruby tool directory, so we can use the latest features
Bison is difficult to manage
It was broken even though we didn't do anything when we released Ruby 2.7.7
In particular, installing Bison on Windows is not an easy task
Provide platform for LSP and other tools
Provide Universal parser
Keep both Ruby grammar and parser maintainable
Solution
LR parser and parser generator are the best friends for Ruby
Lrama is the new foundation for the Ruby parser instead of Bison
as Object. Therefore some types of node, e.g. NODE_OP_ASGN2, have a complicated structure.
Need to redesign each node structure to be easier to understand.
More information is needed
  More location information
  Comments
But there is a blocker for this goal…
CRuby interpreted AST nodes directly. Therefore some optimizations still happen in the parser even today.
Need to delete such optimizations so that the parse result keeps the structure of the input text.
But there is a blocker for this goal too…
single struct definition
There is no flexibility to add a new field to a specific type of node
It’s not straightforward to cast each field based on node type
Need to change the data structure from a union-based struct to a dedicated struct for each node
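The difference between the two layouts can be sketched with a rough Ruby analogy. The real node definitions are C structs inside CRuby; the names `GenericNode`, `IfNode`, `IntegerNode` and the slot names `u1`, `u2`, `u3` below are illustrative assumptions, not the actual definitions.

```ruby
# Union-style layout: one shape shared by every node type.
# What u1/u2/u3 mean depends on `type`, so a reader needs a lookup table.
GenericNode = Struct.new(:type, :u1, :u2, :u3)

# Dedicated layouts: each node type declares exactly the fields it needs,
# and one type can gain a field without affecting the others.
IfNode      = Struct.new(:cond, :then_body, :else_body)
IntegerNode = Struct.new(:value, :location)

generic   = GenericNode.new(:if, :cond_expr, :then_expr, :else_expr)
dedicated = IfNode.new(:cond_expr, :then_expr, :else_expr)

generic.u2            # which child is this? It depends on generic.type
dedicated.then_body   # the field name documents itself
```

With dedicated layouts, adding something like `location` to one node type is a local change, which is the flexibility the slide asks for.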
based error recovery Lukas Diekmann and Laurence Tratt. “Don’t Panic! Better, Fewer, Syntax Errors for LR Parsers”, July 2020 https://arxiv.org/pdf/1804.07133.pdf
syntax valid
Leverage the fact that an LR parser has a clear automaton structure
How does CPCT+ work?
IF … END
IF expr_value THEN compstmt if_tail END
IF expr_value THEN … END
true then
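The idea of searching for the cheapest token edits that make the input valid can be sketched as follows. This is a brute-force illustration over a toy grammar, not the CPCT+ algorithm itself: the real algorithm explores repairs from the LR parser's state at the error point, while this sketch simply tries insertions and deletions anywhere. All names (`VOCABULARY`, `valid?`, `minimal_repairs`, `Repair`) are assumptions made for this example.

```ruby
VOCABULARY = %w[if then end true false].freeze
Repair = Struct.new(:edits, :tokens)

# Toy grammar: stmts : stmt* ; stmt : "if" expr "then" stmts "end" | expr
#              expr  : "true" | "false"
def parse_stmts(tokens, pos)
  loop do
    nxt = parse_stmt(tokens, pos)
    return pos if nxt.nil?
    pos = nxt
  end
end

# Returns the position after one statement, or nil if none starts at pos.
def parse_stmt(tokens, pos)
  case tokens[pos]
  when "true", "false"
    pos + 1
  when "if"
    return nil unless %w[true false].include?(tokens[pos + 1])
    return nil unless tokens[pos + 2] == "then"
    body_end = parse_stmts(tokens, pos + 3)
    tokens[body_end] == "end" ? body_end + 1 : nil
  end
end

def valid?(tokens)
  parse_stmts(tokens, 0) == tokens.size
end

# Yield every sequence reachable by one insertion or one deletion.
def each_edit(tokens)
  (0..tokens.size).each do |i|
    VOCABULARY.each do |tok|
      yield([:insert, i, tok], tokens[0...i] + [tok] + tokens[i..])
    end
  end
  tokens.each_index do |i|
    yield([:delete, i, tokens[i]], tokens[0...i] + tokens[(i + 1)..])
  end
end

# Breadth-first search: return all valid repairs with the fewest edits.
def minimal_repairs(tokens, max_cost: 3)
  frontier = [Repair.new([], tokens)]
  seen = { tokens => true }
  (0..max_cost).each do
    found = frontier.select { |repair| valid?(repair.tokens) }
    return found unless found.empty?
    next_frontier = []
    frontier.each do |repair|
      each_edit(repair.tokens) do |edit, edited|
        next if seen[edited]
        seen[edited] = true
        next_frontier << Repair.new(repair.edits + [edit], edited)
      end
    end
    frontier = next_frontier
  end
  []
end

# "if true then" is missing its closing keyword.
minimal_repairs(%w[if true then]).each { |repair| p repair.edits }
# prints [[:insert, 3, "end"]]
```

For the incomplete input `if true then`, the only single-edit repair in this toy grammar is inserting `end` at the end, which matches the intuition on the slide.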
friendly node structure Efficient data structure (Cactuses) Delete operation support Integration to parse.y More accurate recovery Error tolerance Parser Generator (Lrama) Parser ✅ 💪 Universal Parser parse.y for Undergraduate
operations
They are effective in a rule, e.g. class or method definition
Embed state management into grammar rule
POC is implemented
Want to introduce this feature to Ruby 3.4
https://github.com/ruby/lrama/pull/231
lexer then change to manage states on parser side
Joel E. Denny. “PSLR(1): Pseudo-Scannerless Minimal LR(1) for the Deterministic Parsing of Composite Languages”, May 2010. https://tigerprints.clemson.edu/cgi/viewcontent.cgi?article=1519&context=all_dissertations
to generate loosely coupled scanners and parsers, so the user must maintain these tightly coupled scanner and parser specifications separately but consistently.
> Scanner and parser specifications would be significantly more maintainable if all sub-language transitions were instead computed from a grammar by a parser generator and recognized automatically by the scanner using the parser’s stack.
(Node) User friendly node structure parse.y for Undergraduate More declarative parser Efficient data structure (Cactuses) Delete operation support Integration to parse.y More accurate recovery Parameterizing rules Replace hand written parser with Racc User defined stack Scanner state update syntax Scannerless parser IELR Error tolerance Parser Generator (Lrama) Parser ✅ ✅ ✅ ✅ 💪 💪 💪 💪 Universal Parser
Other Ruby implementations in C
JRuby, TruffleRuby, ruruby: other Ruby implementations not in C
Implementing a 100% compatible Ruby parser is a bit difficult
Managing a parser for each version is difficult
GC as imemo object
imemo is an “internal memo object” managed by GC
Ruby’s GC is useful. It frees memory that is no longer used
Before this goal, objects need to be removed from nodes
Objects on nodes are GC-marked via the AST structure
(Node) User friendly node structure parse.y for Undergraduate More declarative parser Efficient data structure (Cactuses) Delete operation support Integration to parse.y More accurate recovery Parameterizing rules Replace hand written parser with Racc User defined stack Scanner state update syntax Scannerless parser IELR Error tolerance Parser Generator (Lrama) Parser ✅ ✅ ✅ ✅ 💪 💪 💪 💪
Refactoring Ripper LSP Delete parser level optimization Union to Struct (Node) User friendly node structure parse.y for Undergraduate More declarative parser Efficient data structure (Cactuses) Delete operation support Integration to parse.y More accurate recovery Parameterizing rules Replace hand written parser with Racc User defined stack Scanner state update syntax Scannerless parser IELR Error tolerance Parser Generator (Lrama) Parser ✅ ✅ ✅ ✅ ✅ ✅ 💪 💪 💪 💪 💪
Refactoring Ripper LSP Optimize Node memory management Delete parser level optimization Union to Struct (Node) User friendly node structure parse.y for Undergraduate More declarative parser Efficient data structure (Cactuses) Delete operation support Integration to parse.y More accurate recovery Parameterizing rules Replace hand written parser with Racc User defined stack Scanner state update syntax Scannerless parser IELR RBS Error tolerance Parser Generator (Lrama) Parser ✅ ✅ ✅ ✅ ✅ ✅ ✅ 💪 💪 💪 💪 💪 💪
written by each programming language
[Figure: one Grammar is fed to a Parser Generator, which generates Parser (C), Parser (Java), Parser (JavaScript) and Parser (other languages)]
syntax in *.rbinc source files if parse.y can be transformed to a Ruby parser
[Figure, top: array.rb is converted to array.rbinc (C file) by mk_builtin_loader.rb (ripper) + baseruby]
[Figure, bottom: parse.y is converted to parse.rb by Lrama + baseruby; array.rb is converted to array.rbinc (C file) by parse.rb + baseruby]
Ruby Parser
Long term goals
Provide platform for LSP and other tools
Provide Universal parser
Keep both Ruby grammar and parser maintainable
Solution
LR parser and parser generator are the best approach for Ruby
Refactoring Ripper LSP Optimize Node memory management Delete parser level optimization Union to Struct (Node) User friendly node structure parse.y for Undergraduate More declarative parser Efficient data structure (Cactuses) Delete operation support Integration to parse.y More accurate recovery Parameterizing rules Replace hand written parser with Racc User defined stack Scanner state update syntax Scannerless parser IELR RBS Error tolerance Parser Generator (Lrama) Parser
Refactoring Ripper LSP Optimize Node memory management Delete parser level optimization Union to Struct (Node) User friendly node structure parse.y for Undergraduate More declarative parser Efficient data structure (Cactuses) Delete operation support Integration to parse.y More accurate recovery Parameterizing rules Replace hand written parser with Racc User defined stack Scanner state update syntax Scannerless parser IELR RBS Error tolerance Parser Generator (Lrama) Parser ✅ ✅ ✅ ✅ ✅ ✅ ✅ 💪 💪 💪 💪 💪 💪 💪
Parser Roadmap”, https://docs.google.com/presentation/d/ 1E4v9WPHBLjtvkN7QqulHPGJzKkwIweVfcaMsIQ984_Q Yuichiro Kaneko. “Ruby Parser։ൃࢽ (12) - LR parser generatorͷఏڙ͢Δจ๏ͷ݈શੑ”, September 2023. https://yui-knk.hatenablog.com/entry/2023/09/19/191135 Yuichiro Kaneko. “Ruby Parser։ൃࢽ (14) - LR parserશʹཧղͨ͠”, December 2023. https://yui-knk.hatenablog.com/entry/2023/12/06/082203 Lukas Diekmann and Laurence Tratt. “Don’t Panic! Better, Fewer, Syntax Errors for LR Parsers”, July 2020. https://arxiv.org/pdf/1804.07133.pdf Joel E. Denny. “PSLR(1): Pseudo-Scannerless Minimal LR(1) for the Deterministic Parsing of Composite Languages”, May 2010. https://tigerprints.clemson.edu/cgi/viewcontent.cgi? article=1519&context=all_dissertations