Inc. ◦ Agile division and RubyxAgile group ◦ A member of Parser Club • Hobbies ◦ Parsers ◦ Games (Rhythm games / Sim games / Tabletop games) ◦ Speed cubes
into tokens (Tokenization) • Parser ◦ Program that constructs a structure from a token stream ▪ Compilers: source code -> Abstract Syntax Tree (AST) ▪ JSON or CSV parsers: text -> some data structure • Parser Generator ◦ Program that generates a parser from a grammar file
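To make the lexer/parser split concrete, here is a minimal Ruby sketch for arithmetic expressions. It is hand-written for illustration only: `tokenize` and `TinyParser` are hypothetical names, the parser uses recursive descent rather than the LALR tables that generators such as Bison or Lrama emit, and the AST is just nested arrays.

```ruby
require "strscan"

# Lexer: splits source text into [type, value] tokens (tokenization).
def tokenize(src)
  ss = StringScanner.new(src)
  tokens = []
  until ss.eos?
    if ss.skip(/\s+/)
      # whitespace separates tokens but is not itself a token
    elsif (num = ss.scan(/\d+/))
      tokens << [:NUM, num.to_i]
    elsif (op = ss.scan(%r{[-+*/()]}))
      tokens << [op, op]
    else
      raise ArgumentError, "unexpected character #{ss.peek(1).inspect} at #{ss.pos}"
    end
  end
  tokens << [:EOF, nil]
end

# Parser: builds a nested-array AST from the token stream.
#   expr   : term (('+' | '-') term)*
#   term   : factor (('*' | '/') factor)*
#   factor : NUM | '(' expr ')'
class TinyParser
  def initialize(tokens)
    @tokens = tokens
    @pos = 0
  end

  def parse
    ast = expr
    expect(:EOF)
    ast
  end

  private

  def peek
    @tokens[@pos][0]
  end

  def advance
    token = @tokens[@pos]
    @pos += 1
    token
  end

  def expect(type)
    raise ArgumentError, "expected #{type.inspect}, got #{peek.inspect}" unless peek == type
    advance
  end

  # The loops build left-associative nodes: 1 - 2 - 3 => (1 - 2) - 3
  def expr
    node = term
    while %w[+ -].include?(peek)
      op = advance[0].to_sym
      node = [op, node, term]
    end
    node
  end

  def term
    node = factor
    while %w[* /].include?(peek)
      op = advance[0].to_sym
      node = [op, node, factor]
    end
    node
  end

  def factor
    return [:num, advance[1]] if peek == :NUM

    expect("(")
    node = expr
    expect(")")
    node
  end
end

p TinyParser.new(tokenize("1 + 2 * 3")).parse
# => [:+, [:num, 1], [:*, [:num, 2], [:num, 3]]]
```

The lexer only knows about characters; operator precedence and nesting exist only in the parser, which is exactly the division of labor described above.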
of linguistics that deals with "Language" in a mathematical and set-theoretical way ◦ Considers how a language is represented as text ▪ Does not consider semantics ▪ e.g., English is represented as sequences of letters, interspersed with symbols and spaces ◦ Composed of Symbols and Grammar
A kind of Formal Language that is represented as follows: ▪ rule: A B C ... | D E F ... ▪ This notation is called Backus-Naur Form (BNF) ◦ Most programming languages belong to this category ◦ Used in the grammar files that a Parser Generator takes as input
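As a sketch of the BNF notation above, the Ruby snippet below stores a grammar as data (each nonterminal maps to its alternatives) and enumerates the strings it derives, up to a length bound. The grammar `S : 'a' S 'b' | /* empty */` is a standard example of a context-free language that is not regular. `GRAMMAR` and `language` are illustrative names, and this naive search assumes no rule can grow a sentential form indefinitely without adding terminals (e.g. no `A : A` cycles).

```ruby
# A BNF-style grammar stored as data. Each nonterminal maps to its list of
# alternatives ("rule: A B C | D E F"); each alternative is a symbol sequence.
# Keys of the hash are nonterminals; any other symbol is a terminal.
GRAMMAR = {
  "S" => [
    ["a", "S", "b"], # S : 'a' S 'b'
    []               #   | /* empty */
  ]
}.freeze

# Enumerate the strings derivable from `start` that have at most `max_len`
# terminals, by breadth-first rewriting of the leftmost nonterminal.
def language(grammar, start, max_len)
  results = []
  queue = [[start]]
  until queue.empty?
    form = queue.shift
    idx = form.index { |sym| grammar.key?(sym) }
    if idx.nil?
      results << form.join # no nonterminals left: a sentence of the language
      next
    end
    grammar[form[idx]].each do |alt|
      new_form = form[0...idx] + alt + form[(idx + 1)..]
      # Prune forms that already contain too many terminals.
      queue << new_form if new_form.count { |sym| !grammar.key?(sym) } <= max_len
    end
  end
  results.uniq.sort_by { |s| [s.length, s] }
end

p language(GRAMMAR, "S", 6)
# => ["", "ab", "aabb", "aaabbb"]
```

Viewing the language as the set of derivable strings is the set-theoretical view of Formal Language; the grammar is a finite description of that (here infinite) set.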
create its data structures when source code is executed or compiled • A program that generates a parser from a grammar file is called a Parser Generator • The grammar file given to a Parser Generator is written as a CFG • BNF is one notation for representing a CFG
replacement for Bison ◦ https://github.com/ruby/lrama • Presented at RubyKaigi 2023 by Yuichiro Kaneko ◦ https://youtu.be/IhfDsLx784g?si=kO1q6mLpTa1bIRYL • Used in the CRuby 3.3 build process ◦ You can try it now by building Ruby HEAD ◦ Ruby's behavior is NOT changed
version ◦ Since Bison versions vary among users, we have to assume that older versions may be installed ◦ So new Bison features cannot be used even after they are released • Allows for the implementation of Ruby-specific features ◦ Parsing unfinished code for LSP ◦ Making the complex parse.y more readable
rules can be created by attaching a suffix symbol to a symbol ▪ sym*: represents a list of 0 or more syms ▪ sym+: represents a list of 1 or more syms ▪ sym?: represents an optional sym (appears 0 or 1 times)
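These suffixes can be read as sugar for ordinary BNF rules. The Ruby sketch below expands one suffixed symbol into a fresh nonterminal plus its defining rules. The helpers `expand_suffix` and `to_bnf` and the generated names (`opt_`, `list_`, `nonempty_list_`) are hypothetical and do not reproduce Lrama's actual output.

```ruby
# Hypothetical helper: rewrite a suffixed symbol (sym?, sym*, sym+) into a fresh
# nonterminal plus the plain BNF rules that define it. Returns [name, rules].
# The generated names are illustrative only, not Lrama's actual naming.
def expand_suffix(token)
  base = token[0...-1]
  case token[-1]
  when "?" # zero or one
    name = "opt_#{base}"
    [name, { name => [[], [base]] }]
  when "*" # zero or more (left-recursive list)
    name = "list_#{base}"
    [name, { name => [[], [name, base]] }]
  when "+" # one or more (left-recursive list)
    name = "nonempty_list_#{base}"
    [name, { name => [[base], [name, base]] }]
  else # no suffix: nothing to expand
    [token, {}]
  end
end

# Print rules in Yacc/BNF style; an empty alternative is shown as /* empty */.
def to_bnf(rules)
  rules.map do |lhs, alts|
    body = alts.map { |alt| alt.empty? ? "/* empty */" : alt.join(" ") }.join(" | ")
    "#{lhs}: #{body}"
  end.join("\n")
end

# args: arg*
name, rules = expand_suffix("arg*")
puts "args: #{name}"
puts to_bnf(rules)
# args: list_arg
# list_arg: /* empty */ | list_arg arg
```

The lists are expanded left-recursively on purpose: an LR parser can then reduce each element as soon as it is read, so the stack stays shallow, whereas a right-recursive list would push every element before the first reduction.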
parser generator developed by the GNU Project, modeled on Yacc ◦ Used to generate CRuby's parser from parse.y • Racc ◦ A parser generator developed by Minero Aoki ◦ Used in the Parser gem (a RuboCop dependency) and others
while the generation algorithms are the same, few parts could be shared, so it was less costly to create a new one ◦ Input grammar file format ▪ Bison: Yacc-like / Racc: its own ◦ Language of the generated parser ▪ Bison: C / Racc: Ruby
only tells you whether the input conforms to the grammar ◦ It neither creates an AST nor saves any information needed for later processing • You can write code (Actions) in {} after each grammar rule • In an Action, $n and @n refer to the semantic value and the location of the n-th symbol in the rule ◦ This feature is known as (Numbered) References
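When a parser generator emits code, it has to turn these references into accesses to the parser's internal stacks. Below is a simplified Ruby sketch of that rewriting for one rule. The target names `result`, `values`, and `locations` are hypothetical, and unlike a real generator it does not skip `$` inside string literals or comments, nor does it handle `@$`.

```ruby
# Hypothetical sketch: rewrite numbered references in one rule's action.
#   $$  -> the value produced by the reduction (here called `result`)
#   $n  -> value of the n-th right-hand-side symbol (on the value stack)
#   @n  -> location of the n-th right-hand-side symbol (on the location stack)
# During a reduction the rule's rhs_len symbols are the top rhs_len stack
# entries, so symbol n is at index -(rhs_len - n + 1).
def translate_numbered(action, rhs_len)
  action.gsub(/\$\$|([$@])(\d+)/) do
    next "result" if $~[0] == "$$"

    sigil = $~[1]
    n = $~[2].to_i
    unless (1..rhs_len).cover?(n)
      raise ArgumentError, "#{sigil}#{n} is out of range: the rule has #{rhs_len} symbols"
    end
    stack = sigil == "$" ? "values" : "locations"
    "#{stack}[-#{rhs_len - n + 1}]"
  end
end

# exp: exp '+' exp { $$ = $1 + $3; }
puts translate_numbered("$$ = $1 + $3;", 3)
# result = values[-3] + values[-1];
```

Because `$3` is bound to a position, inserting another symbol before the second `exp` silently changes what `$3` means, so the Action text has to be edited along with the grammar.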
understand because it is not declarative ◦ If the grammar changes, the Actions must be rewritten because values are referenced by position number • Named References was developed to resolve these issues, allowing values to be referenced by symbol names
a parser generator built with Ruby as a replacement for Bison ◦ Named References is a feature of Bison, allowing the use of symbol names as References within Actions
the symbol names can be associated with the references within Actions, the code generation that uses this association already exists ◦ Decided to associate symbol names with references, taking inspiration from the implementation of Numbered References
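The idea of reusing the numbered-reference machinery can be sketched in Ruby: resolve each name to a position in the rule, rewrite `$name` to `$n`, and hand the result to the existing code generation. `resolve_named` is a hypothetical helper, not Lrama's implementation; it follows Bison's `symbol[alias]`, `$name`, and `$[name]` notation, treats an aliased symbol as reachable only through its alias, and reports a reference as ambiguous when a name maps to more than one position. `opt_space` in the usage example is a made-up symbol.

```ruby
# Hypothetical sketch: resolve Bison-style named references in one rule's
# action to numbered ones, so existing numbered-reference code generation can
# be reused. Supported forms: $name, @name, $[name], @[name].
#   - A right-hand-side symbol is referenced by its own name ("exp") or, if
#     written as "exp[left]", by its alias ("left") only.
#   - The left-hand-side name refers to $$ / @$.
def resolve_named(lhs, rhs, action)
  names = { lhs => [0] } # position 0 stands for $$ / @$
  rhs.each.with_index(1) do |sym, pos|
    name = sym[/\A\w+\[(\w+)\]\z/, 1] || sym[/\A\w+\z/]
    (names[name] ||= []) << pos if name # quoted terminals like '+' get no name
  end

  action.gsub(/([$@])(?:\[(\w+)\]|([A-Za-z_]\w*))/) do
    sigil = $~[1]
    name = $~[2] || $~[3]
    positions = names.fetch(name) { raise ArgumentError, "unknown reference #{sigil}#{name}" }
    raise ArgumentError, "ambiguous reference #{sigil}#{name}" if positions.size > 1

    positions.first.zero? ? "#{sigil}$" : "#{sigil}#{positions.first}"
  end
end

# exp: exp[left] '+' exp[right] { $$ = $left + $right; }
puts resolve_named("exp", ["exp[left]", "'+'", "exp[right]"], "$$ = $left + $right;")
# $$ = $1 + $3;

# Adding a symbol shifts the numbers, but the action text stays the same.
puts resolve_named("exp", ["exp[left]", "'+'", "opt_space", "exp[right]"], "$$ = $left + $right;")
# $$ = $1 + $4;
```

Already-numbered references such as `$$` or `$1` do not match the name pattern, so they pass through unchanged and both styles can be mixed in one Action.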
◦ Information about what is currently being parsed • When reading an Action, it checks the rule that was just read to determine which symbols are being referenced, which completes the association • Moving this entire process into the parser is extremely difficult and does not fit the original design
◦ It can tokenize all the input in one go by switching its own state, which allows the process to be divided into phases • If the parser knows the state ◦ Since the parser requests the next token from the lexer according to its own state, the lexer can focus on tokenization
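The state-switching idea can be sketched in Ruby: a hypothetical `lex_grammar` tokenizes a Yacc-like rule in a `:grammar` state and, on `{`, switches to an `:action` state that returns the whole action body as a single ACTION token, so references inside it can be resolved in a later phase. This is not Lrama's actual lexer; it ignores comments and does not handle braces inside string or character literals within actions.

```ruby
require "strscan"

# Hypothetical state-switching lexer for a tiny Yacc-like grammar file.
# :grammar state -> identifiers (optionally with [alias]), 'c' literals, : | ;
# :action state  -> everything up to the matching "}" becomes one ACTION token
def lex_grammar(src)
  ss = StringScanner.new(src)
  state = :grammar
  tokens = []
  until ss.eos?
    case state
    when :grammar
      if ss.skip(/\s+/)
        # whitespace between grammar tokens is ignored
      elsif (id = ss.scan(/[A-Za-z_]\w*(?:\[\w+\])?/))
        tokens << [:IDENT, id]
      elsif (ch = ss.scan(/'[^']'/))
        tokens << [:CHAR, ch]
      elsif (punct = ss.scan(/[:|;]/))
        tokens << [punct, punct]
      elsif ss.skip(/\{/)
        state = :action # switch state instead of tokenizing the C/Ruby code
      else
        raise ArgumentError, "unexpected #{ss.peek(1).inspect} at #{ss.pos}"
      end
    when :action
      body = +""
      depth = 1 # we are already inside one "{"
      until depth.zero?
        raise ArgumentError, "unterminated action" if ss.eos?
        c = ss.getch
        depth += 1 if c == "{"
        depth -= 1 if c == "}"
        body << c unless depth.zero? # drop only the closing brace
      end
      tokens << [:ACTION, body.strip]
      state = :grammar
    end
  end
  raise ArgumentError, "unterminated action" if state == :action
  tokens
end

p lex_grammar("exp: exp[left] '+' exp[right] { $$ = $left + $right; } ;")
```

For the rule above this should yield the identifiers, the `'+'` literal, and the punctuation as separate tokens, with the entire `$$ = $left + $right;` body as one ACTION token, so a later phase can resolve its references against the rule read just before it.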
• Todo list for Lrama written by Yuichiro Kaneko ◦ https://docs.google.com/document/d/1EAZzYMXBOdzK-6mMIj2YNJxZZRVcpJxE7-4zXbHn8JA/edit?usp=sharing ◦ (Japanese only)