Parsing RBS

RubyKaigi 2023

Soutaro Matsumoto

May 13, 2023

Transcript

  3. Parsing RBS
    松本 宗太郎 (Soutaro Matsumoto)
    Matsumoto is here
    Where is Soutaro?

  4. Where is Soutaro?

  7. Tokyo
    Matsumoto
    Soutaro
    Soutaro

  12. Soutaro station
    Soutaro timetable
    Soutaro bus stop
    Soutaro cedar trees
    Soutaro park
    (Forget the transliteration variations in photos)

  13. Parsing RBS
    Soutaro Matsumoto

  14. Recent updates on Steep/RBS
    • RBS 3.1
    • Steep 1.4

  15. New syntaxes in RBS 3.0
    Class/module alias syntax
    Use syntax (the counterpart of import in Java/C# for RBS)
    (Panels: RBS, Ruby)

  16. Steep 1.4
    • RBS 3.0 support
    • Signature help
    • Better completion in RBS

  17. Signature help
    • A list of method signatures pops up on method calls to help developers
    type arguments

  19. Better type name completion
    • Typing chan completes to Parseg::TokenFactory::change
    • It inserts shorter names based on the current module nesting context

  23. Why two different type names here? 🤔

  24. • While a parameter type is being typed, the input has a syntax error and
    the module nesting context is lost → the absolute type name is inserted 🤷
    module Parseg
      module ParsingSession
        def intersect?: (Parseg::TokenFactory::change)
      end
    end
    module Parseg
      module ParsingSession
        def intersect?: (c)
      end
    end

  25. • While a return type is being typed, the input is valid syntax → the
    relative type name is inserted 🙆
    module Parseg
      module ParsingSession
        def intersect?: ... -> TokenFactory::change
      end
    end
    module Parseg
      module ParsingSession
        def intersect?: ... -> c
      end
    end
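
The difference between the two completions can be sketched in a few lines of Ruby. `shorten_type_name` is a hypothetical helper, not Steep's API. It assumes the nesting context is an array of module names and ignores name shadowing, which real RBS name resolution has to handle.

```ruby
# Hypothetical helper: drop the leading components of an absolute type name
# that match the current module nesting. With an empty nesting (for example,
# when it was lost because of a syntax error) the name is returned unchanged.
def shorten_type_name(absolute_name, nesting)
  parts = absolute_name.split("::")
  common = 0
  # Never strip the last component: it is the type's own name.
  while common < nesting.size && common < parts.size - 1 && parts[common] == nesting[common]
    common += 1
  end
  parts.drop(common).join("::")
end

# Nesting known (valid syntax): the relative name is produced.
shorten_type_name("Parseg::TokenFactory::change", ["Parseg", "ParsingSession"])
# => "TokenFactory::change"

# Nesting lost (syntax error): the absolute name is kept.
shorten_type_name("Parseg::TokenFactory::change", [])
# => "Parseg::TokenFactory::change"
```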

  26. Parsing broken RBS matters
    • The inconsistency is caused by parsing errors
    • We need a parser that continues working even with syntax errors to provide
    advanced IDE features

  29. 1. Demo
    2. Top-down parser outline
    3. Error recovery (1)
    4. Error recovery (2)
    5. Error recovery (3)
    You will be able to write a top-down parser with error recovery. 💪

  30. Error tolerant parser generator
    • Generates a top-down parser with error recovery
    • Grammar definition in a Ruby DSL
    • (Doesn't generate any parser code yet 😜)
    https://github.com/soutaro/parseg

  31. Grammar Definition
    class_decl ::= class module_name
                     class_member*
                   end

    module_name ::= UIDENT

    class_member ::= class_decl
                   | method_definition
                   | attr_reader
                   | ...

  37. Output

  44. Parser implementation
    class_decl ::= class module_name
                     class_member*
                   end
    • Each non-terminal symbol has a corresponding method
    • Call the parsing methods to construct the content of the tree
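
A minimal sketch of the "one method per non-terminal" idea follows. It is not parseg's generated code. The `Tree` struct, the `ClassDeclParser` class, and the token type names (`:kCLASS`, `:UIDENT`, `:kEND`) are assumptions made for illustration, and `class_member` is reduced to nested class declarations.

```ruby
# A tree node: the rule name plus its children (tokens or sub-trees).
Tree = Struct.new(:rule, :children)

class ClassDeclParser
  # tokens is an array of [type, text] pairs.
  def initialize(tokens)
    @tokens = tokens
    @pos = 0
  end

  # Type of the current token, or nil at the end of the input.
  def peek
    @tokens[@pos]&.first
  end

  # Consume the current token if it has the expected type; raise otherwise.
  def consume(type)
    raise "expected #{type}, got #{peek.inspect}" unless peek == type
    token = @tokens[@pos]
    @pos += 1
    token
  end

  # class_decl ::= class module_name class_member* end
  def parse_class_decl
    children = [consume(:kCLASS), parse_module_name]
    children << parse_class_member while peek == :kCLASS
    children << consume(:kEND)
    Tree.new(:class_decl, children)
  end

  # module_name ::= UIDENT
  def parse_module_name
    Tree.new(:module_name, [consume(:UIDENT)])
  end

  # class_member ::= class_decl   (other alternatives omitted in this sketch)
  def parse_class_member
    Tree.new(:class_member, [parse_class_decl])
  end
end

tokens = [
  [:kCLASS, "class"], [:UIDENT, "Conference"],
  [:kCLASS, "class"], [:UIDENT, "Talk"], [:kEND, "end"],
  [:kEND, "end"]
]
tree = ClassDeclParser.new(tokens).parse_class_decl
# tree.children holds: the class token, a module_name tree,
# one class_member tree (for Talk), and the end token.
```

This strict version raises on the first unexpected token, which is the behaviour the error recovery steps later in the talk replace.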

  50. • Alternation is implemented with case analysis on the first token of the
    input
    class_member ::= class_decl
                   | method_definition
                   | attr_reader
                   | ...
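
The case analysis can be sketched as a lookup from the first token type to the alternative it selects. The method name and the token type names (`:kCLASS`, `:kDEF`, `:kATTRREADER`) are hypothetical and only cover the three alternatives spelled out on the slide.

```ruby
# Choose the class_member alternative from the type of the first token.
# Returns nil when no listed alternative starts with that token.
def select_class_member_rule(first_token_type)
  case first_token_type
  when :kCLASS      then :class_decl
  when :kDEF        then :method_definition
  when :kATTRREADER then :attr_reader
  end
end

select_class_member_rule(:kDEF)   # => :method_definition
select_class_member_rule(:kARROW) # => nil
```

A real parser would call the parsing method of the selected rule instead of returning its name.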

  54. Parsing result
    class_decl ::= class module_name class_member* end
    method_definition ::= def method_name : method_type

  61. Parsing error
    • We can find some structure in the input, even if it has a syntax error
      • There is a class declaration
      • There is a method definition
    • A non-tolerant parser tells you nothing

  62. method_definition ::= def method_name : method_type

  65. Introduce MissingTree

  66. method_definition ::= def method_name : method_type

  71. Error tolerant parser (1)
    • Inserts MissingTree instead of raising errors 😃
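
A sketch of this first recovery step, under simplifying assumptions: `MissingTree` is the name used in the talk, while `TokenTree`, `RuleTree`, `TolerantParser`, and the token type names are hypothetical. The method name and method type are each reduced to a single token.

```ruby
TokenTree   = Struct.new(:type, :text)
MissingTree = Struct.new(:expected)
RuleTree    = Struct.new(:rule, :children)

class TolerantParser
  # tokens is an array of [type, text] pairs.
  def initialize(tokens)
    @tokens = tokens
    @pos = 0
  end

  # Type of the current token, or nil at the end of the input.
  def peek
    @tokens[@pos]&.first
  end

  # Return a TokenTree when the expected token is present. Otherwise return
  # a MissingTree placeholder and leave the input position unchanged.
  def expect(type)
    return MissingTree.new(type) unless peek == type
    token = @tokens[@pos]
    @pos += 1
    TokenTree.new(token[0], token[1])
  end

  # method_definition ::= def method_name : method_type
  def parse_method_definition
    RuleTree.new(:method_definition, [
      expect(:kDEF),
      expect(:LIDENT), # method_name, simplified to one identifier token
      expect(:COLON),
      expect(:TYPE)    # method_type, simplified to one token
    ])
  end
end

# "def initialize" with the rest not typed yet.
tree = TolerantParser.new([[:kDEF, "def"], [:LIDENT, "initialize"]]).parse_method_definition
# The last two children are MissingTree placeholders for the colon and the type.
```

The tree still records that a method definition named initialize exists, which is what IDE features need.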

  72. #initialize definition disappeared 🤔

  76. → MissingTree
    → MissingTree
    → MissingTree
    The -> token stays at the beginning of the input

  77. Skip tokens
    • One token blocks further parsing when no rule handles the token
    • 💡 Skip such tokens to continue parsing

  78. • Tokens that may be consumed by the parsing methods are:
      • Possible first tokens of type (UIDENT, void, untyped, ...)
      • class, attr_reader, and def for the next class_member
      • end for closing the class declaration
      • class for the next class declaration
    attr_reader ::= attr_reader attribute_name : type

  79. Implementation
    • Before processing each rule, skip tokens that cannot be consumed in the
    rule
    • (And calculate the set of consumable tokens)
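
The skipping step can be sketched as follows. `skip_unconsumable` is a hypothetical helper, and the consumable set is written out by hand from the list on the previous slide rather than calculated from a grammar. Token type names are assumptions.

```ruby
require "set"

# Drop leading tokens whose type is not in the consumable set.
# Returns a pair: the skipped tokens and the remaining input.
def skip_unconsumable(tokens, consumable)
  skipped = tokens.take_while { |type, _text| !consumable.include?(type) }
  [skipped, tokens.drop(skipped.size)]
end

# Consumable tokens while parsing the type of an attr_reader:
# first tokens of type, starts of the next class member, and end.
consumable = Set[:UIDENT, :kVOID, :kUNTYPED, :kCLASS, :kATTRREADER, :kDEF, :kEND]

tokens = [[:ARROW, "->"], [:kVOID, "void"], [:kEND, "end"]]
skipped, rest = skip_unconsumable(tokens, consumable)
# skipped => [[:ARROW, "->"]]
# rest    => [[:kVOID, "void"], [:kEND, "end"]]
```

Keeping the skipped tokens, rather than discarding them, lets the resulting tree still cover the whole input.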

  81. Error tolerant parser (2)
    • Inserts MissingTree instead of raising errors
    • Skips tokens that cannot be consumed by other possible rules 😃
    This is a well-known error recovery strategy for top-down parsers.
    (https://github.com/microsoft/tolerant-php-parser)

  82. 🤔

  87. Nested declaration
    • The inner class declaration swallows the following method definition
    • Conference#initialize disappears, and an unexpected type error is
    reported
    • A better error recovery is to close the Talk definition immediately

  91. What was happening?
    class Conference
      def initialize: (String, Integer) -> void
    end

    class Conference
      class Talk
      def initialize: (String, Integer) -> void
    end

    class Conference
      class Talk
      end
      def initialize: (String, Integer) -> void
    end

    class Conference

  93. Key ideas
    • Let the parser use the changes made to the input since the last successful
    parsing result
    • Avoid moving existing elements into new trees
    😵 😁

  94. Change based error recovery
    • Identify which tokens have changed since the last successful parsing
    • Close the declaration at the end of the change
    (Figure labels: Inserted tokens, Close the declaration)

  98. class Conference def initialize : ...
    Text inserted: class Talk
    class Conference class Talk def initialize : ...
    Changed tokens: class Talk
    class Conference class Talk [EOC] def initialize : ...
    Inserts a marker token
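
The marker insertion can be sketched on plain token lists. The talk obtains the changed range from editor edit notifications. This sketch swaps in a simpler technique, comparing the common prefix and suffix of the old and new token lists, and `insert_change_marker` is a hypothetical helper name.

```ruby
EOC = "[EOC]" # marker token; the name follows the slides

# Return new_tokens with an EOC marker placed right after the tokens that
# differ from old_tokens (the last successfully parsed input).
def insert_change_marker(old_tokens, new_tokens)
  limit = [old_tokens.size, new_tokens.size].min

  # Length of the common prefix.
  prefix = 0
  prefix += 1 while prefix < limit && old_tokens[prefix] == new_tokens[prefix]

  # Length of the common suffix that does not overlap the prefix.
  suffix = 0
  suffix += 1 while suffix < limit - prefix && old_tokens[-1 - suffix] == new_tokens[-1 - suffix]

  change_end = new_tokens.size - suffix
  new_tokens[0...change_end] + [EOC] + new_tokens[change_end..]
end

before = %w[class Conference def initialize : ...]
after  = %w[class Conference class Talk def initialize : ...]
marked = insert_change_marker(before, after)
# marked.join(" ") => "class Conference class Talk [EOC] def initialize : ..."
```

A parser that sees the marker while reading class members can close the Talk declaration there, so the existing method definition stays in Conference.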

  99. class Conference class Talk [EOC] def initialize : ...

  111. Change based error recovery
    • The error recovery runs only after normal parsing fails, so that successful
    results stay identical to the results of the original parser
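
The ordering described here can be sketched as a plain fallback. `ParseFailure` and `parse_with_fallback` are hypothetical names, and both parsers are passed in as callables so the sketch does not depend on any concrete parser.

```ruby
# Raised by the strict parser on a syntax error (hypothetical error class).
ParseFailure = Class.new(StandardError)

# Run the strict parser first. Only when it fails, run the tolerant parser,
# so valid input always gets the strict parser's result.
def parse_with_fallback(tokens, strict:, tolerant:)
  strict.call(tokens)
rescue ParseFailure
  tolerant.call(tokens)
end

strict   = ->(tokens) { tokens.include?("end") ? [:strict, tokens] : raise(ParseFailure, "missing end") }
tolerant = ->(tokens) { [:recovered, tokens] }

parse_with_fallback(%w[class A end], strict: strict, tolerant: tolerant)
# => [:strict, ["class", "A", "end"]]
parse_with_fallback(%w[class A], strict: strict, tolerant: tolerant)
# => [:recovered, ["class", "A"]]
```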

  114. Change based error recovery
    • 👍 Minimal grammar modification
    • 👍 Token based change detection
      • No tree diff calculation required
      • Changed tokens are easily detected by LSP edit notifications
    • 😵 Unsupported text editing patterns may result in confusing errors

  115. Error tolerant parser (3)
    • Inserts MissingTree instead of raising errors
    • Skips tokens that cannot be consumed by other possible rules
    • Avoids moving existing elements if parsing fails 😃

  116. 🎉

  117. Open problems
    • Translating the concrete syntax tree to AST
    • AST defines a successful parsing result
    Attribute declarations must have names and types

  118. Summary
    • Planning to replace the RBS parser for a better development experience
    • Making a top-down parser error tolerant
      • Generates a parse tree even with syntax errors
      • Change based error recovery

  120. My 15 years of type checking Ruby programs
    • Trial 1: Based on ML's type inference (2007)
    • Trial 2: Based on control flow analysis (2009)
    • (Break until Oedo RubyKaigi 2017)
    • Trial 3: Steep -- introducing type declarations

  126. Dedicated syntax for types in Ruby
    No class
    It was called .rbi

  127. Ruby with Steep is the best Ruby programming
    experience for me ⭐

  129. • @soutaro on GitHub/Twitter
    • @[email protected]
    [email protected]