Parsing RBS

RubyKaigi 2023

Soutaro Matsumoto

May 13, 2023

Transcript

  1. Parsing RBS
    松本 宗太郎 (Soutaro Matsumoto)

  2. Parsing RBS
    松本 宗太郎 (Soutaro Matsumoto)
    Matsumoto is here

  3. Parsing RBS
    松本 宗太郎 (Soutaro Matsumoto)
    Matsumoto is here
    Where is Soutaro?

  4. Where is Soutaro?

  5. Tokyo
    Matsumoto

  6. Tokyo
    Matsumoto
    Soutaro

  7. Tokyo
    Matsumoto
    Soutaro
    Soutaro

  8. Soutaro station
    (Forget the transliteration variations in photos)

  9. Soutaro station
    Soutaro timetable
    (Forget the transliteration variations in photos)

  10. Soutaro station
    Soutaro timetable
    Soutaro bus stop
    (Forget the transliteration variations in photos)

  11. Soutaro station
    Soutaro timetable
    Soutaro bus stop Soutaro cedar trees
    (Forget the transliteration variations in photos)

  12. Soutaro station
    Soutaro timetable
    Soutaro bus stop Soutaro cedar trees
    Soutaro park
    (Forget the transliteration variations in photos)

  13. Parsing RBS
    Soutaro Matsumoto

  14. Recent updates on Steep/RBS
    • RBS 3.1

    • Steep 1.4

  15. New syntax in RBS 3.0
    Class/module alias syntax
    Use syntax
    (the RBS counterpart of import in Java and using in C#)
    (RBS) (Ruby)

  16. Steep 1.4
    • RBS 3.0 support

    • Signature help

    • Better completion in RBS

  17. Signature help
    • A list of method signatures pops up on method calls to help developers
    type arguments

  19. Better type name completion
    • Typing chan resolves to Parseg::TokenFactory::change

    • It inserts shorter names based on the current module nesting context

  21. Why two different type names here? 🤔

  22. • While a parameter type is being typed, the input has a syntax error and the
    module nesting context is lost → An absolute type name is inserted 🤷
    module Parseg
      module ParsingSession
        def intersect?: (Parseg::TokenFactory::change)
      end
    end

    module Parseg
      module ParsingSession
        def intersect?: (c)
      end
    end

  23. • While a return type is being typed, the input is valid syntax → A relative
    type name is inserted 🙆
    module Parseg
      module ParsingSession
        def intersect?: ... -> TokenFactory::change
      end
    end

    module Parseg
      module ParsingSession
        def intersect?: ... -> c
      end
    end

  24. Parsing broken RBS matters
    • The inconsistency is caused by parsing errors

    • We need a parser that keeps working even with syntax errors to provide
    advanced IDE features

  25. 1. Demo

    2. Top-down parser outline

    3. Error recovery (1)

    4. Error recovery (2)

    5. Error recovery (3)
    You will be able to write a top-down parser with error recovery. 💪

  26. Error tolerant parser generator
    • Generates a top-down parser with error recovery

    • Grammar definition in Ruby DSL

    • (Doesn't generate any parser code yet 😜)
    https://github.com/soutaro/parseg

  27. Grammar Definition
    class_decl ::= class module_name
                     class_member*
                   end

    module_name ::= UIDENT

    class_member ::= class_decl
                   | method_definition
                   | attr_reader
                   | ...

  33. Parser implementation
    class_decl ::=
      class module_name
        class_member*
      end
    • Each non-terminal symbol has a corresponding method

    • Call the parsing methods to construct the content of the tree
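
    The one-method-per-non-terminal idea can be sketched roughly as follows. This is a minimal illustration, not the parseg implementation: the class name TinyParser, the whitespace tokenizer, the array-based tree shape, and the restriction of class_member to nested class declarations are all assumptions made for brevity.

```ruby
# Minimal top-down (recursive descent) parser sketch for:
#   class_decl  ::= class module_name class_member* end
#   module_name ::= UIDENT
# Assumption: class_member is limited to nested class_decl here.
class TinyParser
  def initialize(source)
    @tokens = source.split # whitespace-separated tokens (simplification)
    @pos = 0
  end

  # Current token without consuming it (nil at end of input).
  def peek
    @tokens[@pos]
  end

  # Consume the current token, raising if it is not the expected one.
  def expect(token)
    raise "expected #{token}, got #{peek.inspect}" unless peek == token
    @pos += 1
    token
  end

  # class_decl ::= class module_name class_member* end
  def parse_class_decl
    expect("class")
    name = parse_module_name
    members = []
    members << parse_class_decl while peek == "class" # class_member*
    expect("end")
    [:class_decl, name, members]
  end

  # module_name ::= UIDENT
  def parse_module_name
    raise "expected UIDENT, got #{peek.inspect}" unless peek&.match?(/\A[A-Z]\w*\z/)
    token = peek
    @pos += 1
    [:module_name, token]
  end
end

tree = TinyParser.new("class Conference class Talk end end").parse_class_decl
p tree
```

    Note that this strict version raises on the first unexpected token, which is exactly the behavior the later slides replace with error recovery.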

  39. • Alternation is implemented with case analysis on the first token of the
    input
    class_member ::= class_decl
                   | method_definition
                   | attr_reader
                   | ...
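
    A rough sketch of that case analysis, under the same kind of simplifications: tokens are plain strings, the method names are hypothetical, and method_type and type are each reduced to a single token, which the real RBS grammar does not do.

```ruby
# class_member ::= class_decl | method_definition | attr_reader | ...
# The first token of the remaining input selects the alternative.
def parse_class_member(tokens)
  case tokens.first
  when "class"       then parse_class_decl(tokens)
  when "def"         then parse_method_definition(tokens)
  when "attr_reader" then parse_attr_reader(tokens)
  else raise "unexpected token: #{tokens.first.inspect}"
  end
end

# class_decl ::= class module_name class_member* end
def parse_class_decl(tokens)
  tokens.shift # class
  name = tokens.shift # module_name
  members = []
  members << parse_class_member(tokens) until tokens.first == "end" || tokens.empty?
  tokens.shift # end
  [:class_decl, name, members]
end

# method_definition ::= def method_name : method_type
# (assumption: method_type is a single token in this sketch)
def parse_method_definition(tokens)
  _def, name, _colon, type = tokens.shift(4)
  [:method_definition, name, type]
end

# attr_reader ::= attr_reader attribute_name : type
def parse_attr_reader(tokens)
  _keyword, name, _colon, type = tokens.shift(4)
  [:attr_reader, name, type]
end

tokens = "class Talk attr_reader title : String def speaker : String end".split
member_tree = parse_class_decl(tokens)
p member_tree
```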

  43. Parsing result
    class_decl ::= class module_name class_member* end


    method_definition ::= def method_name : method_type

  46. Parsing error
    • We can find some structure in the input, even if it has a syntax error

    • There is a class declaration

    • There is a method definition

    • A non-tolerant parser tells you nothing

  47. method_definition ::= def method_name : method_type

  50. Introduce MissingTree

  51. method_definition ::= def method_name : method_type

  53. Error tolerant parser (1)
    • Inserts MissingTree instead of raising errors
    😃
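
    One way to picture the MissingTree idea is the sketch below. MissingTree is the name used on the slides, but its shape as a Struct, the take helper, and the single-token method_type are assumptions for illustration only.

```ruby
# Placeholder node recorded where an expected token was not found.
MissingTree = Struct.new(:expected)

# method_definition ::= def method_name : method_type
# Each step either consumes the expected token or records a MissingTree,
# so the caller always gets a tree back instead of an exception.
def parse_method_definition(tokens)
  [
    :method_definition,
    take(tokens, :def)         { |t| t == "def" },
    take(tokens, :method_name) { |t| t.match?(/\A[a-z_]\w*[?!]?\z/) },
    take(tokens, :colon)       { |t| t == ":" },
    # Assumption: a method type is a single UIDENT or void in this sketch.
    take(tokens, :method_type) { |t| t.match?(/\A[A-Z]\w*\z/) || t == "void" }
  ]
end

# Consume the first token if it satisfies the block; otherwise leave the
# input untouched and return a MissingTree.
def take(tokens, expected)
  if tokens.first && yield(tokens.first)
    tokens.shift
  else
    MissingTree.new(expected)
  end
end

complete = parse_method_definition(%w[def size : Integer])
broken   = parse_method_definition(%w[def size :])
p complete
p broken
```

    Because take never consumes a token it cannot use, an unexpected token stays at the head of the input, which is the situation the following slides address.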

  54. #initialize definition disappeared
    🤔

  55. → MissingTree

  56. → MissingTree
    → MissingTree

  57. → MissingTree
    → MissingTree
    → MissingTree

  58. → MissingTree
    → MissingTree
    → MissingTree
    The -> token stays at the beginning of the input

  59. Skip tokens
    • One token blocks further parsing when no rule handles the token

    • 💡 Skip those tokens to continue parsing

  60. • Tokens that may be consumed by the parsing methods are:

    • Possible first tokens of type (UIDENT, void, untyped, ...)

    • class, attr_reader, and def for the next class_member

    • end for closing the class declaration

    • class for the next class declaration
    attr_reader ::= attr_reader attribute_name : type

  61. Implementation
    • Before processing each rule, skip the tokens that cannot be consumed in
    that rule

    • (And calculate the set of consumable tokens)
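
    The skipping step might look like the sketch below for the attr_reader rule. The token classification, the hard-coded TYPE_FIRST and FOLLOWERS sets, and the choice to count the first tokens of later steps in the same rule as consumable are assumptions of this sketch; a generator would calculate these sets from the grammar.

```ruby
MissingTree = Struct.new(:expected)

KEYWORDS   = %w[class def attr_reader end void untyped : ->].freeze
TYPE_FIRST = %w[UIDENT void untyped].freeze       # possible first tokens of type
FOLLOWERS  = %w[class attr_reader def end].freeze # next class_member, or closing end

# Map a raw token to its kind: keywords and punctuation stand for themselves.
def kind(token)
  return token if KEYWORDS.include?(token)
  token.match?(/\A[A-Z]/) ? "UIDENT" : "LIDENT"
end

# Drop leading tokens that nothing in the consumable set can handle.
def skip_unconsumable(tokens, consumable)
  skipped = []
  skipped << tokens.shift until tokens.empty? || consumable.include?(kind(tokens.first))
  skipped
end

# attr_reader ::= attr_reader attribute_name : type
def parse_attr_reader(tokens, skipped)
  steps = [["attr_reader"], ["LIDENT"], [":"], TYPE_FIRST]
  children = steps.each_with_index.map do |accepted, index|
    # Consumable set: this step, the later steps, and the followers.
    consumable = steps[index..].flatten + FOLLOWERS
    skipped.concat(skip_unconsumable(tokens, consumable))
    if !tokens.empty? && accepted.include?(kind(tokens.first))
      tokens.shift
    else
      MissingTree.new(accepted)
    end
  end
  [:attr_reader, *children]
end

tokens  = "attr_reader name : -> String def size : Integer".split
skipped = []
tree = parse_attr_reader(tokens, skipped)
p tree, skipped, tokens
```

    In this run the stray -> is skipped, String is consumed as the type, and the def that follows is left for the next class_member.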

  62. Error tolerant parser (2)
    • Inserts MissingTree instead of raising errors

    • Skips tokens that no possible rule can consume
    😃
    This is a well-known error recovery strategy for top-down parsers.

    (https://github.com/microsoft/tolerant-php-parser)

  63. Nested declaration
    • The inner class declaration eats the following method definition

    • Conference#initialize disappears and an unexpected type error will
    be detected

    • A better error recovery is to close the Talk definition immediately

  64. What was happening?
    class Conference
      def initialize: (String, Integer) -> void
    end

  65. What was happening?
    class Conference
      def initialize: (String, Integer) -> void
    end

    class Conference
      class Talk
      def initialize: (String, Integer) -> void
    end

  66. What was happening?
    class Conference
      def initialize: (String, Integer) -> void
    end

    class Conference
      class Talk
      def initialize: (String, Integer) -> void
    end

    class Conference
      class Talk
      end
      def initialize: (String, Integer) -> void
    end

  67. What was happening?
    class Conference
      def initialize: (String, Integer) -> void
    end

    class Conference
      class Talk
      def initialize: (String, Integer) -> void
    end

    class Conference
      class Talk
      end
      def initialize: (String, Integer) -> void
    end

    class Conference

  68. Key ideas
    • Let the parser use the changes made to the input since the last successful
    parsing result

    • Avoid moving existing elements into new trees

  69. Key ideas
    • Let the parser use the changes made to the input since the last successful
    parsing result

    • Avoid moving existing elements into new trees
    😵 😁

  70. Change based error recovery
    • Identify which tokens have changed since the last successful parse

    • Close the declaration at the end of the change
    (Figure labels: Inserted tokens; Close the declaration)

  71. class Conference def initialize : ...

  72. class Conference def initialize : ...
    Text inserted
    class Talk

  73. class Conference def initialize : ...
    class Conference class Talk def initialize : ...
    Changed tokens
    Text inserted
    class Talk

  74. class Conference def initialize : ...
    class Conference class Talk def initialize : ...
    Changed tokens
    Text inserted
    class Talk
    class Conference class Talk [EOC] def initialize : ...
    Inserts a marker token
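
    The marker insertion can be sketched as a comparison of token lists. This is only an illustration: the function name is hypothetical, [EOC] is taken from the slides, and detecting the change by common prefix and suffix is an assumption (an editor integration could instead take the changed range from edit notifications).

```ruby
EOC = "[EOC]" # marker token, named after the slides

# Return new_tokens with an EOC marker placed right after the tokens that
# were inserted since old_tokens (the last successfully parsed input).
def mark_end_of_change(old_tokens, new_tokens)
  # Length of the common prefix.
  prefix = 0
  prefix += 1 while prefix < old_tokens.size && prefix < new_tokens.size &&
                    old_tokens[prefix] == new_tokens[prefix]

  # Length of the common suffix that does not overlap the prefix.
  max_suffix = [old_tokens.size, new_tokens.size].min - prefix
  suffix = 0
  suffix += 1 while suffix < max_suffix &&
                    old_tokens[-1 - suffix] == new_tokens[-1 - suffix]

  change_end = new_tokens.size - suffix
  new_tokens[0...change_end] + [EOC] + new_tokens[change_end..]
end

old_tokens = %w[class Conference def initialize : void end]
new_tokens = %w[class Conference class Talk def initialize : void end]
marked = mark_end_of_change(old_tokens, new_tokens)
puts marked.join(" ")
```

    A parser that sees the marker while a declaration opened inside the change is still unclosed can then close that declaration there, leaving the existing def in Conference.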

  75. class Conference class Talk [EOC] def initialize : ...

  87. Change based error recovery
    • Error recovery runs only after normal parsing fails, so that successful
    results stay identical to those of the original parser
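
    The fallback order could be expressed as in the sketch below. The two lambdas are stand-ins for a strict parser and an error tolerant parser, and SyntaxFailure is a hypothetical error class; none of these names come from parseg or RBS.

```ruby
# Hypothetical error raised by the strict parser on a syntax error.
class SyntaxFailure < StandardError; end

# Try the strict parser first; run the error tolerant parser only when
# the strict one fails, so valid input always yields the strict result.
def parse_with_recovery(source, strict:, tolerant:)
  strict.call(source)
rescue SyntaxFailure
  tolerant.call(source)
end

# Stand-in parsers for the demonstration.
strict = lambda do |source|
  raise SyntaxFailure, "unexpected end of input" unless source.end_with?("end")
  [:parsed, source]
end
tolerant = ->(source) { [:recovered, source] }

valid  = parse_with_recovery("class Talk end", strict: strict, tolerant: tolerant)
broken = parse_with_recovery("class Talk", strict: strict, tolerant: tolerant)
p valid, broken
```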

  88. Change based error recovery
    • 👍 Minimal grammar modification

    • 👍 Token based change detection

    • No tree diff calculation required

    • Changed tokens are easily detected from LSP edit notifications

    • 😵 Unsupported text editing patterns may result in confusing errors

  89. Error tolerant parser (3)
    • Inserts MissingTree instead of raising errors

    • Skips tokens that no possible rule can consume

    • Avoids moving existing elements if parsing fails
    😃

  90. Open problems
    • Translating the concrete syntax tree to AST

    • The AST defines a successful parsing result
    Attribute declarations must have names and types

  91. Summary
    • Planning to replace the RBS parser for a better development experience

    • Making a top-down parser error tolerant

    • Generates a parse tree even with syntax errors

    • Change based error recovery

  92. My 15 years of type checking Ruby programs
    • Trial 1: Based on ML's type inference (2007)

    • Trial 2: Based on control flow analysis (2009)

    • (Break until Oedo RubyKaigi 2017)

    • Trial 3: Steep -- introducing type declarations

  95. It was called .rbi

  96. No class
    It was called .rbi

  97. Dedicated syntax for types in Ruby
    No class
    It was called .rbi

  98. Ruby with Steep is the best Ruby programming
    experience for me ⭐

  99. • @soutaro on GitHub/Twitter

    • @[email protected]

    [email protected]
