Slide 1

Slide 1 text

Parsing RBS 松本 宗太郎 (Soutaro Matsumoto)

Slide 2

Slide 2 text

Parsing RBS 松本 宗太郎 (Soutaro Matsumoto) Matsumoto is here

Slide 3

Slide 3 text

Parsing RBS 松本 宗太郎 (Soutaro Matsumoto) Matsumoto is here Where is Soutaro?

Slide 4

Slide 4 text

Where is Soutaro?

Slide 5

Slide 5 text

Tokyo Matsumoto

Slide 6

Slide 6 text

Tokyo Matsumoto Soutaro

Slide 7

Slide 7 text

Tokyo Matsumoto Soutaro Soutaro

Slide 8

Slide 8 text

Soutaro station (Forget the transliteration variations in photos)

Slide 9

Slide 9 text

Soutaro station Soutaro timetable (Forget the transliteration variations in photos)

Slide 10

Slide 10 text

Soutaro station Soutaro timetable Soutaro bus stop (Forget the transliteration variations in photos)

Slide 11

Slide 11 text

Soutaro station Soutaro timetable Soutaro bus stop Soutaro cedar trees (Forget the transliteration variations in photos)

Slide 12

Slide 12 text

Soutaro station Soutaro timetable Soutaro bus stop Soutaro cedar trees Soutaro park (Forget the transliteration variations in photos)

Slide 13

Slide 13 text

Parsing RBS Soutaro Matsumoto

Slide 14

Slide 14 text

Recent updates on Steep/RBS • RBS 3.1 • Steep 1.4

Slide 15

Slide 15 text

New syntaxes in RBS 3.0 Class/module alias syntax Use syntax (Import in Java/C# for RBS) (RBS) (Ruby)

Slide 16

Slide 16 text

Steep 1.4 • RBS 3.0 support • Signature help • Better completion in RBS

Slide 17

Slide 17 text

Signature help • A method signature list pops up on method calls to help developers type arguments

Slide 18

Slide 18 text

Signature help • A method signature list pops up on method calls to help developers type arguments

Slide 19

Slide 19 text

Better type name completion • Typing chan resolves to Parseg::TokenFactory::change • It inserts shorter names based on the current module nesting context

Slide 20

Slide 20 text

Better type name completion • Typing chan resolves to Parseg::TokenFactory::change • It inserts shorter names based on the current module nesting context

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

Why two different type names here? 🤔

Slide 24

Slide 24 text

• When a parameter type is being typed, the input has a syntax error and the module nesting context is lost → Absolute type name is inserted 🤷
module Parseg
  module ParsingSession
    def intersect?: (Parseg::TokenFactory::change)
  end
end
module Parseg
  module ParsingSession
    def intersect?: (c)
  end
end

Slide 25

Slide 25 text

• When a return type is being typed, the input is valid syntax → Relative type name is inserted 🙆
module Parseg
  module ParsingSession
    def intersect?: ... -> TokenFactory::change
  end
end
module Parseg
  module ParsingSession
    def intersect?: ... -> c
  end
end

Slide 26

Slide 26 text

Parsing broken RBS matters • The inconsistency is caused by parsing errors • We need a parser that continues working even with syntax errors to provide advanced IDE features

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

1. Demo 2. Top-down parser outline 3. Error recovery (1) 4. Error recovery (2) 5. Error recovery (3) You will be able to write a top-down parser with error recovery. 💪

Slide 30

Slide 30 text

Error tolerant parser generator • Generates a top-down parser with error recovery • Grammar definition in Ruby DSL • (Doesn't generate any parser code yet 😜) https://github.com/soutaro/parseg

Slide 31

Slide 31 text

Grammar Definition
class_decl ::= class module_name class_member* end
module_name ::= UIDENT
class_member ::= class_decl | method_definition | attr_reader | ...

Slide 32

Slide 32 text

Grammar Definition
class_decl ::= class module_name class_member* end
module_name ::= UIDENT
class_member ::= class_decl | method_definition | attr_reader | ...

Slide 33

Slide 33 text

Grammar Definition
class_decl ::= class module_name class_member* end
module_name ::= UIDENT
class_member ::= class_decl | method_definition | attr_reader | ...

Slide 34

Slide 34 text

Grammar Definition
class_decl ::= class module_name class_member* end
module_name ::= UIDENT
class_member ::= class_decl | method_definition | attr_reader | ...

Slide 35

Slide 35 text

Grammar Definition
class_decl ::= class module_name class_member* end
module_name ::= UIDENT
class_member ::= class_decl | method_definition | attr_reader | ...

Slide 36

Slide 36 text

Grammar Definition
class_decl ::= class module_name class_member* end
module_name ::= UIDENT
class_member ::= class_decl | method_definition | attr_reader | ...

Slide 37

Slide 37 text

Output

Slide 38

Slide 38 text

Output

Slide 39

Slide 39 text

Output

Slide 40

Slide 40 text

Output

Slide 41

Slide 41 text

Output

Slide 42

Slide 42 text

Output

Slide 43

Slide 43 text

Output

Slide 44

Slide 44 text

Parser implementation
class_decl ::= class module_name class_member* end
• Each non-terminal symbol has a corresponding method
• Call the parsing methods to construct the content of the tree

Slide 45

Slide 45 text

Parser implementation
class_decl ::= class module_name class_member* end
• Each non-terminal symbol has a corresponding method
• Call the parsing methods to construct the content of the tree

Slide 46

Slide 46 text

Parser implementation
class_decl ::= class module_name class_member* end
• Each non-terminal symbol has a corresponding method
• Call the parsing methods to construct the content of the tree

Slide 47

Slide 47 text

Parser implementation
class_decl ::= class module_name class_member* end
• Each non-terminal symbol has a corresponding method
• Call the parsing methods to construct the content of the tree

Slide 48

Slide 48 text

Parser implementation
class_decl ::= class module_name class_member* end
• Each non-terminal symbol has a corresponding method
• Call the parsing methods to construct the content of the tree

Slide 49

Slide 49 text

Parser implementation
class_decl ::= class module_name class_member* end
• Each non-terminal symbol has a corresponding method
• Call the parsing methods to construct the content of the tree
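
The outline above can be sketched in Ruby roughly as follows. This is a minimal illustration, not Parseg's actual code: `tokenize`, `SketchParser`, and the `[type, text]` token pairs are hypothetical, and `class_member` is reduced to nested class declarations to keep the sketch short.

```ruby
# Hypothetical sketch: one parsing method per non-terminal.
# Tokens are [type, text] pairs; this is not Parseg's real API.
KEYWORDS = %w[class end].freeze

def tokenize(source)
  source.scan(/[A-Za-z_]\w*/).map do |word|
    if KEYWORDS.include?(word)
      [word.to_sym, word]
    elsif word.match?(/\A[A-Z]/)
      [:UIDENT, word]
    else
      [:LIDENT, word]
    end
  end
end

class SketchParser
  def initialize(tokens)
    @tokens = tokens
  end

  # class_decl ::= class module_name class_member* end
  def parse_class_decl
    expect(:class)
    name = parse_module_name
    members = []
    members << parse_class_member while peek == :class
    expect(:end)
    [:class_decl, name, members]
  end

  # module_name ::= UIDENT
  def parse_module_name
    [:module_name, expect(:UIDENT)]
  end

  # class_member ::= class_decl | ... (only class_decl in this sketch)
  def parse_class_member
    parse_class_decl
  end

  private

  # Type of the next token, or nil at the end of the input.
  def peek
    @tokens.first&.first
  end

  # Consume a token of the given type or raise (no error recovery yet).
  def expect(type)
    raise "expected #{type}, got #{peek.inspect}" unless peek == type
    @tokens.shift.last
  end
end
```

Each method builds the subtree for its own rule and calls the methods of the non-terminals that appear on the right-hand side.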

Slide 50

Slide 50 text

• Alternation is implemented with case analysis on the first token of the input
class_member ::= class_decl | method_definition | attr_reader | ...

Slide 51

Slide 51 text

• Alternation is implemented with case analysis on the first token of the input
class_member ::= class_decl | method_definition | attr_reader | ...

Slide 52

Slide 52 text

• Alternation is implemented with case analysis on the first token of the input
class_member ::= class_decl | method_definition | attr_reader | ...

Slide 53

Slide 53 text

• Alternation is implemented with case analysis on the first token of the input
class_member ::= class_decl | method_definition | attr_reader | ...
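
A possible shape for that case analysis, as a hedged Ruby sketch. `MemberParser` and `take` are hypothetical names, tokens are plain symbols, and method types are collapsed into a single `:type` token; none of this is taken from Parseg itself.

```ruby
# Hypothetical sketch: alternation as a case analysis on the first token.
# Tokens are plain symbols, and types are reduced to a single :type token.
class MemberParser
  def initialize(tokens)
    @tokens = tokens
  end

  # class_member ::= class_decl | method_definition | attr_reader
  def parse_class_member
    case @tokens.first
    when :class       then parse_class_decl
    when :def         then parse_method_definition
    when :attr_reader then parse_attr_reader
    else raise "unexpected token: #{@tokens.first.inspect}"
    end
  end

  # class_decl ::= class module_name class_member* end
  def parse_class_decl
    take(:class, :UIDENT)
    members = []
    members << parse_class_member until @tokens.first == :end
    take(:end)
    [:class_decl, members]
  end

  # method_definition ::= def method_name : method_type
  def parse_method_definition
    take(:def, :name, :colon, :type)
    [:method_definition]
  end

  # attr_reader ::= attr_reader attribute_name : type
  def parse_attr_reader
    take(:attr_reader, :name, :colon, :type)
    [:attr_reader]
  end

  private

  # Consume the given token types in order, raising on the first mismatch.
  def take(*types)
    types.each do |type|
      raise "expected #{type}, got #{@tokens.first.inspect}" unless @tokens.first == type
      @tokens.shift
    end
  end
end
```

The dispatch works here because each alternative starts with a distinct keyword, so one token of lookahead is enough to pick the rule.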

Slide 54

Slide 54 text

Parsing result class_decl ::= class module_name class_member* end method_definition ::= def method_name : method_type

Slide 55

Slide 55 text

Parsing result class_decl ::= class module_name class_member* end method_definition ::= def method_name : method_type

Slide 56

Slide 56 text

Parsing result class_decl ::= class module_name class_member* end method_definition ::= def method_name : method_type

Slide 57

Slide 57 text

No content

Slide 58

Slide 58 text

No content

Slide 59

Slide 59 text

No content

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

Parsing error • We can find some structure in the input, even if it has a syntax error • There is a class declaration • There is a method definition • A non-tolerant parser tells you nothing

Slide 62

Slide 62 text

method_definition ::= def method_name : method_type

Slide 63

Slide 63 text

method_definition ::= def method_name : method_type

Slide 64

Slide 64 text

method_definition ::= def method_name : method_type

Slide 65

Slide 65 text

Introduce MissingTree

Slide 66

Slide 66 text

method_definition ::= def method_name : method_type

Slide 67

Slide 67 text

method_definition ::= def method_name : method_type

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

No content

Slide 70

Slide 70 text

No content

Slide 71

Slide 71 text

Error tolerant parser (1) • Inserts MissingTree instead of raising errors 😃
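
One way to picture this step is the Ruby sketch below. `TokenTree`, `TolerantParser`, and the shape of `MissingTree` are assumptions made for illustration, and `method_type` is reduced to a single `:type` token; Parseg's real classes may look different.

```ruby
# Hypothetical sketch: record a MissingTree node instead of raising.
# Names and token shapes are assumptions, not Parseg's actual classes.
TokenTree = Struct.new(:type)
MissingTree = Struct.new(:expected)

class TolerantParser
  def initialize(tokens)
    @tokens = tokens
  end

  # method_definition ::= def method_name : method_type
  # (method_type is reduced to a single :type token in this sketch)
  def parse_method_definition
    [:method_definition, expect(:def), expect(:name), expect(:colon), expect(:type)]
  end

  private

  # Consume the token when it matches; otherwise leave the input untouched
  # and return a MissingTree so that parsing can continue.
  def expect(type)
    if @tokens.first == type
      TokenTree.new(@tokens.shift)
    else
      MissingTree.new(type)
    end
  end
end
```

Parsing `def name :` with this sketch yields a tree whose last child is a `MissingTree` for the type, rather than an exception.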

Slide 72

Slide 72 text

#initialize definition disappeared 🤔

Slide 73

Slide 73 text

→ MissingTree

Slide 74

Slide 74 text

→ MissingTree → MissingTree

Slide 75

Slide 75 text

→ MissingTree → MissingTree → MissingTree

Slide 76

Slide 76 text

→ MissingTree → MissingTree → MissingTree The -> token stays at the beginning of the input

Slide 77

Slide 77 text

Skip tokens • One token blocks further parsing when no rule handles the token • 💡 Skip those tokens to continue parsing

Slide 78

Slide 78 text

• Tokens that may be consumed by the parsing methods are: • Possible first tokens of type (UIDENT, void, untyped, ...) • class, attr_reader, and def for next class_member • end for closing the class declaration • class for next class declaration
attr_reader ::= attr_reader attribute_name : type

Slide 79

Slide 79 text

Implementation • Skips tokens that cannot be consumed in the rule before processing every rule • (And calculate the consumable tokens set)
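
A rough Ruby sketch of the skipping step, under stated assumptions: `skip_unconsumable` and `CONSUMABLE_AT_TYPE` are hypothetical names, tokens are plain symbols, and the consumable set is written out by hand here, whereas a generator would presumably compute it from the grammar.

```ruby
require "set"

# Hypothetical sketch: drop leading tokens that no active rule can consume.
# `consumable` is the set of token types that the current rule, or any
# enclosing rule, could accept next.
def skip_unconsumable(tokens, consumable)
  skipped = []
  skipped << tokens.shift until tokens.empty? || consumable.include?(tokens.first)
  skipped
end

# While parsing `type` in: attr_reader ::= attr_reader attribute_name : type
CONSUMABLE_AT_TYPE = Set[
  :UIDENT, :void, :untyped,   # possible first tokens of type
  :class, :attr_reader, :def, # start of the next class_member
  :end                        # closes the class declaration
].freeze
```

With the input `-> void end` at that position, the `->` token is skipped and parsing resumes at `void`.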

Slide 80

Slide 80 text

No content

Slide 81

Slide 81 text

Error tolerant parser (2) • Inserts MissingTree instead of raising errors • Skip tokens that cannot be consumed with other possible rules 😃 This is a well-known error recovery strategy for top-down parsers. (https://github.com/microsoft/tolerant-php-parser)

Slide 82

Slide 82 text

🤔

Slide 83

Slide 83 text

No content

Slide 84

Slide 84 text

No content

Slide 85

Slide 85 text

No content

Slide 86

Slide 86 text

No content

Slide 87

Slide 87 text

Nested declaration • Inner class declaration eats the following method definition • Conference#initialize disappears and an unexpected type error will be detected • Better error recovery is to close the Talk definition immediately

Slide 88

Slide 88 text

What was happening? class Conference def initialize: (String, Integer) -> void end

Slide 89

Slide 89 text

What was happening? class Conference def initialize: (String, Integer) -> void end class Conference class Talk def initialize: (String, Integer) -> void end

Slide 90

Slide 90 text

What was happening? class Conference def initialize: (String, Integer) -> void end class Conference class Talk def initialize: (String, Integer) -> void end class Conference class Talk end def initialize: (String, Integer) -> void end

Slide 91

Slide 91 text

What was happening? class Conference def initialize: (String, Integer) -> void end class Conference class Talk def initialize: (String, Integer) -> void end class Conference class Talk end def initialize: (String, Integer) -> void end class Conference

Slide 92

Slide 92 text

Key ideas • Let parser use the changes made on input since the last successful parsing result • Avoid moving existing elements into new trees

Slide 93

Slide 93 text

Key ideas • Let parser use the changes made on input since the last successful parsing result • Avoid moving existing elements into new trees 😵 😁

Slide 94

Slide 94 text

Change based error recovery • Identify which tokens are changed since the last successful parsing • Closes the declaration at the end of change Inserted tokens Close the declaration

Slide 95

Slide 95 text

class Conference def initialize : ...

Slide 96

Slide 96 text

class Conference def initialize : ... Text inserted class Talk

Slide 97

Slide 97 text

class Conference def initialize : ... class Conference class Talk def initialize : ... Changed tokens Text inserted class Talk

Slide 98

Slide 98 text

class Conference def initialize : ... class Conference class Talk def initialize : ... Changed tokens Text inserted class Talk class Conference class Talk [EOC] def initialize : ... Inserts a marker token
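
The marker insertion can be sketched in Ruby as below. This is an illustration only: `changed_range` and `insert_eoc` are hypothetical helpers, tokens are plain strings, and the changed range is found by trimming the common prefix and suffix of the two token lists, where a real implementation could take it from LSP edit notifications instead.

```ruby
# Hypothetical sketch: find the tokens changed since the last successful
# parse by trimming the common prefix and suffix, then put an [EOC]
# (end-of-change) marker right after them.
def changed_range(old_tokens, new_tokens)
  limit = [old_tokens.size, new_tokens.size].min

  prefix = 0
  prefix += 1 while prefix < limit && old_tokens[prefix] == new_tokens[prefix]

  suffix = 0
  suffix += 1 while suffix < limit - prefix &&
                    old_tokens[-1 - suffix] == new_tokens[-1 - suffix]

  prefix...(new_tokens.size - suffix)
end

# Returns a copy of new_tokens with :EOC placed after the last changed token.
def insert_eoc(old_tokens, new_tokens)
  range = changed_range(old_tokens, new_tokens)
  new_tokens.dup.insert(range.end, :EOC)
end
```

A parser that treats the marker as an implicit close of the declaration opened inside the change would then keep `def initialize` as a member of `Conference` instead of moving it into `Talk`.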

Slide 99

Slide 99 text

class Conference class Talk [EOC] def initialize : ...

Slide 100

Slide 100 text

class Conference class Talk [EOC] def initialize : ...

Slide 101

Slide 101 text

class Conference class Talk [EOC] def initialize : ...

Slide 102

Slide 102 text

class Conference class Talk [EOC] def initialize : ...

Slide 103

Slide 103 text

class Conference class Talk [EOC] def initialize : ...

Slide 104

Slide 104 text

class Conference class Talk [EOC] def initialize : ...

Slide 105

Slide 105 text

class Conference class Talk [EOC] def initialize : ...

Slide 106

Slide 106 text

class Conference class Talk [EOC] def initialize : ...

Slide 107

Slide 107 text

class Conference class Talk [EOC] def initialize : ...

Slide 108

Slide 108 text

class Conference class Talk [EOC] def initialize : ...

Slide 109

Slide 109 text

class Conference class Talk [EOC] def initialize : ...

Slide 110

Slide 110 text

class Conference class Talk [EOC] def initialize : ...

Slide 111

Slide 111 text

Change based error recovery • Error recovery runs only after normal parsing fails, so that successful results stay identical to the results of the original parser
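
As a small Ruby sketch of that ordering: `ParseError`, `parse_with_fallback`, and the two callables are hypothetical stand-ins for the ordinary parser and the recovering parser.

```ruby
# Hypothetical sketch: try the ordinary parser first and fall back to the
# recovering parser only when it fails. ParseError and both callables are
# assumptions made for illustration.
class ParseError < StandardError; end

def parse_with_fallback(tokens, strict_parser, recovering_parser)
  strict_parser.call(tokens)
rescue ParseError
  recovering_parser.call(tokens)
end
```

Because the recovering parser is never consulted for valid input, its heuristics cannot change the result of a parse that already succeeds.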

Slide 112

Slide 112 text

No content

Slide 113

Slide 113 text

No content

Slide 114

Slide 114 text

Change based error recovery • 👍 Minimal grammar modification • 👍 Token based change detection • No tree diff calculation required • Changed tokens are easily detected by LSP edit notifications • 😵 Unsupported text editing patterns may result in confusing errors

Slide 115

Slide 115 text

Error tolerant parser (3) • Inserts MissingTree instead of raising errors • Skip tokens that cannot be consumed with other possible rules • Avoid moving existing elements if parsing fails 😃

Slide 116

Slide 116 text

🎉

Slide 117

Slide 117 text

Open problems • Translating the concrete syntax tree to AST • AST defines a successful parsing result Attribute declarations must have names and types

Slide 118

Slide 118 text

Summary • Planning to replace RBS parser for better development experience • Making a top-down parser error tolerant • Generates parsing tree even with syntax errors • Change based error recovery

Slide 119

Slide 119 text

No content

Slide 120

Slide 120 text

• Trial 1: Based on ML's type inference (2007) • Trial 2: Based on control flow analysis (2009) • (Break until Oedo RubyKaigi 2017) • Trial 3: Steep -- introducing type declarations My 15 years for type checking Ruby programs

Slide 121

Slide 121 text

• Trial 1: Based on ML's type inference (2007) • Trial 2: Based on control flow analysis (2009) • (Break until Oedo RubyKaigi 2017) • Trial 3: Steep -- introducing type declarations My 15 years for type checking Ruby programs

Slide 122

Slide 122 text

• Trial 1: Based on ML's type inference (2007) • Trial 2: Based on control flow analysis (2009) • (Break until Oedo RubyKaigi 2017) • Trial 3: Steep -- introducing type declarations My 15 years for type checking Ruby programs

Slide 123

Slide 123 text

No content

Slide 124

Slide 124 text

It was called .rbi

Slide 125

Slide 125 text

No class It was called .rbi

Slide 126

Slide 126 text

Dedicated syntax for types in Ruby No class It was called .rbi

Slide 127

Slide 127 text

Ruby with Steep is the best Ruby programming experience to me ⭐

Slide 128

Slide 128 text

No content

Slide 129

Slide 129 text

• @soutaro on GitHub/Twitter