Slide 1

Slide 1 text

Extended Context-Free Grammars Parsing with Generalized LL
Author: Artem Gorokhov
Saint Petersburg University
Programming Languages and Tools Lab, JetBrains
March 4, 2017
Artem Gorokhov (SPbU) March 4, 2017 1 / 15

Slide 2

Slide 2 text


Slide 3

Slide 3 text

Extended Context-Free Grammar
S = a M*
M = a? (B K)+ | u B
B = c |
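In an ECFG, each right-hand side is a regular expression over terminals and nonterminals. As a minimal sketch, assuming every grammar symbol is a single character (so "aBKBK" stands for the sequence a B K B K), Python's `re` module can check which symbol sequences the rules for S and M allow. This only illustrates the notation; it is not a parser.

```python
import re

# Assumption for this sketch: each grammar symbol is one character.
S_RHS = re.compile(r"aM*")          # S = a M*
M_RHS = re.compile(r"a?(BK)+|uB")   # M = a? (B K)+ | u B

def matches(rhs: re.Pattern, symbols: str) -> bool:
    """True if the symbol sequence is one of the sequences the right-hand side denotes."""
    return rhs.fullmatch(symbols) is not None

for seq in ["aBKBK", "BK", "uB", "aBKuB"]:
    print(seq, matches(M_RHS, seq))
```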

Slide 4

Slide 4 text


Slide 5

Slide 5 text


Slide 6

Slide 6 text


Slide 12

Slide 12 text

Existing solutions
ANTLR, Yacc, Bison
- Can’t use ECFG without transformation
- Admit only subclasses of context-free languages (LL(k), LR(k))
Some research on ECFG parsing
- No tools
- Only LL(k), LR(k)
Generalized LL
- Admits arbitrary CFGs (including ambiguous ones)
- Can’t use ECFG without transformation

Slide 14

Slide 14 text

Automata and ECFGs
Grammar G0:
S = a* S b? | c
=⇒ Recursive automaton (RA) for grammar G0
[Figure: RA for G0, transitions labelled c, S, a, b, ε]
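To make the idea of an RA concrete, here is a small sketch assuming a hypothetical dictionary encoding of an RA for G0 (one automaton per nonterminal, edges labelled by terminals or nonterminals; the state numbering is ours, not the slide's). It is deliberately not GLL: it swaps in a naive fixpoint over (nonterminal, start position) pairs, which still terminates on the left recursion G0 allows (S can call S without reading any a).

```python
from typing import Dict, List, Set, Tuple

# Hypothetical RA encoding for this sketch: one automaton per nonterminal.
# Edge labels are terminals (single characters) or nonterminal names.
G0_RA = {
    "S": {  # S = a* S b? | c
        "start": 0,
        "final": {1, 2},
        "edges": {0: [("a", 0), ("S", 1), ("c", 2)], 1: [("b", 2)], 2: []},
    }
}

def recognize(ra: Dict[str, dict], start_nt: str, text: str) -> bool:
    n = len(text)
    # ends[(N, i)] collects every j such that N derives text[i:j]; grown to a fixpoint
    ends: Dict[Tuple[str, int], Set[int]] = {(nt, i): set() for nt in ra for i in range(n + 1)}
    changed = True
    while changed:
        changed = False
        for nt, auto in ra.items():
            for i in range(n + 1):
                # Explore (state, position) pairs reachable in N's automaton from position i
                seen = {(auto["start"], i)}
                work: List[Tuple[int, int]] = [(auto["start"], i)]
                while work:
                    state, pos = work.pop()
                    if state in auto["final"] and pos not in ends[(nt, i)]:
                        ends[(nt, i)].add(pos)
                        changed = True
                    for label, target in auto["edges"][state]:
                        if label in ra:  # nonterminal edge: jump over any derivation known so far
                            nexts = list(ends[(label, pos)])
                        elif pos < n and text[pos] == label:  # matching terminal edge
                            nexts = [pos + 1]
                        else:
                            nexts = []
                        for nxt in nexts:
                            if (target, nxt) not in seen:
                                seen.add((target, nxt))
                                work.append((target, nxt))
    return n in ends[(start_nt, 0)]
```

For example, `recognize(G0_RA, "S", "cbb")` accepts only because the fixpoint lets S reuse its own earlier results at position 0.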

Slide 15

Slide 15 text

Recursive Automata Minimization
Grammar G1:
S = K K K K K K | K a K K K K
K = S K | a K | a
[Figure: automaton for G1]
[Figure: minimized automaton for G1]
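Minimizing an RA can be done with ordinary DFA minimization applied to each nonterminal's automaton. A sketch under our own assumptions: each automaton is first built as a prefix tree of the alternatives and then minimized with Moore's partition refinement. For G1 this construction goes from 12 to 7 states for S and from 5 to 4 states for K; these counts come from the sketch, not from the slide's figures.

```python
from typing import Dict, List, Set, Tuple

Trans = Dict[Tuple[int, str], int]

def prefix_tree(alternatives: List[List[str]]) -> Tuple[int, Trans, Set[int]]:
    """Deterministic prefix-tree automaton accepting exactly the given alternatives."""
    trans: Trans = {}
    finals: Set[int] = set()
    count = 1  # state 0 is the start state
    for alt in alternatives:
        state = 0
        for sym in alt:
            if (state, sym) not in trans:
                trans[(state, sym)] = count
                count += 1
            state = trans[(state, sym)]
        finals.add(state)
    return count, trans, finals

def minimized_size(count: int, trans: Trans, finals: Set[int]) -> int:
    """Moore-style partition refinement; returns the state count of the minimal DFA.
    A missing transition is treated as going to an implicit dead state (-1), which is
    valid here because every prefix-tree state can reach a final state."""
    labels = sorted({sym for (_, sym) in trans})
    block = {q: int(q in finals) for q in range(count)}
    while True:
        signatures: Dict[tuple, int] = {}
        new_block = {}
        for q in range(count):
            sig = (block[q],) + tuple(
                block[trans[(q, s)]] if (q, s) in trans else -1 for s in labels)
            new_block[q] = signatures.setdefault(sig, len(signatures))
        if len(signatures) == len(set(block.values())):
            return len(signatures)  # no block was split: the partition is stable
        block = new_block

S_ALTS = [["K"] * 6, ["K", "a", "K", "K", "K", "K"]]  # S = K K K K K K | K a K K K K
K_ALTS = [["S", "K"], ["a", "K"], ["a"]]              # K = S K | a K | a
```

The gain comes from merging states with identical futures, such as the two K K K K suffixes of S.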

Slide 16

Slide 16 text

Derivation Trees for Recursive Automata
Input: aacb
Automaton: [Figure: RA with transitions labelled c, S, a, b]
Derivation trees (each node is written symbol,start,end, the input span it derives):
Tree 1: S,0,4 b,3,4 a,0,1 a,1,2 c,2,3 S,2,3
Tree 2: S,0,4 b,3,4 a,0,1 a,1,2 c,2,3 S,2,3 S,1,3
Tree 3: S,0,4 b,3,4 a,0,1 a,1,2 c,2,3 S,1,4 S,2,3

Slide 17

Slide 17 text

SPPF for Recursive Automata
Input: aacb
Automaton: [Figure: RA with transitions labelled c, S, a, b]
Shared Packed Parse Forest nodes: S,0,4 b,3,4 a,0,1 a,1,2 3,1,3 c,2,3 S,1,4 S,2,3 3,0,3 S,1,3 2,0,2
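Sharing in an SPPF is usually obtained by keying nodes on (label, start, end) so that each node is created once, and by storing alternative derivations as packed children. The following is a minimal sketch of that bookkeeping, assuming binarized packed nodes with an optional left child; the class and method names are hypothetical and it does not reproduce the forest on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

Label = Union[str, int]  # a grammar symbol such as "S" or "a", or an automaton state such as 3

@dataclass(eq=False)  # identity comparison: sharing is handled by SPPF.node instead
class Node:
    label: Label
    start: int
    end: int
    # One entry per alternative derivation of this node: (left child or None, right child)
    packed: List[Tuple[Optional["Node"], "Node"]] = field(default_factory=list, repr=False)

class SPPF:
    def __init__(self) -> None:
        self.nodes: Dict[Tuple[Label, int, int], Node] = {}

    def node(self, label: Label, start: int, end: int) -> Node:
        """Return the unique node for (label, start, end), creating it on first use."""
        key = (label, start, end)
        if key not in self.nodes:
            self.nodes[key] = Node(label, start, end)
        return self.nodes[key]

    def add_packed(self, parent: Node, left: Optional[Node], right: Node) -> None:
        """Record one derivation of parent; an identical derivation is stored only once."""
        if not any(l is left and r is right for l, r in parent.packed):
            parent.packed.append((left, right))

    def is_ambiguous(self, n: Node) -> bool:
        return len(n.packed) > 1
```

A node with more than one packed child is exactly the point where several derivation trees share the rest of their structure.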

Slide 21

Slide 21 text

Input processing
Descriptors queue
A descriptor (G, i, U, T) uniquely defines the state of the parsing process:
G - position in grammar
i - position in input
U - stack node
T - current parse forest root

Slide 22

Slide 22 text

Input processing
Descriptors queue
A descriptor (G, i, U, T) uniquely defines the state of the parsing process:
G - state of RA (in place of the position in grammar)
i - position in input
U - stack node
T - current parse forest root
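Because a descriptor fully determines the parser state, a GLL-style driver only has to remember which descriptors it has already created; re-adding a known descriptor is a no-op, which is what keeps left-recursive rules from looping. A minimal sketch, assuming FIFO processing and a caller-supplied `step` function (both hypothetical choices, not taken from the slides):

```python
from collections import deque
from typing import Callable, Deque, Hashable, Iterable, NamedTuple, Set

class Descriptor(NamedTuple):
    G: Hashable  # position in grammar (a slot), or a state of the RA
    i: int       # position in input
    U: Hashable  # stack (GSS) node
    T: Hashable  # current parse forest root

class DescriptorQueue:
    """Worklist that hands out each distinct descriptor exactly once."""

    def __init__(self) -> None:
        self.pending: Deque[Descriptor] = deque()
        self.created: Set[Descriptor] = set()

    def add(self, d: Descriptor) -> bool:
        if d in self.created:
            return False  # this parsing state was already scheduled
        self.created.add(d)
        self.pending.append(d)
        return True

    def run(self, step: Callable[[Descriptor], Iterable[Descriptor]]) -> int:
        """Process descriptors until the queue is empty; returns how many were processed."""
        processed = 0
        while self.pending:
            d = self.pending.popleft()
            processed += 1
            for new in step(d):
                self.add(new)
        return processed
```

In a real parser `step` would inspect the grammar position or RA state in `G` and produce successor descriptors; the order in which descriptors are processed does not affect which ones are eventually created.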

Slide 23

Slide 23 text

Input processing
Input: bc
Grammar: S = (a | b | S) c?

Slide 24

Slide 24 text

Input processing
Input: bc
Grammar:
S = a C_opt | b C_opt | S C_opt
C_opt = | c
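The step from S = (a | b | S) c? to the BNF form with C_opt is the usual desugaring of ECFG operators into helper nonterminals. A small sketch of that transformation, assuming a hypothetical tuple encoding of regular right-hand sides and an invented naming scheme for helpers; alternatives inside a sequence are distributed, which reproduces the three alternatives of S above.

```python
from itertools import count
from typing import Dict, List, Tuple, Union

Expr = Union[str, Tuple]  # a symbol, ("seq" | "alt", [Expr, ...]) or ("opt" | "star" | "plus", Expr)
Rules = Dict[str, List[List[str]]]  # nonterminal -> alternatives; [] is the empty alternative

def desugar(grammar: Dict[str, Expr]) -> Rules:
    rules: Rules = {}
    fresh = count()

    def alts(e: Expr) -> List[List[str]]:
        """BNF alternatives equivalent to e; helper nonterminals are added to rules."""
        if isinstance(e, str):
            return [[e]]
        op, arg = e
        if op == "alt":
            return [a for sub in arg for a in alts(sub)]
        if op == "seq":
            result: List[List[str]] = [[]]
            for sub in arg:
                sub_alts = alts(sub)  # evaluate once so helpers are not duplicated
                result = [left + right for left in result for right in sub_alts]
            return result
        name = f"_{op}{next(fresh)}"  # hypothetical naming; the slide calls this C_opt
        body = alts(arg)
        if op == "opt":    # X?  =>  N = | X
            rules[name] = [[]] + body
        elif op == "star":  # X*  =>  N = | X N
            rules[name] = [[]] + [b + [name] for b in body]
        elif op == "plus":  # X+  =>  N = X | X N
            rules[name] = body + [b + [name] for b in body]
        else:
            raise ValueError(f"unknown operator {op!r}")
        return [[name]]

    for lhs, rhs in grammar.items():
        rules[lhs] = alts(rhs)
    return rules
```

Calling `desugar({"S": ("seq", [("alt", ["a", "b", "S"]), ("opt", "c")])})` yields S = a _opt0 | b _opt0 | S _opt0 and _opt0 = | c, matching the slide up to the helper's name.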

Slide 25

Slide 25 text

Input processing
Input: ∙bc
Grammar:
S = ∙ a C_opt | b C_opt | S C_opt
C_opt = | c
Descriptors queue:
S = ∙ a C_opt, 0, ..., ...

Slide 26

Slide 26 text

Input processing
Input: ∙bc
Grammar:
S = a C_opt | ∙ b C_opt | S C_opt
C_opt = | c
Descriptors queue:
S = ∙ b C_opt, 0, ..., ...
S = ∙ a C_opt, 0, ..., ...

Slide 27

Slide 27 text

Input processing
Input: ∙bc
Grammar:
S = a C_opt | b C_opt | ∙ S C_opt
C_opt = | c
Descriptors queue:
S = ∙ S C_opt, 0, ..., ...
S = ∙ b C_opt, 0, ..., ...
S = ∙ a C_opt, 0, ..., ...

Slide 28

Slide 28 text

Input processing
Input: ∙bc
Grammar:
S = ∙ a C_opt | b C_opt | S C_opt
C_opt = | c
Descriptors queue:
S = ∙ S C_opt, 0, ..., ...
S = ∙ b C_opt, 0, ..., ...
S = ∙ a C_opt, 0, ..., ...

Slide 29

Slide 29 text

Input processing
Input: ∙bc
Grammar:
S = a C_opt | ∙ b C_opt | S C_opt
C_opt = | c
Descriptors queue:
S = ∙ S C_opt, 0, ..., ...
S = ∙ b C_opt, 0, ..., ...
S = ∙ a C_opt, 0, ..., ...

Slide 30

Slide 30 text

Input processing
Input: b∙c
Grammar:
S = a C_opt | b ∙ C_opt | S C_opt
C_opt = | c
Descriptors queue:
S = ∙ S C_opt, 0, ..., ...
S = ∙ b C_opt, 0, ..., ...
S = ∙ a C_opt, 0, ..., ...
Parse forest: b,0,1

Slide 31

Slide 31 text

Input processing
Input: b∙c
Grammar:
S = a C_opt | b C_opt | S C_opt
C_opt = ∙ | c
Descriptors queue:
C_opt = ∙, 1, ..., ...
S = ∙ S C_opt, 0, ..., ...
S = ∙ b C_opt, 0, ..., ...
S = ∙ a C_opt, 0, ..., ...

Slide 32

Slide 32 text

Input processing
Input: b∙c
Grammar:
S = a C_opt | b C_opt | S C_opt
C_opt = | ∙ c
Descriptors queue:
C_opt = ∙ c, 1, ..., ...
C_opt = ∙, 1, ..., ...
S = ∙ S C_opt, 0, ..., ...
S = ∙ b C_opt, 0, ..., ...
S = ∙ a C_opt, 0, ..., ...

Slide 34

Slide 34 text

Input processing
Input: bc∙
Grammar:
S = a C_opt | b C_opt | S C_opt
C_opt = | c ∙
Descriptors queue:
C_opt = ∙ c, 1, ..., ...
C_opt = ∙, 1, ..., ...
S = ∙ S C_opt, 0, ..., ...
S = ∙ b C_opt, 0, ..., ...
S = ∙ a C_opt, 0, ..., ...
Parse forest: C_opt,1,2 c,1,2
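The trace above follows the standard GLL scheme: descriptors are deduplicated, calls go through a graph-structured stack (GSS), and the results of a finished call are replayed when a new caller arrives. Below is a compact recognizer in that style, offered as a sketch for the BNF grammar of the trace. It builds no SPPF, processes descriptors in LIFO order, and its data layout is our own assumption rather than the author's implementation.

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[List[str]]]    # nonterminal -> alternatives; [] is the empty alternative
Slot = Tuple[str, int, int]             # (nonterminal, alternative index, dot position)
GSSNode = Tuple[Slot, int]              # (slot to return to, input position of the call)
Descriptor = Tuple[Slot, int, GSSNode]  # (grammar position, input position, stack node)
ROOT: GSSNode = (("$", 0, 0), 0)        # hypothetical bottom-of-stack node

def gll_recognize(grammar: Grammar, start: str, text: str) -> bool:
    n = len(text)
    pending: List[Descriptor] = []                        # descriptors still to process
    seen: Set[Descriptor] = set()                         # every descriptor ever created
    parents: Dict[GSSNode, Set[GSSNode]] = {ROOT: set()}  # GSS edges: node -> callers below it
    popped: Dict[GSSNode, Set[int]] = {ROOT: set()}       # positions each node returned at

    def add(slot: Slot, i: int, u: GSSNode) -> None:
        if (slot, i, u) not in seen:
            seen.add((slot, i, u))
            pending.append((slot, i, u))

    def create(ret: Slot, u: GSSNode, i: int) -> GSSNode:
        v = (ret, i)
        parents.setdefault(v, set())
        popped.setdefault(v, set())
        if u not in parents[v]:
            parents[v].add(u)
            for j in popped[v]:  # v already returned: replay its results for the new caller
                add(ret, j, u)
        return v

    def pop(u: GSSNode, j: int) -> None:
        if j not in popped[u]:
            popped[u].add(j)
            for w in parents[u]:
                add(u[0], j, w)

    for k in range(len(grammar[start])):
        add((start, k, 0), 0, ROOT)  # one initial descriptor per alternative of the start rule
    while pending:
        (nt, k, dot), i, u = pending.pop()
        alt = grammar[nt][k]
        if dot == len(alt):                  # end of an alternative: return to the callers
            pop(u, i)
        elif alt[dot] in grammar:            # nonterminal: push, then try its alternatives
            v = create((nt, k, dot + 1), u, i)
            for b in range(len(grammar[alt[dot]])):
                add((alt[dot], b, 0), i, v)
        elif i < n and text[i] == alt[dot]:  # terminal matching the input
            add((nt, k, dot + 1), i + 1, u)
    return n in popped[ROOT]

SLIDE_GRAMMAR: Grammar = {
    "S": [["a", "C_opt"], ["b", "C_opt"], ["S", "C_opt"]],
    "C_opt": [[], ["c"]],
}
```

The left-recursive alternative S C_opt terminates because the second call of S at position 0 reuses the existing GSS node and produces only descriptors that are already in `seen`.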

Slide 35

Slide 35 text

Input processing
Input: bc
Automaton: [Figure: RA with transitions labelled c, a, b, S]

Slide 36

Slide 36 text

Input processing
Input: ∙bc
Automaton: [Figure: RA with transitions labelled c, a, b, S]
Descriptors queue:
S, 0, ..., ...

Slide 37

Slide 37 text

Input processing
Input: ∙bc
Automaton: [Figure: RA with transitions labelled c, a, b, S]
Parse forest: b,0,1 S,0,1 b,0,1
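With an RA, the grammar position in a descriptor becomes an automaton state, so the three initial descriptors of the grammar-based trace collapse into the single descriptor (S, 0, ...). A sketch of a GLL-style recognizer driven by a hypothetical RA encoding for S = (a | b | S) c? (layout and state numbering are ours); it also returns how many distinct descriptors were created, the kind of quantity compared in the evaluation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical RA encoding (our own layout, not the author's):
#   "starts": nonterminal -> start state, "finals": accepting states,
#   "edges": state -> list of (label, target); a label is a terminal or a nonterminal name.
RA_S = {  # S = (a | b | S) c?
    "starts": {"S": 0},
    "finals": {1, 2},
    "edges": {0: [("a", 1), ("b", 1), ("S", 1)], 1: [("c", 2)], 2: []},
}

GSSNode = Tuple[int, int]              # (RA state to resume in, input position of the call)
Descriptor = Tuple[int, int, GSSNode]  # (RA state, input position, stack node)
ROOT: GSSNode = (-1, 0)                # bottom of the stack; -1 is not a real RA state

def ra_gll_recognize(ra: dict, start: str, text: str) -> Tuple[bool, int]:
    """Return (accepted, number of distinct descriptors created)."""
    n = len(text)
    pending: List[Descriptor] = []
    seen: Set[Descriptor] = set()
    parents: Dict[GSSNode, Set[GSSNode]] = {ROOT: set()}
    popped: Dict[GSSNode, Set[int]] = {ROOT: set()}

    def add(state: int, i: int, u: GSSNode) -> None:
        if (state, i, u) not in seen:
            seen.add((state, i, u))
            pending.append((state, i, u))

    def create(ret: int, u: GSSNode, i: int) -> GSSNode:
        v = (ret, i)
        parents.setdefault(v, set())
        popped.setdefault(v, set())
        if u not in parents[v]:
            parents[v].add(u)
            for j in popped[v]:  # replay results of a call that already returned
                add(ret, j, u)
        return v

    def pop(u: GSSNode, j: int) -> None:
        if j not in popped[u]:
            popped[u].add(j)
            for w in parents[u]:
                add(u[0], j, w)

    add(ra["starts"][start], 0, ROOT)  # a single initial descriptor: the start state at 0
    while pending:
        state, i, u = pending.pop()
        if state in ra["finals"]:      # a final state may still have outgoing edges
            pop(u, i)
        for label, target in ra["edges"][state]:
            if label in ra["starts"]:  # nonterminal edge: call that nonterminal's automaton
                v = create(target, u, i)
                add(ra["starts"][label], i, v)
            elif i < n and text[i] == label:
                add(target, i + 1, u)
    return n in popped[ROOT], len(seen)
```

All three alternatives a, b and S share the start state 0, so the parser branches inside one descriptor instead of creating one descriptor per alternative.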

Slide 38

Slide 38 text

Evaluation
Grammar G1:
S = K K K K K K | K a K K K K
K = S K | a K | a
RA for grammar G1: [Figure: minimized automaton for G1]
Experiment results for input a^40
           Memory usage                                Time, sec
           Descriptors   Stack edges   SPPF nodes
Grammar    7,940         6,974         111,127,244     81
RA         5,830         4,234         74,292,078      54
Ratio      27%           39%           33%             35%

Slide 39

Slide 39 text

Applicability
Graph parsing: all input strings in one graph
abcd, abfd =⇒ [Figure: one input graph with edges a, b, then c or f, then d]
Graph parsing results
           Memory usage                               Time, min
           Descriptors   Stack edges   Stack nodes
Grammar    21,134,080    7,482,789     2,731,529      02.26
RA         9,153,352     2,792,330     839,148        01.25
Ratio      57%           63%           69%            45%
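Reading the Ratio row as the relative reduction obtained with the RA, the percentages can be recomputed from the table. A minimal check, assuming that reading; with the same formula the first three columns of the a^40 table give 27%, 39% and 33%.

```python
def reduction_percent(baseline: float, improved: float) -> int:
    """How much smaller `improved` is than `baseline`, in whole percent."""
    return round(100 * (1 - improved / baseline))

# (grammar-based, RA-based) values copied from the graph parsing table
GRAPH_RESULTS = {
    "Descriptors": (21_134_080, 9_153_352),
    "Stack edges": (7_482_789, 2_792_330),
    "Stack nodes": (2_731_529, 839_148),
    "Time, min": (2.26, 1.25),
}

for metric, (grammar, ra) in GRAPH_RESULTS.items():
    print(f"{metric}: {reduction_percent(grammar, ra)}%")
```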