TMPA-2017: Extended Context-Free Grammars Parsing with Generalized LL

TMPA-2017: Tools and Methods of Program Analysis
3-4 March, 2017, Hotel Holiday Inn Moscow Vinogradovo, Moscow

Extended Context-Free Grammars Parsing with Generalized LL
Semyon Grigorev, Artem Gorokhov, Saint Petersburg State University

For video follow the link: https://youtu.be/hImdO2WgW4U
Would you like to know more?
Visit our website:
www.tmpaconf.org
www.exactprosystems.com/events/tmpa

Follow us:
https://www.linkedin.com/company/exactpro-systems-llc?trk=biz-companies-cym
https://twitter.com/exactpro

Exactpro

March 23, 2017

Transcript

  1. Extended Context-Free Grammars Parsing with Generalized LL
    Author: Artem Gorokhov
    Saint Petersburg University, Programming Languages and Tools Lab, JetBrains
    March 4, 2017
    Artem Gorokhov (SPbU) March 4,2017 1 / 15
  2. Extended Context-Free Grammar
    S = a M*
    M = a? (B K)+ | u B
    B = c |
  7. Existing solutions
    ANTLR, Yacc, Bison
      - Can't use ECFG without transformation
      - Admit only a subclass of context-free languages (LL(k), LR(k))
    Some research on ECFG parsing
      - No tools
      - Restricted to LL(k), LR(k)
    Generalized LL
      - Admits arbitrary CFGs (including ambiguous ones)
      - Can't use ECFG without transformation
  9. Automata and ECFGs
    Grammar G0: S = a* S b? | c
    ⇒ RA (recursive automaton) for grammar G0
    [Figure: RA for G0, transitions labeled a, b, c, S, ε]
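The right-hand side of a G0 rule is a regular expression over terminals and nonterminals, so it can be compiled into an automaton like the one in the figure. Below is a minimal sketch using Thompson's construction, which produces ε-transitions like those shown; the talk does not say which construction the authors use, and the expression classes (`Sym`, `Seq`, `Alt`, `Star`, `Opt`) and helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Sym:   # terminal or nonterminal label
    name: str


@dataclass
class Seq:   # concatenation
    parts: list


@dataclass
class Alt:   # alternation  x | y
    options: list


@dataclass
class Star:  # repetition  x*
    inner: object


@dataclass
class Opt:   # option  x?
    inner: object


Edge = Tuple[int, Optional[str], int]  # (source, label or None for ε, target)


def build(e, edges: List[Edge], counter: List[int]) -> Tuple[int, int]:
    """Thompson construction: add the fragment for e, return its (entry, exit)."""
    def new() -> int:
        counter[0] += 1
        return counter[0] - 1

    if isinstance(e, Sym):
        s, t = new(), new()
        edges.append((s, e.name, t))
        return s, t
    if isinstance(e, Seq):
        entry, exit_ = build(e.parts[0], edges, counter)
        for part in e.parts[1:]:
            s, t = build(part, edges, counter)
            edges.append((exit_, None, s))   # ε glues consecutive fragments
            exit_ = t
        return entry, exit_
    if not isinstance(e, (Alt, Star, Opt)):
        raise TypeError(f"unsupported node: {e!r}")
    s, t = new(), new()
    if isinstance(e, Alt):
        for option in e.options:
            a, b = build(option, edges, counter)
            edges.extend([(s, None, a), (b, None, t)])
    else:
        a, b = build(e.inner, edges, counter)
        edges.extend([(s, None, a), (s, None, t), (b, None, t)])  # enter or skip
        if isinstance(e, Star):
            edges.append((b, None, a))                             # repeat
    return s, t


def closure(edges: List[Edge], states: Set[int]) -> Set[int]:
    """States reachable from `states` through ε-edges only."""
    seen, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for a, label, b in edges:
            if a == s and label is None and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def accepts(edges: List[Edge], start: int, final: int, word: List[str]) -> bool:
    current = closure(edges, {start})
    for sym in word:
        current = closure(edges, {b for a, label, b in edges if a in current and label == sym})
    return final in current


# Right-hand side of G0 from slide 9:  S = a* S b? | c
rule = Alt([Seq([Star(Sym("a")), Sym("S"), Opt(Sym("b"))]), Sym("c")])
edges: List[Edge] = []
start, final = build(rule, edges, [0])
print(accepts(edges, start, final, ["a", "a", "S", "b"]))  # True
print(accepts(edges, start, final, ["c", "b"]))            # False
```

The nonterminal `S` is just another edge label here; in a recursive automaton, following such an edge means calling the automaton for that nonterminal. An ε-automaton like this can then be determinized and minimized before parsing.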
  10. Recursive Automata Minimization
    Grammar G1:
      S = K K K K K K | K a K K K K
      K = S K | a K | a
    [Figure: automaton for G1]
    [Figure: minimized automaton for G1]
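Minimization merges states that have identical futures, which is what shrinks the automaton for G1. The slide does not name an algorithm, so this is a minimal sketch using Moore-style partition refinement, treating nonterminal labels such as K like ordinary symbols. The trie-shaped input automaton (state names q0, p2, r2, ...) is an assumption made for illustration and covers only the rule for S.

```python
from typing import Dict, List, Set, Tuple

Delta = Dict[Tuple[str, str], str]  # (state, label) -> target; a missing key means no edge


def minimize(states: List[str], labels: List[str], delta: Delta,
             finals: Set[str]) -> Dict[str, int]:
    """Moore-style partition refinement; returns a block number for every state."""
    block = {s: int(s in finals) for s in states}      # start: final vs non-final
    while True:
        # A state's signature: its own block and the block reached on each label.
        signature = {
            s: (block[s],) + tuple(
                block[delta[(s, l)]] if (s, l) in delta else None for l in labels)
            for s in states
        }
        ids: Dict[tuple, int] = {}
        refined = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(ids) == len(set(block.values())):       # no block was split: stable
            return refined
        block = refined


# Hypothetical trie-shaped automaton for the S rule of G1:  S = K K K K K K | K a K K K K
delta: Delta = {("q0", "K"): "q1", ("q1", "K"): "p2", ("q1", "a"): "r2"}
for branch in ("p", "r"):
    for k in range(2, 6):
        delta[(f"{branch}{k}", "K")] = f"{branch}{k + 1}"
states = ["q0", "q1"] + [f"{b}{k}" for b in "pr" for k in range(2, 7)]
blocks = minimize(states, ["K", "a"], delta, finals={"p6", "r6"})
print(len(states), "->", len(set(blocks.values())))  # 12 -> 7
```

The two alternatives share their trailing K K K K, so the p and r branches collapse pairwise and 12 states become 7.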
  11. Derivation Trees for Recursive Automata
    Input: aacb
    [Figure: automaton with transitions labeled a, a, b, c, S, S]
    Derivation trees (X,i,j: symbol X spanning input positions i to j):
      S,0,4 ( a,0,1  a,1,2  S,2,3 ( c,2,3 )  b,3,4 )
      S,0,4 ( a,0,1  S,1,3 ( a,1,2  S,2,3 ( c,2,3 ) )  b,3,4 )
      S,0,4 ( a,0,1  S,1,4 ( a,1,2  S,2,3 ( c,2,3 )  b,3,4 ) )
  12. SPPF for Recursive Automata
    Input: aacb
    [Figure: automaton with transitions labeled a, a, b, c, S, S]
    Shared Packed Parse Forest:
      symbol nodes: S,0,4  b,3,4  a,0,1  a,1,2  c,2,3  S,1,4  S,2,3  S,1,3
      intermediate nodes: 3,1,3  3,0,3  2,0,2
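The forest stores each (label, start, end) node once and records every distinct way of deriving it as a packed alternative, so the three trees of slide 11 share their common subtrees. A minimal sketch under simplifying assumptions: the `Forest` class is hypothetical, packed alternatives keep their full child lists, and the intermediate nodes (such as 3,1,3) of the slide's binarized forest are omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[str, int, int]  # (label, start, end), as in "S,0,4"


class Forest:
    """Each (label, start, end) node exists once; each way to derive it is one packed alternative."""

    def __init__(self) -> None:
        self.packed: Dict[Node, List[Tuple[Node, ...]]] = defaultdict(list)

    def add(self, node: Node, children: Tuple[Node, ...]) -> None:
        if children not in self.packed[node]:   # identical alternatives are stored once
            self.packed[node].append(children)

    def nodes(self) -> set:
        inner = {c for alts in self.packed.values() for kids in alts for c in kids}
        return set(self.packed) | inner

    def count_trees(self, node: Node) -> int:
        """Number of derivation trees below node; assumes the forest is acyclic."""
        if node not in self.packed:              # terminal leaf
            return 1
        total = 0
        for children in self.packed[node]:
            product = 1
            for child in children:
                product *= self.count_trees(child)
            total += product
        return total


f = Forest()
a0, a1, c2, b3 = ("a", 0, 1), ("a", 1, 2), ("c", 2, 3), ("b", 3, 4)
s23, s13, s14, s04 = ("S", 2, 3), ("S", 1, 3), ("S", 1, 4), ("S", 0, 4)
f.add(s23, (c2,))
f.add(s13, (a1, s23))
f.add(s14, (a1, s23, b3))
# The three derivation trees of slide 11 become three packed alternatives of S,0,4.
f.add(s04, (a0, a1, s23, b3))
f.add(s04, (a0, s13, b3))
f.add(s04, (a0, s14))
print(len(f.nodes()), f.count_trees(s04))  # 8 3
```

Counting trees multiplies over children and sums over packed alternatives; the 8 distinct symbol nodes match the symbol nodes listed on the slide.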
  16. Input processing
    Descriptors queue
    A descriptor (G, i, U, T) uniquely defines the state of the parsing process:
      G: position in grammar
      i: position in input
      U: stack node
      T: current parse forest root
  17. Input processing
    Descriptors queue
    A descriptor (G, i, U, T) uniquely defines the state of the parsing process:
      G: state of the RA (replaces position in grammar)
      i: position in input
      U: stack node
      T: current parse forest root
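Because a descriptor fully determines what the parser does next, GLL processes each descriptor at most once: a new descriptor is checked against the set of all descriptors created so far before it is queued. A minimal sketch, assuming hypothetical names (`Descriptor`, `Worklist`) and representing the stack node U as a (nonterminal, position) pair; the T component is left abstract.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Descriptor(NamedTuple):
    slot: str                     # G: grammar position (or, for an RA, a state)
    pos: int                      # i: position in the input
    stack_node: Tuple[str, int]   # U: stack node, here a (nonterminal, position) pair
    forest_node: Optional[str]    # T: current parse forest root, None if empty


class Worklist:
    """Pending descriptors plus the set of every descriptor ever created."""

    def __init__(self) -> None:
        self.pending: List[Descriptor] = []
        self.created: Set[Descriptor] = set()

    def add(self, d: Descriptor) -> bool:
        """Queue d unless an identical descriptor was created before."""
        if d in self.created:
            return False
        self.created.add(d)
        self.pending.append(d)
        return True

    def pop(self) -> Descriptor:
        return self.pending.pop()    # most recently added first


work = Worklist()
root = ("S", 0)
for slot in ("S = ∙a C_opt", "S = ∙b C_opt", "S = ∙S C_opt"):
    work.add(Descriptor(slot, 0, root, None))
print(work.add(Descriptor("S = ∙b C_opt", 0, root, None)))  # False: already created
print(work.pop().slot)                                      # S = ∙S C_opt
```

The order in which pending descriptors are taken changes only the traversal, not the result; popping the newest one matches how new entries appear at the top of the queue in the trace on slides 20 to 29.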
  18. Input processing
    Input: bc
    Grammar: S = (a | b | S) c?
  19. Input processing
    Input: bc
    Grammar: S = a C_opt | b C_opt | S C_opt
             C_opt = | c
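Slides 18 and 19 show the transformation a plain GLL parser needs: the optional c? becomes a fresh nonterminal C_opt, and the grouped alternation is distributed over the rest of the rule. A minimal sketch of that rewriting for alternation, concatenation and option, assuming a hypothetical expression AST; the `X_opt` naming scheme only mimics the slide, and the empty alternative is printed as ε.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List


@dataclass
class Sym:
    name: str


@dataclass
class Seq:
    parts: list


@dataclass
class Alt:
    options: list


@dataclass
class Opt:
    inner: object


Rules = Dict[str, List[List[str]]]  # nonterminal -> alternatives (lists of symbols)


def expand(e, rules: Rules) -> List[List[str]]:
    """Plain-CFG alternatives for e; helper nonterminals are added to `rules`."""
    if isinstance(e, Sym):
        return [[e.name]]
    if isinstance(e, Alt):
        return [alt for option in e.options for alt in expand(option, rules)]
    if isinstance(e, Seq):
        # Distribute: one alternative per combination of the parts' alternatives.
        parts = [expand(p, rules) for p in e.parts]
        return [sum(combo, []) for combo in product(*parts)]
    if isinstance(e, Opt):
        # Fresh nonterminal with an empty alternative; naming mimics "C_opt".
        name = f"{e.inner.name.upper()}_opt" if isinstance(e.inner, Sym) else f"Opt{len(rules)}"
        rules[name] = [[]] + expand(e.inner, rules)
        return [[name]]
    raise TypeError(f"unsupported node: {e!r}")


def desugar(lhs: str, rhs) -> Rules:
    rules: Rules = {}
    rules[lhs] = expand(rhs, rules)
    return rules


# Slide 18:  S = (a | b | S) c?
rules = desugar("S", Seq([Alt([Sym("a"), Sym("b"), Sym("S")]), Opt(Sym("c"))]))
for lhs, alts in rules.items():
    print(lhs, "=", " | ".join(" ".join(a) or "ε" for a in alts))
# C_opt = ε | c
# S = a C_opt | b C_opt | S C_opt
```

Distribution multiplies alternatives: three choices before C_opt give three alternatives for S, each of which the parser then explores separately, as the trace below shows.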
  20. Input processing
    Input: ∙bc
    Grammar: S = ∙a C_opt | b C_opt | S C_opt
             C_opt = | c
    Descriptors queue:
      (S = ∙a C_opt, 0, …, …)
  21. Input processing
    Input: ∙bc
    Grammar: S = a C_opt | ∙b C_opt | S C_opt
             C_opt = | c
    Descriptors queue:
      (S = ∙b C_opt, 0, …, …)
      (S = ∙a C_opt, 0, …, …)
  22. Input processing
    Input: ∙bc
    Grammar: S = a C_opt | b C_opt | ∙S C_opt
             C_opt = | c
    Descriptors queue:
      (S = ∙S C_opt, 0, …, …)
      (S = ∙b C_opt, 0, …, …)
      (S = ∙a C_opt, 0, …, …)
  23. Input processing
    Input: ∙bc
    Grammar: S = ∙a C_opt | b C_opt | S C_opt
             C_opt = | c
    Descriptors queue:
      (S = ∙S C_opt, 0, …, …)
      (S = ∙b C_opt, 0, …, …)
      (S = ∙a C_opt, 0, …, …)
  24. Input processing
    Input: ∙bc
    Grammar: S = a C_opt | ∙b C_opt | S C_opt
             C_opt = | c
    Descriptors queue:
      (S = ∙S C_opt, 0, …, …)
      (S = ∙b C_opt, 0, …, …)
      (S = ∙a C_opt, 0, …, …)
  25. Input processing
    Input: b∙c
    Grammar: S = a C_opt | b∙C_opt | S C_opt
             C_opt = | c
    Descriptors queue:
      (S = ∙S C_opt, 0, …, …)
      (S = ∙b C_opt, 0, …, …)
      (S = ∙a C_opt, 0, …, …)
    Parse forest nodes: b,0,1
  26. Input processing
    Input: b∙c
    Grammar: S = a C_opt | b C_opt | S C_opt
             C_opt = ∙ | c
    Descriptors queue:
      (C_opt = ∙, 1, …, …)
      (S = ∙S C_opt, 0, …, …)
      (S = ∙b C_opt, 0, …, …)
      (S = ∙a C_opt, 0, …, …)
  27. Input processing
    Input: b∙c
    Grammar: S = a C_opt | b C_opt | S C_opt
             C_opt = | ∙c
    Descriptors queue:
      (C_opt = ∙c, 1, …, …)
      (C_opt = ∙, 1, …, …)
      (S = ∙S C_opt, 0, …, …)
      (S = ∙b C_opt, 0, …, …)
      (S = ∙a C_opt, 0, …, …)
  29. Input processing
    Input: bc∙
    Grammar: S = a C_opt | b C_opt | S C_opt
             C_opt = | c∙
    Descriptors queue:
      (C_opt = ∙c, 1, …, …)
      (C_opt = ∙, 1, …, …)
      (S = ∙S C_opt, 0, …, …)
      (S = ∙b C_opt, 0, …, …)
      (S = ∙a C_opt, 0, …, …)
    Parse forest nodes: C_opt,1,2  c,1,2
  30. Input processing
    Input: bc
    Automaton: [Figure: recursive automaton, transitions labeled a, b, c, S]
  31. Input processing
    Input: ∙bc
    Automaton: [Figure: recursive automaton, transitions labeled a, b, c, S]
    Descriptors queue:
      (S, 0, …, …)
  32. Input processing
    Input: ∙bc
    Automaton: [Figure: recursive automaton, transitions labeled a, b, c, S]
    Parse forest nodes: b,0,1  S,0,1
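Taken together, slides 16 to 32 describe a GLL loop driven by descriptors over a recursive automaton. Below is a compact recognizer in that style, offered as a sketch: descriptors are (RA state, input position, stack node), stack nodes are (nonterminal, position) pairs in a graph-structured stack, and each stack node remembers the positions at which it has already returned. Assumptions: the SPPF (component T) is omitted, so this only answers yes or no, and the dictionary encoding of an automaton for G0 (S = a* S b? | c) and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical RA encoding: nonterminal -> (start state, final states, transitions).
# A label that is a key of RA calls that nonterminal; any other label is a terminal.
RA = {
    # G0:  S = a* S b? | c
    "S": (0, {2, 3}, {0: [("a", 1), ("S", 2), ("c", 3)],
                      1: [("a", 1), ("S", 2)],
                      2: [("b", 3)],
                      3: []}),
}

GssNode = Tuple[str, int]                # (called nonterminal, call position)
Descriptor = Tuple[int, int, GssNode]    # (RA state, input position, stack node)


def recognize(text: str, start: str = "S") -> bool:
    n = len(text)
    root: GssNode = (start, 0)
    # node -> {(return state, caller node)}
    edges: Dict[GssNode, Set[Tuple[int, GssNode]]] = defaultdict(set)
    popped: Dict[GssNode, Set[int]] = defaultdict(set)  # node -> positions it returned at
    created: Set[Descriptor] = set()
    pending: List[Descriptor] = []

    def add(d: Descriptor) -> None:
        if d not in created:            # every descriptor is processed at most once
            created.add(d)
            pending.append(d)

    add((RA[start][0], 0, root))
    while pending:
        q, i, u = pending.pop()
        _, finals, trans = RA[u[0]]     # u's nonterminal selects the automaton
        if q in finals:                 # u[0] derives text[u[1]:i]: return to callers
            popped[u].add(i)
            for ret, caller in list(edges[u]):
                add((ret, i, caller))
        for label, target in trans[q]:
            if label in RA:             # call: reuse or create stack node (label, i)
                w = (label, i)
                if (target, u) not in edges[w]:
                    edges[w].add((target, u))
                    for j in list(popped[w]):   # w already returned: resume there
                        add((target, j, u))
                add((RA[label][0], i, w))
            elif i < n and text[i] == label:    # terminal: consume one symbol
                add((target, i + 1, u))
    return n in popped[root]


for word in ("aacb", "c", "cbb", "ab", ""):
    print(repr(word), recognize(word))
```

The left-recursive path S → S b? does not loop: calling S at position 0 from inside S reuses the stack node (S, 0) and only adds an edge to it, and the descriptor set stops repeated work.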
  33. Evaluation
    Grammar G1:
      S = K K K K K K | K a K K K K
      K = S K | a K | a
    [Figure: RA for grammar G1]
    Experiment results for input a^40 (the letter a repeated 40 times):
                         Memory usage                               Time, sec
                         Descriptors   Stack edges   SPPF nodes
      Grammar      7,940         6,974         111,127,244    81
      RA           5,830         4,234         74,292,078     54
      Reduction    27%           39%           33%            35%
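The last row of the table (labeled "Ratio" on the slide) reads as the relative saving 1 − RA/Grammar. A small check of that reading, assuming the column order descriptors, stack edges, SPPF nodes, time in seconds: recomputing from the transcribed numbers reproduces 27%, 39% and 33% for the three counts, while the rounded times 81 and 54 give 33% rather than the 35% shown, which may reflect rounding of the displayed times.

```python
# Values transcribed from the slide 33 table (input a^40), assuming the column
# order descriptors, stack edges, SPPF nodes, time in seconds.
grammar = {"descriptors": 7_940, "stack_edges": 6_974, "sppf_nodes": 111_127_244, "time_sec": 81}
ra = {"descriptors": 5_830, "stack_edges": 4_234, "sppf_nodes": 74_292_078, "time_sec": 54}


def reduction_percent(before: float, after: float) -> int:
    """Relative saving of the RA version over the plain grammar, in whole percent."""
    return round(100 * (1 - after / before))


savings = {key: reduction_percent(grammar[key], ra[key]) for key in grammar}
print(savings)
# {'descriptors': 27, 'stack_edges': 39, 'sppf_nodes': 33, 'time_sec': 33}
```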
  34. Applicability
    Graph parsing: all input strings in one graph
      abcd, abfd ⇒ [Figure: input graph with edges labeled a, b, c, d, f]
    Graph parsing results:
                         Memory usage                               Time, min
                         Descriptors   Stack edges   Stack nodes
      Grammar      21,134,080    7,482,789     2,731,529      02.26
      RA           9,153,352     2,792,330     839,148        01.25
      Reduction    57%           63%           69%            45%
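Graph parsing takes one input graph whose paths spell all the strings to check, so work on shared prefixes and suffixes can be shared. A minimal sketch of such an input graph, assuming a layout for the slide's example (shared prefix ab, a branch on c or f, shared suffix d), since the figure itself is not recoverable. In graph-parsing variants of GLL, the input position i of a descriptor becomes a graph vertex, and a terminal transition follows a matching outgoing edge instead of moving to i + 1.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input graph for the slide's example: the strings abcd and abfd
# share the prefix "ab" and the suffix "d"; vertex 0 is the start, 4 the end.
GRAPH: Dict[int, List[Tuple[str, int]]] = {
    0: [("a", 1)],
    1: [("b", 2)],
    2: [("c", 3), ("f", 3)],
    3: [("d", 4)],
    4: [],
}


def strings(graph: Dict[int, List[Tuple[str, int]]], v: int, end: int) -> Set[str]:
    """Every label sequence on a path from v to end (the graph is assumed acyclic)."""
    if v == end:
        return {""}
    return {label + rest for label, nxt in graph[v] for rest in strings(graph, nxt, end)}


edge_count = sum(len(out) for out in GRAPH.values())
print(sorted(strings(GRAPH, 0, 4)), edge_count)  # ['abcd', 'abfd'] 5
```

Five edges encode both strings, which together contain eight symbols; on larger inputs with much sharing, this is where the savings in the table above come from.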