GText: A Language Workbench based on GLL and Term Rewriting

Final presentation of the Master's thesis at Eindhoven University of Technology

Ali Afroozeh

August 16, 2012

Transcript

  1. A language workbench based on GLL and Term Rewriting
    By: Ali Afroozeh
    Supervisor: Prof. Mark van den Brand
    [Figure: tree with E and + nodes over the leaves 1, 2, 3]
  2. “Language workbenches are, in essence, tools that help you build your own DSLs and provide tool support for them in the style of modern IDEs. The idea is that these tools don't just provide an IDE to help create DSLs; they support building IDEs for editing these DSLs.”
    Martin Fowler, Language Workbench
  3. Previous Work
    Master Thesis: mlBNF – A Syntax Formalism for Domain Specific Languages
    M.W. Manders BSc, April 5, 2011
    Supervisors: Prof. Dr. M.G.J. (Mark) van den Brand, Prof. A. (Adrian) Johnstone
  4. GText: Goals
    • EMF mapping
    • Ambiguous grammars
    • Difficult parsing problems
    • Language embeddings
    • IDEs for complex DSLs
  5. GLL

  8. Generalized LL Parsing
    • By Elizabeth Scott and Adrian Johnstone
    • Generalizes recursive-descent parsing
    • Supports the full class of context-free grammars
    • Supports grammars with left recursion
  9. An Example
    S ::= A d | B
    A ::= a
    B ::= b
    • Nonterminal: S, A, B
    • Terminal: a, b, d
    • Head and body: in S ::= A d | B, the head is S and the body is A d | B
    • Alternate: each option separated by | in a body, such as A d or B
    • Produced language: ad, b
  14. Recursive-descent parser for the example grammar (I is the input, i the current position, $ the end-of-input marker)

    main() {
      i = 0;
      if (I[i] in {a, b}) parseS(); else error();
      if (I[i] == $) return success(); else error();
    }

    // S ::= A d | B
    parseS() {
      if (I[i] in {a}) {
        parseA();
        if (I[i] == d) i++; else error();
      } else if (I[i] in {b}) {
        parseB();
      } else error();
    }

    // A ::= a
    parseA() { if (I[i] == a) i++; else error(); }

    // B ::= b
    parseB() { if (I[i] == b) i++; else error(); }
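The pseudocode on this slide can be tried out as a small Python program. The sketch below is only an illustration of the same recursive-descent idea for S ::= A d | B; the names `parse`, `expect` and `ParseError` are made up for this example and do not come from the slides.

```python
class ParseError(Exception):
    """Raised when the input is not in the language {ad, b}."""


END = "$"  # end-of-input marker, as in the slide's pseudocode


def parse(text):
    """Return True if text is derived from S; raise ParseError otherwise."""
    tokens = list(text) + [END]
    pos = 0

    def expect(tok):
        # Consume one token or fail, like "if (I[i] == tok) i++; else error();"
        nonlocal pos
        if tokens[pos] != tok:
            raise ParseError(f"expected {tok!r} at {pos}, found {tokens[pos]!r}")
        pos += 1

    def parse_a():  # A ::= a
        expect("a")

    def parse_b():  # B ::= b
        expect("b")

    def parse_s():  # S ::= A d | B
        # One token of lookahead is enough to pick the alternative.
        if tokens[pos] == "a":
            parse_a()
            expect("d")
        elif tokens[pos] == "b":
            parse_b()
        else:
            raise ParseError(f"unexpected {tokens[pos]!r} at {pos}")

    parse_s()
    expect(END)
    return True
```

Calling `parse("ad")` or `parse("b")` succeeds, while any other input raises `ParseError`. The approach works here because each alternative of S starts with a different terminal.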
  22. Non-LL(1) grammars
    Multiple choices (A and B can both start with a):
    S ::= A S d | B S | ε
    A ::= a | c
    B ::= a | b
    Left recursion (the recursive alternatives of E start with E itself):
    E ::= E + E | E - E | E * E | E / E | Digit
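The two problems named on this slide can be detected mechanically. The following is a rough sketch, not a full LL(1) test: it computes FIRST sets and reports only overlapping alternatives ("multiple choices") and direct left recursion. The grammar encoding (a dict of tuples, with `()` for ε) and all function names are assumptions made for this example.

```python
EPS = "ε"


def first_of_seq(seq, first, grammar):
    """FIRST set of a symbol sequence, given the FIRST sets computed so far."""
    out = set()
    for sym in seq:
        if sym not in grammar:  # anything without rules is treated as a terminal
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)  # every symbol was nullable, or the sequence is empty
    return out


def first_sets(grammar):
    """Fixed-point computation of FIRST for every nonterminal."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_seq(alt, first, grammar) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first


def ll1_problems(grammar):
    """Map each problematic nonterminal to the set of problems found."""
    first = first_sets(grammar)
    problems = {}
    for nt, alts in grammar.items():
        found = set()
        if any(alt and alt[0] == nt for alt in alts):
            found.add("left recursion")
        firsts = [first_of_seq(alt, first, grammar) for alt in alts]
        pairs = [(i, j) for i in range(len(alts)) for j in range(i + 1, len(alts))]
        if any(firsts[i] & firsts[j] for i, j in pairs):
            found.add("multiple choices")
        if found:
            problems[nt] = found
    return problems


# The two grammars from the slide; "Digit" is treated as a terminal here.
S_GRAMMAR = {
    "S": [("A", "S", "d"), ("B", "S"), ()],
    "A": [("a",), ("c",)],
    "B": [("a",), ("b",)],
}
E_GRAMMAR = {
    "E": [("E", "+", "E"), ("E", "-", "E"), ("E", "*", "E"), ("E", "/", "E"), ("Digit",)],
}
```

With these definitions, `ll1_problems(S_GRAMMAR)` flags S for multiple choices, and `ll1_problems(E_GRAMMAR)` flags E for both left recursion and multiple choices.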
  23. SPPF
    E ::= E "+" E | Digit
    Digit ::= [1-9]+
    Input: 1 + 2
    [Figure: SPPF for the input, with nodes E, E, E, +, Digit: 1, Digit: 2]
  27. SPPF
    E ::= E "+" E | Digit
    Digit ::= [1-9]+
    Input: 1 + 2 + 3
    [Figure: SPPF for the input, with E and + nodes over Digit: 1, Digit: 2, Digit: 3]
  28. Removing unnecessary nodes
    [Figure: the SPPF for 1 + 2 + 3 next to a simplified version with fewer nodes]
  29. Two Derivations
    [Figure: the simplified SPPF for 1 + 2 + 3]
  30. Two Derivations
    [Figure: the two parse trees for 1 + 2 + 3, one grouping the input as (1 + 2) + 3 and the other as 1 + (2 + 3)]
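To see where the two derivations come from, one can enumerate every parse tree of a token list under E ::= E "+" E | Digit. The sketch below is a brute-force enumerator written for illustration only; it is not GLL, and it copies subtrees instead of sharing them as an SPPF does. The function name `derivations` and the nested-tuple tree encoding are assumptions of this example.

```python
def is_digit_token(tok):
    # Digit ::= [1-9]+
    return tok != "" and all(c in "123456789" for c in tok)


def derivations(tokens):
    """All parse trees of tokens under E ::= E "+" E | Digit, as nested tuples."""
    if len(tokens) == 1 and is_digit_token(tokens[0]):
        return [tokens[0]]
    trees = []
    for k, tok in enumerate(tokens):
        if tok == "+":
            # Try this "+" as the top-level operator.
            for left in derivations(tokens[:k]):
                for right in derivations(tokens[k + 1:]):
                    trees.append((left, "+", right))
    return trees


trees = derivations("1 + 2 + 3".split())
# trees holds ("1", "+", ("2", "+", "3")) and (("1", "+", "2"), "+", "3")
```

For four operands the same function yields five trees, which is why a shared representation such as the SPPF matters for longer inputs.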
  31. Disambiguation Rules
    • Remove rule: removes an illegal pattern.
      remove [pattern]
    • Prefer rule: prefers one pattern to another.
      prefer [pattern1], [pattern2]
  33. The Dangling Else Ambiguity
    E ::= "expr"
    S ::= "if" E "then" S+
        | "if" E "then" S+ "else" S+
        | "other"
    Input: if expr then if expr then other else other
  36. The Dangling Else Ambiguity
    [Figure: parse tree of the input in which the else belongs to the outer if]
  38. The Dangling Else Ambiguity
    [Figure: the two parse trees of the input; in one the else belongs to the outer if, in the other to the inner if]
    prefer ["if", E, "then", S], ["if", E, "then", S, "else", S]
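The effect of the prefer rule on the two trees can be mimicked in a few lines. This is a sketch under an assumed reading of the rule: when one alternative matches the first pattern, alternatives matching the second pattern are dropped. The tree encoding, `shape` and `prefer` are hypothetical names for this example and are not GText syntax.

```python
# A tree is (label, children); a child is either a terminal string or a tree.
E = ("E", ["expr"])
OTHER = ("S", ["other"])

# Outer statement is if-then; the else belongs to the inner if.
outer_if_then = ("S", ["if", E, "then",
                       ("S", ["if", E, "then", OTHER, "else", OTHER])])

# Outer statement is if-then-else; the else belongs to the outer if.
outer_if_then_else = ("S", ["if", E, "then",
                            ("S", ["if", E, "then", OTHER]), "else", OTHER])

IF_THEN = ["if", "E", "then", "S"]
IF_THEN_ELSE = ["if", "E", "then", "S", "else", "S"]


def shape(tree):
    """Labels of the direct children, the form used by the slide's patterns."""
    _label, children = tree
    return [c if isinstance(c, str) else c[0] for c in children]


def prefer(alternatives, preferred, other):
    """Drop alternatives shaped like `other` when a `preferred` one exists."""
    if any(shape(t) == preferred for t in alternatives):
        return [t for t in alternatives if shape(t) != other]
    return list(alternatives)


result = prefer([outer_if_then_else, outer_if_then], IF_THEN, IF_THEN_ELSE)
```

Here `result` keeps only `outer_if_then`, which is the usual resolution in which the else binds to the nearest if.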
  39. The Island-Water Ambiguity
    Program ::= Chunk*
    Chunk ::= Island | Water
    Island ::= Id "=" Digit ";"
    Water ::= Id | Integer | String | Char | SpecialChar
    SpecialChar ::= [; + - * / = ...]

    Sample program:
    public static void main(String[] args) {
      int x = 0;
      x = 10;
      if (x > 1) {
        x = 15;
      }
    }
  40. The Island-Water Ambiguity
    Input: x = 10;
    [Figure: SPPF for the input under Program and Chunk_*, showing two readings: four Water chunks (x, =, 10, ;) or a single Island chunk (Id: x, =, Digit: 10, ;)]
  41. The Island-Water Ambiguity
    [Figure: SPPF for x = 10; with the Water reading (x, =, 10, ;) and the Island reading (Id: x, =, Digit: 10, ;)]
    prefer [ Chunk(Island) ], [ Chunk(Water), _* ]
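The outcome this prefer rule aims for can be approximated with a greedy scanner: wherever the next tokens form an island, take the island, and otherwise emit one token of water. The sketch below is a stand-in for that result and does not filter an SPPF as GText would; the tokenizer regex and the name `chunks` are assumptions made for this example.

```python
import re

# Identifiers, integers, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def chunks(source):
    """Split source into ("Island", tokens) and ("Water", token) chunks."""
    tokens = TOKEN.findall(source)
    out = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 4]
        # Island ::= Id "=" Digit ";"
        is_island = (
            len(window) == 4
            and window[0].isidentifier()
            and window[1] == "="
            and window[2].isdigit()
            and window[3] == ";"
        )
        if is_island:
            out.append(("Island", tuple(window)))
            i += 4
        else:
            out.append(("Water", tokens[i]))
            i += 1
    return out
```

For the input `x = 10;` this yields a single island, and for `int x = 0;` it yields the water token `int` followed by an island.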
  42. SPPF as an Algebraic Type
    SPPFNode ::= SymbolNode(label, children:SPPFNodeList)
               | PackedNode(children:SPPFNodeList)
               | IntermediateNode(children:SPPFNodeList)
    SPPFNodeList ::= concNode(SPPFNode*)
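The algebraic type on this slide maps naturally onto a few record classes. The sketch below uses Python dataclasses as an assumed encoding (the slides use Tom, not Python) and adds a hypothetical helper, `count_derivations`, that treats the packed children of a symbol node as alternatives.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SymbolNode:
    label: str
    children: List["SPPFNode"] = field(default_factory=list)


@dataclass
class PackedNode:
    children: List["SPPFNode"] = field(default_factory=list)


@dataclass
class IntermediateNode:
    children: List["SPPFNode"] = field(default_factory=list)


SPPFNode = Union[SymbolNode, PackedNode, IntermediateNode]


def count_derivations(node):
    """Number of parse trees encoded at and below node."""
    packed = [c for c in node.children if isinstance(c, PackedNode)]
    if isinstance(node, SymbolNode) and packed:
        # Packed children are alternatives: their counts add up.
        return sum(count_derivations(p) for p in packed)
    # Otherwise the children form a sequence: their counts multiply.
    total = 1
    for child in node.children:
        total *= count_derivations(child)
    return total


def num():
    return SymbolNode("E", [SymbolNode("Digit")])


def add(left, right):
    return SymbolNode("E", [left, SymbolNode("+"), right])


# The ambiguous forest for 1 + 2 + 3: one packed node per derivation.
sppf = SymbolNode("E", [
    PackedNode([num(), SymbolNode("+"), add(num(), num())]),
    PackedNode([add(num(), num()), SymbolNode("+"), num()]),
])
```

For this forest `count_derivations(sppf)` gives 2, one per packed node. Unlike a real SPPF, the sketch builds separate copies of equal subtrees rather than sharing them.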
  43. SPPF as a Term
    SymbolNode("E", concNode(
      PackedNode(concNode(
        SymbolNode("E", concNode(SymbolNode("Digit", concNode()))),
        SymbolNode("+", concNode()),
        SymbolNode("E", concNode(
          SymbolNode("E", concNode(SymbolNode("Digit", concNode()))),
          SymbolNode("+", concNode()),
          SymbolNode("E", concNode(SymbolNode("Digit", concNode()))))))),
      PackedNode(concNode(
        SymbolNode("E", concNode(
          SymbolNode("E", concNode(SymbolNode("Digit", concNode()))),
          SymbolNode("+", concNode()),
          SymbolNode("E", concNode(SymbolNode("Digit", concNode()))))),
        SymbolNode("+", concNode()),
        SymbolNode("E", concNode(SymbolNode("Digit", concNode())))))))
    [Figure: the SPPF for 1 + 2 + 3 that this term encodes]
  44. Disambiguation rules as Rewrite Rules
    SymbolNode("E", concNode(z1*,
      PackedNode(concNode(
        SymbolNode("E", concNode(SymbolNode("Digit", concNode()))),
        SymbolNode("+", concNode()),
        SymbolNode("E", concNode(
          SymbolNode("E", concNode(SymbolNode("Digit", concNode()))),
          SymbolNode("+", concNode()),
          SymbolNode("E", concNode(SymbolNode("Digit", concNode()))))))),
      z2*))
    -> SymbolNode("E", concNode(z1*, z2*))
    [Figure: the parse tree for 1 + (2 + 3), the derivation that the rule removes]
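The rewrite rule on this slide drops one packed node from a symbol node's child list, with z1* and z2* standing for whatever comes before and after it. A plain-Python stand-in for that list matching could look like the sketch below. It is not Tom syntax; the tuple encoding and the names `sym`, `packed` and `apply_rule` are assumptions of this example.

```python
def sym(label, *children):
    """Symbol node: ("Symbol", label, children)."""
    return ("Symbol", label, list(children))


def packed(*children):
    """Packed node: ("Packed", None, children)."""
    return ("Packed", None, list(children))


def num():
    return sym("E", sym("Digit"))


def add(left, right):
    return sym("E", left, sym("+"), right)


def apply_rule(node, pattern):
    """Rebuild node without any child equal to pattern, at every depth.

    Keeping the children before and after a match plays the role of
    z1* and z2* in the slide's rule.
    """
    kind, label, children = node
    kept = [apply_rule(c, pattern) for c in children if c != pattern]
    return (kind, label, kept)


right_nested = packed(num(), sym("+"), add(num(), num()))  # 1 + (2 + 3)
left_nested = packed(add(num(), num()), sym("+"), num())   # (1 + 2) + 3

ambiguous = sym("E", right_nested, left_nested)
disambiguated = apply_rule(ambiguous, right_nested)
```

After the call, `disambiguated` contains only the left-nested packed node, so the forest encodes the single derivation (1 + 2) + 3.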
  45. Parsing Language Embeddings
    Island Grammar-based Parsing using GLL and Tom
    Ali Afroozeh (1), Jean-Christophe Bach (2, 3), Mark van den Brand (1), Adrian Johnstone (4), Maarten Manders (1), Pierre-Etienne Moreau (2, 3), and Elizabeth Scott (4)
    (1) Eindhoven University of Technology, NL-5612 AZ Eindhoven, The Netherlands
    (2) Inria, Villers-lès-Nancy, 54600, France
    (3) Université de Lorraine, LORIA, Vandœuvre-lès-Nancy, 54500, France
    (4) Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
    Abstract. Extending a language by embedding within it another language presents significant parsing challenges, especially if the embedding is recursive. The composite grammar is likely to be nondeterministic as a result of tokens that are valid in both the host and the embedded language. In this paper we examine the challenges of embedding the Tom language into a variety of general-purpose high level languages. Tom provides syntax and semantics for advanced pattern matching and tree rewriting facilities. Embedded Tom constructs are translated into the host language by a preprocessor, the output of which is a composite program written purely in the host language. Tom implementations exist for Java, C, C#, Python and Caml. The current parser is complex and difficult to maintain. In this paper, we describe how Tom can be parsed using island grammars implemented with the Generalised LL