sbt-rats: Packrat Parser Generation for Scala

Talk at ScalaSyd August 2012

Tony Sloane

August 21, 2014

Transcript

  1. sbt-rats: Packrat Parser Generation for Scala

    Anthony M. Sloane, Programming Languages Research Group, Department of Computing, Macquarie University
    [email protected] [email protected] @inkytonik
    August 8, 2012
  2. Rats!

    Powerful parser generator by Robert Grimm (New York University). Based on parsing expression grammars (PEGs). Part of the eXTensible Compiler (xtc) project:
    http://cs.nyu.edu/rgrimm/xtc/
    http://cs.nyu.edu/rgrimm/xtc/rats-intro.html
    Particular focus on modularity and extensibility of grammars.
  3. Parsing expression grammars (PEGs)

    Very similar to context-free grammars, but with ordered choice. Ambiguity is avoided at the cost of having to be aware of the order of alternatives.

    Stm = "if" '(' Exp ')' Stm
        | "if" '(' Exp ')' Stm "else" Stm
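To see why the order of alternatives matters, here is a minimal sketch of ordered choice in plain Scala. It is not Rats! or sbt-rats code: the `Parser` type and the `lit`, `seq` and `choice` helpers are hypothetical, and the grammar is simplified so that the condition is the fixed text `if(e)` and the nested statement is always `s;`.

```scala
object OrderedChoiceDemo {
  // A parser consumes a prefix of the input and returns the rest on success.
  type Parser = String => Option[String]

  // Match a literal string at the start of the input.
  def lit(s: String): Parser =
    in => if (in.startsWith(s)) Some(in.drop(s.length)) else None

  // Run the parsers one after another, threading the remaining input through.
  def seq(ps: Parser*): Parser =
    in => ps.foldLeft(Option(in))((acc, p) => acc.flatMap(p))

  // PEG ordered choice: commit to the first alternative that succeeds.
  def choice(ps: Parser*): Parser =
    in => ps.iterator.map(p => p(in)).collectFirst { case Some(rest) => rest }

  // Simplified stand-ins for the two alternatives on the slide.
  val shortIf: Parser = seq(lit("if(e)"), lit("s;"))
  val longIf: Parser  = seq(lit("if(e)"), lit("s;"), lit("else"), lit("s;"))

  val shortFirst: Parser = choice(shortIf, longIf) // order used on this slide
  val longFirst: Parser  = choice(longIf, shortIf) // longest alternative first
}
```

With the short alternative first, `OrderedChoiceDemo.shortFirst("if(e)s;elses;")` stops after the first statement and leaves `elses;` unconsumed, whereas `longFirst` consumes the whole input.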
  4. Packrat parsing

    Memoise parse results to avoid re-parsing. Results in parsing complexity that is linear in the size of the input, at the cost of space overhead.

    Stm = "if" '(' Exp ')' Stm "else" Stm
        | "if" '(' Exp ')' Stm
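The effect of memoisation can be sketched with a hand-written recogniser for a simplified version of this rule. This is an illustration only, not the parser that Rats! generates: the class `StmParser`, the `calls` counter and the reduced grammar (condition fixed to `if(e)`, base statement `s;`) are assumptions made for the example.

```scala
import scala.collection.mutable

object PackratDemo {
  // Recogniser for: Stm = "if(e)" Stm "else" Stm / "if(e)" Stm / "s;"
  // A result is the position just after a match, or None on failure.
  final class StmParser(input: String, memoise: Boolean) {
    private val memo = mutable.Map.empty[Int, Option[Int]]
    var calls = 0 // how often the body of the Stm rule is actually evaluated

    def stm(pos: Int): Option[Int] =
      if (!memoise) parseStm(pos)
      else memo.get(pos) match {
        case Some(result) => result // reuse the earlier result at this position
        case None =>
          val result = parseStm(pos)
          memo(pos) = result
          result
      }

    private def lit(s: String, pos: Int): Option[Int] =
      if (input.startsWith(s, pos)) Some(pos + s.length) else None

    private def parseStm(pos: Int): Option[Int] = {
      calls += 1
      val withElse = for {
        p1 <- lit("if(e)", pos)
        p2 <- stm(p1)
        p3 <- lit("else", p2)
        p4 <- stm(p3)
      } yield p4
      withElse
        .orElse(lit("if(e)", pos).flatMap(stm)) // re-parses the nested Stm
        .orElse(lit("s;", pos))
    }
  }

  // Number of rule evaluations needed to parse the input from position 0.
  def countCalls(input: String, memoise: Boolean): Int = {
    val parser = new StmParser(input, memoise)
    parser.stm(0)
    parser.calls
  }
}
```

For ten nested `if`s without an `else`, `PackratDemo.countCalls("if(e)" * 10 + "s;", memoise = false)` evaluates the rule 2047 times, because each level parses its nested statement twice. With `memoise = true` the rule is evaluated once per start position, 11 times in total, at the cost of keeping the memo table.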
  5. Example PEG: white space and comments

    String Spacing   = (Space / Comment)*;
    String Space     = ' ' / '\t' / '\f' / EOL;
    String EOL       = '\r' '\n' / '\r' / '\n';
    String EOF       = !_;
    String Comment   = SLComment / MLComment;
    String SLComment = "//" (!EOL _)* EOL;
    String MLComment = "/*" (MLComment / !"*/" _)* "*/";
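The `MLComment` rule combines recursion with a negative predicate (`!"*/" _` means "any character, provided `*/` does not start here"). A hand-written Scala equivalent may make the behaviour easier to follow. This is a sketch for illustration, not code produced by Rats!; the object and method names are hypothetical.

```scala
object CommentDemo {
  // MLComment = "/*" (MLComment / !"*/" _)* "*/"
  // Returns the position just after a comment starting at pos, or None.
  def mlComment(in: String, pos: Int): Option[Int] =
    if (!in.startsWith("/*", pos)) None
    else {
      var p = pos + 2
      var more = true
      while (more) {
        mlComment(in, p) match {
          case Some(next) =>
            p = next // first alternative: a nested comment
          case None =>
            // second alternative: any one character, unless "*/" starts here
            if (p < in.length && !in.startsWith("*/", p)) p += 1
            else more = false
        }
      }
      if (in.startsWith("*/", p)) Some(p + 2) else None
    }
}
```

Because the nested-comment alternative is tried first at every position, `CommentDemo.mlComment("/* a /* b */ c */", 0)` consumes the whole string, while an unbalanced input such as `"/* /* */"` is rejected.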
  6. sbt-rats

    Simple build tool (sbt) plugin available from community plugin repository. Add .rats specification to project. Plugin uses Rats! to generate a parser implemented in Java.
    Optional Scala-specific customisation:
      use Scala lists instead of Rats! pairs
      use Scala positions instead of Rats! locations
      use Scala options instead of null
  7. Rats! grammar notation verbosity

    sealed abstract class Stm
    case class Block (optStms : List[Stm]) extends Stm

    public Stm Stm = "{" Stm* "}"
  8. Rats! grammar notation verbosity

    sealed abstract class Stm
    case class Block (optStms : List[Stm]) extends Stm

    public Stm Stm = void:"{":Symbol v1:Stm* void:"}":Symbol
  9. Rats! grammar notation verbosity

    sealed abstract class Stm
    case class Block (optStms : List[Stm]) extends Stm

    public Stm Stm = void:"{":Symbol v1:Stm* void:"}":Symbol { yyValue = new Block (v1); }
  10. sbt-rats syntax descriptions

    Generate Rats! specification from a high-level syntax description.

    Stm = Tipe Loc ';' {Decl}
        | ';' {EmptyStm}
        | Assign ';' {AsgnStm}
        | Lab ':' Stm {LabStm}
        | "break" Lab ';' {Break}
        | "continue" Lab ';' {Continue}
        | "if" '(' Exp ')' Stm "else" Stm {If}
        | "while" '(' Exp ')' Stm {While}
        | "{" Stm* "}" {Block}.
  11. Left recursion

    Top-down parsing methods don’t like left recursion and the natural context-free grammars for expressions are ambiguous.

    Exp = Exp "+" Exp {Add}
        | Exp "*" Exp {Mul}
        | Lit
        | '(' Exp ')'.
  12. Left recursion

    Augment alternatives with associativity and precedence annotations.

    Exp = Exp "+" Exp {Add, left, 2}
        | Exp "*" Exp {Mul, left, 1}
        | Lit
        | '(' Exp ')'.
  13. Left recursion: transformation to iteration

    Replace left recursion with iteration (and take care of tree construction).

    Exp  = Exp2.
    Exp2 = Exp1 ("+" Exp1)*.
    Exp1 = Exp0 ("*" Exp0)*.
    Exp0 = Lit | "(" Exp ")".
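The "take care of tree construction" part can be illustrated with a hand-written recursive descent parser for the transformed grammar. This is a sketch, not the code sbt-rats generates: the `Parser` class and the restriction to single-digit literals with no white space are assumptions made to keep the example short, and the case classes mirror the `Add` and `Mul` constructors named in the annotations. Each loop folds the operands to the left, so the resulting trees are left-associative even though the grammar is no longer left-recursive.

```scala
object LeftAssocDemo {
  sealed abstract class Exp
  case class Lit(value: Int) extends Exp
  case class Add(left: Exp, right: Exp) extends Exp
  case class Mul(left: Exp, right: Exp) extends Exp

  final class Parser(input: String) {
    private var pos = 0

    // Consume the character c if it is next in the input.
    private def accept(c: Char): Boolean =
      if (pos < input.length && input(pos) == c) { pos += 1; true } else false

    // Exp2 = Exp1 ("+" Exp1)*  -- lower precedence, built left to right
    def exp2(): Exp = {
      var tree = exp1()
      while (accept('+')) tree = Add(tree, exp1())
      tree
    }

    // Exp1 = Exp0 ("*" Exp0)*  -- binds tighter than "+"
    def exp1(): Exp = {
      var tree = exp0()
      while (accept('*')) tree = Mul(tree, exp0())
      tree
    }

    // Exp0 = Lit | "(" Exp ")"
    def exp0(): Exp =
      if (accept('(')) {
        val inner = exp2()
        if (!accept(')')) sys.error(s"expected ')' at position $pos")
        inner
      } else if (pos < input.length && input(pos).isDigit) {
        val digit = input(pos).asDigit
        pos += 1
        Lit(digit)
      } else sys.error(s"expected a literal at position $pos")

    // Parse a complete expression and insist that all input is consumed.
    def parseAll(): Exp = {
      val tree = exp2()
      if (pos != input.length) sys.error(s"unexpected input at position $pos")
      tree
    }
  }

  def parse(s: String): Exp = new Parser(s).parseAll()
}
```

For example, `LeftAssocDemo.parse("1+2+3")` yields `Add(Add(Lit(1), Lit(2)), Lit(3))`, and `parse("1+2*3")` yields `Add(Lit(1), Mul(Lit(2), Lit(3)))`.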
  14. Abstract syntax

    Infer case class definitions from syntax description.

    sealed abstract class Stm
    case class Decl (tipe : Tipe, loc : Loc) extends Stm
    case class EmptyStm () extends Stm
    case class AsgnStm (assign : Exp) extends Stm
    case class LabStm (lab : String, stm : Stm) extends Stm
    case class Break (lab : String) extends Stm
    case class Continue (lab : String) extends Stm
    case class If (exp : Exp, stm1 : Stm, stm2 : Stm) extends Stm
    case class While (exp : Exp, stm : Stm) extends Stm
    case class Block (optStms : List[Stm]) extends Stm
  15. Abstract syntax tree

    if (v) v = 0; else v = 1;
    while (v) { boolean b; b = true; }

    If (
        Use (Loc ("v")),
        AsgnStm (Assign (Loc ("v"), IntLit (0))),
        AsgnStm (Assign (Loc ("v"), IntLit (1))))
    While (
        Use (Loc ("v")),
        Block (
            List (
                Decl (BooleanType (), Loc ("b")),
                AsgnStm (Assign (Loc ("b"), True ())))))
  16. Pretty printing

    Augment syntax description with pretty printing semantics:
      print components in order of definition
      literals either get a space after them (when double quoted) or don’t (when single quoted)
    Directives to customise defaults:
      sp: space
      \n: possible newline
      nest(s): indent s relative to its parent
  17. Add pretty printing directives

    Stm {line} = Tipe Loc ';' {Decl}
        | ';' {EmptyStm}
        | Assign ';' {AsgnStm}
        | Lab ':' Stm {LabStm}
        | "break" Lab ';' {Break}
        | "continue" Lab ';' {Continue}
        | "if" '(' Exp ')' nest (Stm) \n "else" nest (Stm) {If}
        | "while" '(' Exp ')' sp Stm {While}
        | "{" nest (Stm*) \n "}" {Block}.
  18. Pretty printing output (width 75)

    { int v; v = true; v = (1 + 2) * 3; v = 1 + 2 * 3; if (v) v = 0; else v = 1; while (v) { boolean b; b = true; } v = 42; }
  19. Pretty printing output (width 1000)

    {{{ { int v; v = true; v = (1 + 2) * 3; v = 1 + 2 * 3; if (v) }}}
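The idea behind a "possible newline" and `nest` can be sketched with a very small width-sensitive pretty printer. This is a simplified illustration and not the pretty printer that sbt-rats uses: the `Doc` type, its constructors and the `render` function are hypothetical, and a group is laid out flat whenever its own content fits in the remaining width, ignoring whatever follows it.

```scala
object PrettyDemo {
  sealed trait Doc
  case class Text(s: String) extends Doc              // literal text
  case object Line extends Doc                        // possible newline
  case class Nest(indent: Int, doc: Doc) extends Doc  // indent relative to parent
  case class Cat(docs: List[Doc]) extends Doc         // concatenation
  case class Group(doc: Doc) extends Doc              // unit that is flat or broken

  // Width of a document when every possible newline is printed as a space.
  def flatWidth(d: Doc): Int = d match {
    case Text(s)    => s.length
    case Line       => 1
    case Nest(_, x) => flatWidth(x)
    case Cat(xs)    => xs.map(flatWidth).sum
    case Group(x)   => flatWidth(x)
  }

  def render(doc: Doc, width: Int): String = {
    val out = new StringBuilder
    var col = 0 // current output column
    def go(d: Doc, indent: Int, flat: Boolean): Unit = d match {
      case Text(s) =>
        out.append(s)
        col += s.length
      case Line =>
        if (flat) { out.append(' '); col += 1 }
        else { out.append('\n').append(" " * indent); col = indent }
      case Nest(i, x) => go(x, indent + i, flat)
      case Cat(xs)    => xs.foreach(x => go(x, indent, flat))
      case Group(x)   => go(x, indent, flat || col + flatWidth(x) <= width)
    }
    go(doc, 0, flat = false)
    out.toString
  }

  // Roughly what "{" nest (Stm*) \n "}" might describe for a one-statement block.
  val block: Doc = Group(Cat(List(
    Text("{"),
    Nest(4, Cat(List(Line, Text("b = true;")))),
    Line,
    Text("}")
  )))
}
```

With a generous width, `PrettyDemo.render(PrettyDemo.block, 75)` keeps the block on one line as `{ b = true; }`. With a width of 5 the same document is broken at both possible newlines, and the statement is indented by four columns.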
  20. Wait, there’s more!

    Default definitions of white space handling and comments. Standard definition of identifiers. Automatic handling of keywords. Override or escape to Rats! specification if needed.
    Drawbacks: Currently only one syntax description per project. No direct support for modularity yet.