Slide 1

Slide 1 text

sbt-rats: Packrat Parser Generation for Scala

Anthony M. Sloane
Programming Languages Research Group
Department of Computing, Macquarie University
@inkytonik
August 8, 2012

Slide 2

Slide 2 text

Overview

Figure: Parse structured text into an abstract syntax tree and pretty-print it back to text.

Slide 3

Slide 3 text

Rats!

Powerful parser generator by Robert Grimm (New York University).
Based on parsing expression grammars (PEGs).
Part of the eXTensible Compiler (xtc) project:
    http://cs.nyu.edu/rgrimm/xtc/
    http://cs.nyu.edu/rgrimm/xtc/rats-intro.html
Particular focus on modularity and extensibility of grammars.

Slide 4

Slide 4 text

Parsing expression grammars (PEGs)

Very similar to context-free grammars, but with ordered choice.
Ambiguity is avoided at the cost of having to be aware of the order of alternatives.

    Stm = "if" '(' Exp ')' Stm
        | "if" '(' Exp ')' Stm "else" Stm
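To make the cost of ordered choice concrete, here is a minimal Scala sketch of hand-written PEG-style combinators over a list of tokens. It is not the Rats! or sbt-rats implementation; all names (`OrderedChoiceSketch`, `lit`, `seq`, `choice`, `shortFirst`, `longFirst`) are hypothetical, `Exp` is reduced to the single token `e`, and a simple statement to the token `s`. With the short alternative first, as in the grammar above, the parser commits to it and never consumes the `else` part.

```scala
object OrderedChoiceSketch {
  // A parser consumes a prefix of the tokens and returns the rest on success.
  type Parser = List[String] => Option[List[String]]

  // Match one literal token.
  def lit(s: String): Parser = {
    case `s` :: rest => Some(rest)
    case _           => None
  }

  // Sequence: run each parser on what the previous one left over.
  def seq(ps: Parser*): Parser =
    in => ps.foldLeft(Option(in))((acc, p) => acc.flatMap(p))

  // Ordered choice: try alternatives left to right, commit to the first success.
  def choice(ps: Parser*): Parser =
    in => ps.iterator.map(p => p(in)).collectFirst { case Some(rest) => rest }

  // Assumption for illustration: an expression is just the token "e".
  val exp: Parser = lit("e")

  def ifThen(stm: Parser): Parser =
    seq(lit("if"), lit("("), exp, lit(")"), stm)

  def ifThenElse(stm: Parser): Parser =
    seq(ifThen(stm), lit("else"), stm)

  // Short alternative first: the "else" form can never be chosen.
  def shortFirst: Parser =
    in => choice(ifThen(shortFirst), ifThenElse(shortFirst), lit("s"))(in)

  // Long alternative first: both forms are reachable.
  def longFirst: Parser =
    in => choice(ifThenElse(longFirst), ifThen(longFirst), lit("s"))(in)

  // True only if the parser succeeds and consumes every token.
  def parsesAll(p: Parser, input: String): Boolean =
    p(input.split(" ").toList) == Some(Nil)
}
```

On the input `if ( e ) s else s`, `parsesAll(shortFirst, ...)` is false because the first alternative succeeds and leaves `else s` unconsumed, while `parsesAll(longFirst, ...)` is true.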

Slide 5

Slide 5 text

Packrat parsing

Memoise parse results to avoid re-parsing.
Gives parsing time that is linear in the size of the input, at the cost of space overhead.

    Stm = "if" '(' Exp ')' Stm "else" Stm
        | "if" '(' Exp ')' Stm
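The effect of memoisation on the grammar above can be sketched in Scala as follows. This is an illustration only, not the code that Rats! generates: the class `IfParser`, the counter `expCalls`, and the token-level simplifications (`Exp` is the token `e`, a simple statement is the token `s`) are assumptions. Results are cached per (rule name, input position), so when the first alternative fails at `else` and the second alternative re-reads the shared prefix, the `Exp` rule body is not run again.

```scala
import scala.collection.mutable

object PackratSketch {
  // Parsers take a token position and return the position after the match.
  final class IfParser(tokens: Vector[String], memoise: Boolean) {
    private val table = mutable.Map.empty[(String, Int), Option[Int]]
    var expCalls = 0 // how many times the Exp rule body actually ran

    // Look up (rule, position) in the table before running the rule body.
    private def memo(rule: String, pos: Int)(body: => Option[Int]): Option[Int] =
      if (!memoise) body
      else table.get((rule, pos)) match {
        case Some(cached) => cached
        case None =>
          val result = body
          table((rule, pos)) = result
          result
      }

    private def lit(s: String, pos: Int): Option[Int] =
      if (pos < tokens.length && tokens(pos) == s) Some(pos + 1) else None

    def exp(pos: Int): Option[Int] = memo("Exp", pos) {
      expCalls += 1
      lit("e", pos)
    }

    // The prefix shared by both alternatives: "if" '(' Exp ')' Stm
    private def ifPrefix(pos: Int): Option[Int] =
      for {
        a <- lit("if", pos); b <- lit("(", a); c <- exp(b)
        d <- lit(")", c); e <- stm(d)
      } yield e

    def stm(pos: Int): Option[Int] = memo("Stm", pos) {
      (for { a <- ifPrefix(pos); b <- lit("else", a); c <- stm(b) } yield c)
        .orElse(ifPrefix(pos))
        .orElse(lit("s", pos))
    }
  }

  // Returns the parse result and the number of Exp rule evaluations.
  def run(input: String, memoise: Boolean): (Option[Int], Int) = {
    val parser = new IfParser(input.split(" ").toVector, memoise)
    val result = parser.stm(0)
    (result, parser.expCalls)
  }
}
```

For the input `if ( e ) s`, the unmemoised parser evaluates the `Exp` body twice and the memoised one evaluates it once; both return the same result. The table is the space overhead mentioned above.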

Slide 6

Slide 6 text

Example PEG: white space and comments

    String Spacing   = (Space / Comment)*;
    String Space     = ' ' / '\t' / '\f' / EOL;
    String EOL       = '\r' '\n' / '\r' / '\n';
    String EOF       = !_;
    String Comment   = SLComment / MLComment;
    String SLComment = "//" (!EOL _)* EOL;
    String MLComment = "/*" (MLComment / !"*/" _)* "*/";

Slide 7

Slide 7 text

sbt-rats

Simple build tool (sbt) plugin available from the community plugin repository.
Add a .rats specification to the project.
Plugin uses Rats! to generate a parser implemented in Java.
Optional Scala-specific customisation:
- use Scala lists instead of Rats! pairs
- use Scala positions instead of Rats! locations
- use Scala options instead of null

Slide 8

Slide 8 text

Rats! grammar notation verbosity

    sealed abstract class Stm
    case class Block (optStms : List[Stm]) extends Stm

    public Stm Stm =
        "{" Stm* "}"

Slide 9

Slide 9 text

Rats! grammar notation verbosity

    sealed abstract class Stm
    case class Block (optStms : List[Stm]) extends Stm

    public Stm Stm =
        void:"{":Symbol v1:Stm* void:"}":Symbol

Slide 10

Slide 10 text

Rats! grammar notation verbosity

    sealed abstract class Stm
    case class Block (optStms : List[Stm]) extends Stm

    public Stm Stm =
        void:"{":Symbol v1:Stm* void:"}":Symbol {
            yyValue = new Block (v1);
        }

Slide 11

Slide 11 text

sbt-rats syntax descriptions

Generate a Rats! specification from a high-level syntax description.

    Stm = Tipe Loc ';' {Decl}
        | ';' {EmptyStm}
        | Assign ';' {AsgnStm}
        | Lab ':' Stm {LabStm}
        | "break" Lab ';' {Break}
        | "continue" Lab ';' {Continue}
        | "if" '(' Exp ')' Stm "else" Stm {If}
        | "while" '(' Exp ')' Stm {While}
        | "{" Stm* "}" {Block}.

Slide 12

Slide 12 text

Left recursion

Top-down parsing methods cannot handle left recursion directly, and the natural context-free grammars for expressions are ambiguous.

    Exp = Exp "+" Exp {Add}
        | Exp "*" Exp {Mul}
        | Lit
        | '(' Exp ')'.

Slide 13

Slide 13 text

Left recursion

Augment alternatives with associativity and precedence annotations.

    Exp = Exp "+" Exp {Add, left, 2}
        | Exp "*" Exp {Mul, left, 1}
        | Lit
        | '(' Exp ')'.

Slide 14

Slide 14 text

Left recursion: transformation to iteration

Replace left recursion with iteration (and take care of tree construction).

    Exp = Exp2.
    Exp2 = Exp1 ("+" Exp1)*.
    Exp1 = Exp0 ("*" Exp0)*.
    Exp0 = Lit | "(" Exp ")".
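The "take care of tree construction" step can be sketched in Scala as below. This is a hand-written illustration of the transformed grammar, not the parser that sbt-rats generates; the object name, the `level` helper, the space-separated token input, and the `Lit (n : Int)` representation of literals are assumptions. Each level parses one operand and then loops over `(op operand)*`, folding the results to the left so that `1 + 2 + 3` still yields the left-associative tree the annotated grammar asks for.

```scala
import scala.annotation.tailrec

object LeftRecursionSketch {
  sealed abstract class Exp
  case class Add(l: Exp, r: Exp) extends Exp
  case class Mul(l: Exp, r: Exp) extends Exp
  case class Lit(n: Int) extends Exp

  // A result is the tree built so far plus the unconsumed tokens.
  type Result = Option[(Exp, List[String])]

  // Exp0 = Lit | "(" Exp ")".
  def exp0(in: List[String]): Result = in match {
    case "(" :: rest =>
      exp2(rest) match {
        case Some((e, ")" :: rest1)) => Some((e, rest1))
        case _                       => None
      }
    case tok :: rest if tok.nonEmpty && tok.forall(_.isDigit) =>
      Some((Lit(tok.toInt), rest))
    case _ => None
  }

  // Shared shape of Exp1 and Exp2: operand (op operand)*, folded to the left.
  def level(operand: List[String] => Result, op: String, mk: (Exp, Exp) => Exp)(
      in: List[String]): Result = {
    @tailrec
    def loop(tree: Exp, rest: List[String]): (Exp, List[String]) = rest match {
      case `op` :: tail =>
        operand(tail) match {
          case Some((rhs, rest1)) => loop(mk(tree, rhs), rest1) // grow leftwards
          case None               => (tree, rest)
        }
      case _ => (tree, rest)
    }
    operand(in).map { case (first, rest) => loop(first, rest) }
  }

  // Exp1 = Exp0 ("*" Exp0)*.
  def exp1(in: List[String]): Result = level(exp0, "*", Mul(_, _))(in)

  // Exp2 = Exp1 ("+" Exp1)*.
  def exp2(in: List[String]): Result = level(exp1, "+", Add(_, _))(in)

  // Parse space-separated tokens; succeed only if all input is consumed.
  def parse(s: String): Option[Exp] =
    exp2(s.split(" ").toList).collect { case (e, Nil) => e }
}
```

For example, `parse("1 + 2 * 3")` gives `Some(Add(Lit(1), Mul(Lit(2), Lit(3))))`, so `*` binds tighter than `+`, matching precedence levels 1 and 2 in the annotated grammar.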

Slide 15

Slide 15 text

Abstract syntax

Infer case class definitions from the syntax description.

    sealed abstract class Stm
    case class Decl (tipe : Tipe, loc : Loc) extends Stm
    case class EmptyStm () extends Stm
    case class AsgnStm (assign : Exp) extends Stm
    case class LabStm (lab : String, stm : Stm) extends Stm
    case class Break (lab : String) extends Stm
    case class Continue (lab : String) extends Stm
    case class If (exp : Exp, stm1 : Stm, stm2 : Stm) extends Stm
    case class While (exp : Exp, stm : Stm) extends Stm
    case class Block (optStms : List[Stm]) extends Stm

Slide 16

Slide 16 text

Abstract syntax tree

    if (v) v = 0; else v = 1;
    while (v) { boolean b; b = true; }

    If (
        Use (Loc ("v")),
        AsgnStm (Assign (Loc ("v"), IntLit (0))),
        AsgnStm (Assign (Loc ("v"), IntLit (1))))

    While (
        Use (Loc ("v")),
        Block (
            List (
                Decl (BooleanType (), Loc ("b")),
                AsgnStm (Assign (Loc ("b"), True ())))))

Slide 17

Slide 17 text

Pretty printing

Augment the syntax description with pretty printing semantics:
- print components in order of definition
- literals get a space after them when double quoted, and no space when single quoted

Directives to customise defaults:
- sp: space
- \n: possible newline
- nest(s): indent s relative to its parent
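One way to read "possible newline" and "nest" is through a small width-aware document model, sketched in Scala below. This is a simplified illustration and not the pretty printer that sbt-rats generates or uses: the `Doc` type, the `Group` node, and the fit check (which looks only at the group itself and ignores any text that follows it) are assumptions. A group is printed on one line if it fits within the width; otherwise every possible newline directly inside it becomes a real newline, indented by the enclosing `Nest` amounts.

```scala
object PrettySketch {
  sealed abstract class Doc
  case class Text(s: String) extends Doc
  case object Line extends Doc                     // a possible newline
  case class Nest(indent: Int, d: Doc) extends Doc // indent relative to parent
  case class Cat(ds: List[Doc]) extends Doc        // components in order
  case class Group(d: Doc) extends Doc             // unit of the fit decision

  // Layout with every possible newline rendered as a single space.
  def flat(d: Doc): String = d match {
    case Text(s)        => s
    case Line           => " "
    case Nest(_, inner) => flat(inner)
    case Cat(ds)        => ds.map(flat).mkString
    case Group(inner)   => flat(inner)
  }

  def pretty(width: Int, doc: Doc): String = {
    val out = new StringBuilder
    var column = 0
    def emit(s: String): Unit = { out ++= s; column += s.length }
    def go(d: Doc, indent: Int, breaking: Boolean): Unit = d match {
      case Text(s) => emit(s)
      case Line =>
        if (breaking) { out ++= "\n" + " " * indent; column = indent }
        else emit(" ")
      case Nest(i, inner) => go(inner, indent + i, breaking)
      case Cat(ds)        => ds.foreach(go(_, indent, breaking))
      case Group(inner) =>
        // Break this group only if its flat form would overflow the width.
        val fits = column + flat(inner).length <= width
        go(inner, indent, !fits)
    }
    go(doc, 0, breaking = true)
    out.toString
  }

  // A block with two statements, in the spirit of "{" nest (Stm*) \n "}".
  val block: Doc = Group(Cat(List(
    Text("{"),
    Nest(4, Cat(List(Line, Text("v = 0;"), Line, Text("v = 1;")))),
    Line,
    Text("}"))))
}
```

With a generous width, `pretty(75, block)` yields `{ v = 0; v = 1; }` on one line; with `pretty(10, block)` the group does not fit, so each statement goes on its own line indented by four spaces and the closing brace returns to column zero.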

Slide 18

Slide 18 text

Add pretty printing directives

    Stm {line} =
          Tipe Loc ';' {Decl}
        | ';' {EmptyStm}
        | Assign ';' {AsgnStm}
        | Lab ':' Stm {LabStm}
        | "break" Lab ';' {Break}
        | "continue" Lab ';' {Continue}
        | "if" '(' Exp ')' nest (Stm) \n "else" nest (Stm) {If}
        | "while" '(' Exp ')' sp Stm {While}
        | "{" nest (Stm*) \n "}" {Block}.

Slide 19

Slide 19 text

Pretty printing output (width 75)

    {
        int v;
        v = true;
        v = (1 + 2) * 3;
        v = 1 + 2 * 3;
        if (v)
            v = 0;
        else
            v = 1;
        while (v) {
            boolean b;
            b = true;
        }
        v = 42;
    }

Slide 20

Slide 20 text

Pretty printing output (width 1000)

    { int v; v = true; v = (1 + 2) * 3; v = 1 + 2 * 3; if (v)

Slide 21

Slide 21 text

Wait, there’s more!

- Default definitions of white space handling and comments.
- Standard definition of identifiers.
- Automatic handling of keywords.
- Override or escape to Rats! specification if needed.

Drawbacks:
- Currently only one syntax description per project.
- No direct support for modularity yet.

Slide 22

Slide 22 text

More Information

http://sbt-rats.googlecode.com/

Related work:
- eXTensible Compiler: http://cs.nyu.edu/rgrimm/xtc/
- Rats!: http://cs.nyu.edu/rgrimm/xtc/rats-intro.html
- Kiama: http://kiama.googlecode.com/

Supporters of this project: TU/Eindhoven