Rattler - Jason Arhart

Las Vegas Ruby Group
August 01, 2012

Transcript

  1. What is Rattler?

    Rattler is a parser generator for Ruby
    WTF is a parser generator? Thursday, August 1, 13
  2. Parsing

    • Informal term for “syntactic analysis”
    • Analyzing a “sentence” in terms of grammatical constituents
    • Recognizing implicit structure in a linear sequence of “words”
  3. Syntax

    • A syntax is a set of rules that govern the sentence structure of a language
    • Syntax refers only to the structure of a sentence, not the meaning, or “semantics”
    • A formal description of a language’s syntax is called a “grammar”
  4. Grammar

    • A grammar is a set of rules that define the syntax of a language
    • A formal grammar can be generative or analytic
    • Generative grammars define how to form valid sentences
    • Analytic grammars define how to recognize valid sentences
  5. Arithmetic Grammar

    expression -> number
    expression -> “(” expression “)”
    expression -> expression “+” expression
    expression -> expression “-” expression
    expression -> expression “*” expression
    expression -> expression “/” expression
    number -> digit
    number -> digit number
    digit -> “0”
    digit -> “1”
    ...
  6. Parser

    • A parser is a program that recognizes grammatical structure
    • Turns a linear sequence of characters into an explicit structure
    • Output is typically a tree structure
    • Usually a component of a compiler or interpreter (or parser generator)
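
To make "turns a linear sequence of characters into an explicit structure" concrete, here is a minimal hand-written sketch in plain Ruby. It is not Rattler output and does not use Rattler; the class name `TinyArithmeticParser` and the nested-array tree format are assumptions made for illustration. It also assumes the usual precedence levels (`*` and `/` bind tighter than `+` and `-`), which the arithmetic grammar in the slides does not specify.

```ruby
require "strscan"

# Hypothetical hand-written parser (not generated by Rattler).
# Trees are nested arrays of the form [operator, left, right].
class TinyArithmeticParser
  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  # Parse the whole input; raise if anything is left over.
  def parse
    tree = expression
    skip_space
    raise ArgumentError, "unexpected input at #{@scanner.pos}" unless @scanner.eos?
    tree
  end

  private

  # expression -> term (("+" | "-") term)*
  def expression
    binary_chain(/[+-]/) { term }
  end

  # term -> factor (("*" | "/") factor)*
  def term
    binary_chain(/[*\/]/) { factor }
  end

  # factor -> number | "(" expression ")"
  def factor
    skip_space
    if (digits = @scanner.scan(/\d+/))
      digits.to_i
    elsif @scanner.scan(/\(/)
      inner = expression
      skip_space
      @scanner.scan(/\)/) or raise ArgumentError, "expected ')' at #{@scanner.pos}"
      inner
    else
      raise ArgumentError, "expected a number or '(' at #{@scanner.pos}"
    end
  end

  # Parse "operand (operator operand)*" and fold it into a left-leaning tree.
  def binary_chain(operator_pattern)
    tree = yield
    loop do
      skip_space
      operator = @scanner.scan(operator_pattern)
      break unless operator
      tree = [operator.to_sym, tree, yield]
    end
    tree
  end

  def skip_space
    @scanner.skip(/\s+/)
  end
end

p TinyArithmeticParser.new("1 + 2 * (3 - 4)").parse
# prints [:+, 1, [:*, 2, [:-, 3, 4]]]
```

The flat string becomes a tree whose shape records which operands belong to which operator, which is the "implicit structure" a parser makes explicit.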
  7. Parser Generator

    • A parser generator is a program that generates a parser
    • The input is a formal description of the language to parse
    • That formal description is a grammar
    • The parser generator starts by parsing the grammar
  8. Ruby Parser Generators

    • Racc - LALR(1)
    • Treetop - PEG (packrat)
    • Citrus - PEG (packrat)
    • ANTLR for Ruby - LL
    • Ragel - finite state machine
  9. Why Rattler?

    • Racc is LALR, which is a PITA
    • Treetop and Citrus are pretty basic and don’t handle left-recursion
    • ANTLR for Ruby is slow and doesn’t handle left-recursion
    • Ragel is not really designed for syntactic analysis
  10. Why Rattler?

    • LR parser generators are a PITA
    • Require a separate token grammar
    • Difficult to debug
    • Difficult to produce useful error messages
    • Generative grammars tend to be ambiguous
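
The ambiguity can be counted directly. The sketch below is plain Ruby, not part of Rattler; `parse_tree_count` is a hypothetical helper. It assumes the rule shape `expression -> expression OP expression` from the arithmetic grammar and an input with no parentheses, and counts how many distinct parse trees that rule allows for a given number of operands.

```ruby
# Count the parse trees that the rule
#   expression -> expression OP expression
# allows for a flat, unparenthesized list of operands.
# Hypothetical helper for illustration only.
def parse_tree_count(operand_count, memo = {})
  return 1 if operand_count == 1 # a single number has exactly one tree
  return memo[operand_count] if memo.key?(operand_count)

  total = (1...operand_count).sum do |left_size|
    # Choose which operator is the root: left_size operands go to its left,
    # the rest go to its right, and the two sides combine independently.
    parse_tree_count(left_size, memo) *
      parse_tree_count(operand_count - left_size, memo)
  end
  memo[operand_count] = total
end

puts parse_tree_count(3) # prints 2
puts parse_tree_count(4) # prints 5
```

For "1 - 2 - 3" the two trees are (1 - 2) - 3, which evaluates to -4, and 1 - (2 - 3), which evaluates to 2, so the grammar alone does not determine the meaning of the sentence.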
  11. Dangling “else”

    “if A then if B then C else D” can be read as
    “if A then (if B then C else D)”
    or
    “if A then (if B then C) else D”
  12. Parsing Expression Grammars

    • analytic grammars (vs. generative)
    • the choice operator is ordered
    • express a recursive descent parsing algorithm explicitly
    • usually used for packrat parsing
  13. Dangling “else” solved

    if_expr <- “if” expr “then” expr “else” expr
             / “if” expr “then” expr

    This PEG rule parses if-then-else expressions correctly and unambiguously
  14. Why Rattler?

    • PEG-based parser generators have their own disadvantages
    • Can’t handle left-recursive rules
    • Common parsing problems are hard
      • handling whitespace
      • matching tokens
      • delimited lists
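
The left-recursion problem shows up as soon as a rule such as `expr <- expr "-" number / number` is translated literally into a recursive descent method: the method calls itself before consuming any input and never makes progress. A common workaround in hand-written and plain PEG parsers is to rewrite the rule with repetition and fold the results to the left. The sketch below is plain Ruby with hypothetical method names and an assumed pre-tokenized input; it does not show how Rattler itself supports left recursion.

```ruby
# A literal translation of the left-recursive rule
#   expr <- expr "-" number / number
# would look like this and recurse until Ruby raises SystemStackError:
#
#   def parse_expr(tokens, pos)
#     left, pos = parse_expr(tokens, pos) # no input consumed before recursing
#     ...
#   end

# Match a single integer token. Returns [tree, next_position] or nil.
def parse_number(tokens, pos)
  tokens[pos].is_a?(Integer) ? [tokens[pos], pos + 1] : nil
end

# Iterative rewrite: expr <- number ("-" number)*
# Each repetition wraps the tree built so far, so the result is
# left-associative, just as the left-recursive rule intended.
def parse_subtraction(tokens, pos = 0)
  tree, pos = parse_number(tokens, pos)
  return nil unless pos

  while tokens[pos] == "-"
    right, next_pos = parse_number(tokens, pos + 1)
    break unless next_pos # a trailing "-" is left unconsumed
    tree = [:-, tree, right]
    pos = next_pos
  end
  [tree, pos]
end

p parse_subtraction([8, "-", 3, "-", 2])
# prints [[:-, [:-, 8, 3], 2], 5]
```

The rewrite works, but the grammar no longer reads like the language it describes, which is part of why left-recursion support in the generator is attractive.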
  15. Why Rattler?

    The Bottom Line:
    Existing tools are too hard
    Parsing in Ruby should be easy
  16. Rattler

    • PEG-based, but adds many convenient grammar features
      • DRY whitespace handling
      • simplified keyword matching
      • delimited lists
      • back references
  17. Rattler

    • Supports left-recursive grammars
    • Useful error messages
    • Generates efficient pure-Ruby parsers
    • RSpec matchers for testing parsers
    • Outputs parse trees using GraphViz
  18. Philosophy of Rattler

    • Parsing in Ruby should be easy and maybe even fun!
    • Experimenting should be encouraged
    • Common parsing problems should be easy to solve
    • Write expressive grammars, let Rattler optimize the parser
  19. Future

    • Separate parser generator & runtime
    • Multiple targets
    • Operator precedence parsing
    • Lazy semantic actions
    • Compiler back-end
    • More optimizations