SableCC

TA in Compilers.

Aggelos Biboudis

March 01, 2012

Transcript

  1. Some History

    • [1975] Lex (tokenizer: turns a stream of characters into tokens)
      – Lex builds a function implementing a deterministic finite automaton that recognizes regular expressions in linear time
      – Lex can read arbitrary input and determine what each part of the input is. This is called 'tokenizing'.
    • [1975] YACC (parser generator that uses LALR(1) via table-based bottom-up parsing)
      – Parses a stream of tokens
    • [1989] PCCTS (parser generator that uses LL(*) and builds recursive-descent parsers)
  2. What is SableCC

    SableCC is an OO framework that is based only on the lexical and grammatical definition of the compiled language
    • The parser automatically builds the AST
    • AST nodes are strictly typed
    • Each analysis is written in its own class
    • Analysis is separate from nodes
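
To make "analysis is separate from nodes" concrete, here is a minimal, self-contained Java sketch of the idea. It is not SableCC's generated code: `Node`, `NumberNode`, `AddNode`, `Visitor`, `Evaluator` and `Printer` are hypothetical names chosen for illustration. Each node type only knows how to dispatch to a visitor, and every analysis lives in its own class.

```java
// Hypothetical miniature of a typed AST; these names are illustrative
// and are NOT the classes SableCC generates.
interface Visitor {
    void caseNumber(NumberNode node);
    void caseAdd(AddNode node);
}

abstract class Node {
    // Each concrete node dispatches to the matching visitor method.
    abstract void apply(Visitor v);
}

final class NumberNode extends Node {
    final int value;
    NumberNode(int value) { this.value = value; }
    @Override void apply(Visitor v) { v.caseNumber(this); }
}

final class AddNode extends Node {
    final Node left;
    final Node right;
    AddNode(Node left, Node right) { this.left = left; this.right = right; }
    @Override void apply(Visitor v) { v.caseAdd(this); }
}

// First analysis: evaluates the expression. No node class is modified.
final class Evaluator implements Visitor {
    private int result;

    int evaluate(Node root) {
        root.apply(this);
        return result;
    }

    @Override public void caseNumber(NumberNode node) { result = node.value; }

    @Override public void caseAdd(AddNode node) {
        int l = evaluate(node.left);
        int r = evaluate(node.right);
        result = l + r;
    }
}

// Second analysis: pretty-prints the same tree, again in its own class.
final class Printer implements Visitor {
    private final StringBuilder out = new StringBuilder();

    String print(Node root) {
        root.apply(this);
        return out.toString();
    }

    @Override public void caseNumber(NumberNode node) { out.append(node.value); }

    @Override public void caseAdd(AddNode node) {
        out.append('(');
        node.left.apply(this);
        out.append(" + ");
        node.right.apply(this);
        out.append(')');
    }
}

final class VisitorDemo {
    public static void main(String[] args) {
        Node tree = new AddNode(new NumberNode(1),
                new AddNode(new NumberNode(2), new NumberNode(3)));
        // prints "(1 + (2 + 3)) = 6"
        System.out.println(new Printer().print(tree) + " = " + new Evaluator().evaluate(tree));
    }
}
```

Adding a third analysis means adding one more class that implements `Visitor`; the node classes stay untouched, which is the property the slide highlights.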
  3. General Steps

    1. We create a SableCC specification file containing the lexical definitions and the grammar
    2. We launch SableCC with the specification file as input
    3. We create the working classes
    4. We create a main class to activate the lexer, the parser and the working classes
    5. We compile everything with the Java compiler
  4. Specification files

    • Lexical and grammar definitions only
    • A destination root Java package (where to put the generated files?)
    • Lexical definitions use regular expressions
    • Grammar is written in BNF
  5. Generated Files

    • Four packages are generated: lexer, parser, node and analysis
      – Lexer and exceptions
      – Parser and exceptions
      – Node classes for a typed AST
      – Analysis contains one interface and three classes for AST walking
  6. Lexer

    • Package declaration
    • Characters and character sets
      – Char, Decimal, Hex, Range, Union, Difference etc.
    • Regular expressions
      – line comment = '/' '/' [[0 .. 0xFFFF] - [10 + 13]]* (10 | 13 | 10 13)
    • Helpers (not macros)
      – h = 'a' | 'b', t = 'a' h 'b' (t matches "aab" or "abb"; treating h as a textual replacement would be a pitfall)
    • Tokens with optional lookahead
    • States (e.g. bol, inline, incomment)
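
The "helpers, not macros" pitfall can be reproduced with ordinary `java.util.regex` patterns. The sketch below is only an analogy, not SableCC code: the class `HelperExpansion` and its fields are hypothetical, and the helper `h = 'a' | 'b'` is written as the regex fragment `a|b`. Pasting that fragment in textually changes the meaning of `t`, while substituting it as a single unit gives the behaviour the slide describes.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; SableCC itself does not use java.util.regex syntax.
final class HelperExpansion {
    // Helper h = 'a' | 'b', written as a regex fragment.
    static final String H = "a|b";

    // Naive textual replacement: 'a' h 'b' becomes "aa|bb",
    // which means "aa" OR "bb" because alternation binds loosest.
    static final Pattern TEXTUAL = Pattern.compile("a" + H + "b");

    // Structural substitution: h stays one unit, giving a(?:a|b)b.
    static final Pattern STRUCTURAL = Pattern.compile("a(?:" + H + ")b");

    // True only if the whole string matches the pattern.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aab", "abb", "aa", "bb"}) {
            System.out.println(s
                    + " textual=" + matches(TEXTUAL, s)
                    + " structural=" + matches(STRUCTURAL, s));
        }
    }
}
```

With structural substitution only "aab" and "abb" are accepted; the textual version accepts "aa" and "bb" instead and rejects both intended strings.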
  7. Parser

    • Parser class that builds a typed AST automatically while processing the input
    • Productions
      – EBNF syntax (*, +, ?)
        • Optional: x?
        • List: x*
        • Non-empty list: x+
      – No action code in the specification (what is an action?)
      – Naming rules: (part1_part2_...) -> PPart1Part2
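
The naming rule on this slide can be sketched as a small string transformation. The helper below is hypothetical (`NamingRule.productionClassName` is not part of SableCC) and covers only the case shown on the slide: an underscore-separated production name becomes a class name with a `P` prefix and each part capitalized.

```java
// Hypothetical helper that mimics the naming rule from the slide;
// it is not taken from SableCC's own source.
final class NamingRule {
    // Turns "part1_part2" into "PPart1Part2".
    static String productionClassName(String productionName) {
        StringBuilder sb = new StringBuilder("P");
        for (String part : productionName.split("_")) {
            if (part.isEmpty()) {
                continue; // skip empty pieces from doubled underscores
            }
            sb.append(Character.toUpperCase(part.charAt(0)));
            sb.append(part.substring(1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "PPart1Part2"
        System.out.println(productionClassName("part1_part2"));
    }
}
```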