JavaCC & JTB tutorial

TA in Compilers.

Aggelos Biboudis

April 01, 2012

Transcript

  1. JavaCC
     • lexical analyzer (token manager)
     • generates a top-down parser (LL(k)): a recursive-descent parser
     • building trees with JJTree and JTB
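The following hand-written Java sketch makes "lexical analyzer plus recursive-descent LL parser" concrete. It has the same overall structure JavaCC generates: a token manager that turns characters into tokens, and a parser with one method per non-terminal that chooses alternatives by peeking at the next token. The grammar and all class names (Tok, TinyTokenManager, TinyParser) are hypothetical. Real JavaCC output is generated code and is organized differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical grammar for this sketch:
//   Start ::= Expr <EOF>
//   Expr  ::= Term ( "+" Term )*
//   Term  ::= <NUM> | "(" Expr ")"
class Tok {
    final int kind;
    final String image;
    Tok(int kind, String image) { this.kind = kind; this.image = image; }
}

// Plays the role of the token manager: characters in, tokens out.
class TinyTokenManager {
    static final int EOF = 0, NUM = 1, PLUS = 2, LPAREN = 3, RPAREN = 4;

    static List<Tok> scan(String in) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // like a SKIP specification
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < in.length() && Character.isDigit(in.charAt(i))) i++;
                out.add(new Tok(NUM, in.substring(start, i)));
            } else if (c == '+') { out.add(new Tok(PLUS, "+")); i++; }
            else if (c == '(') { out.add(new Tok(LPAREN, "(")); i++; }
            else if (c == ')') { out.add(new Tok(RPAREN, ")")); i++; }
            else throw new IllegalArgumentException("Unexpected character: " + c);
        }
        out.add(new Tok(EOF, "<EOF>"));
        return out;
    }
}

// Recursive-descent parser: one method per non-terminal, LL(1) decisions.
class TinyParser {
    private final List<Tok> toks;
    private int pos = 0;

    TinyParser(String input) { toks = TinyTokenManager.scan(input); }

    private int peek() { return toks.get(pos).kind; }   // one token of lookahead

    private Tok consume(int kind) {
        Tok t = toks.get(pos);
        if (t.kind != kind) throw new IllegalStateException("Parse error at token '" + t.image + "'");
        pos++;
        return t;
    }

    int start() {
        int v = expr();
        consume(TinyTokenManager.EOF);
        return v;
    }

    private int expr() {
        int v = term();
        while (peek() == TinyTokenManager.PLUS) {   // ( "+" Term )*
            consume(TinyTokenManager.PLUS);
            v += term();
        }
        return v;
    }

    private int term() {
        if (peek() == TinyTokenManager.NUM) {
            return Integer.parseInt(consume(TinyTokenManager.NUM).image);
        }
        consume(TinyTokenManager.LPAREN);
        int v = expr();
        consume(TinyTokenManager.RPAREN);
        return v;
    }
}
```

For example, new TinyParser("1 + (2 + 3)").start() evaluates the input while parsing it and returns 6.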
  2. JavaCC file
     • Options
     • PARSER_BEGIN(name) ... PARSER_END(name), enclosing the Java code of the parser class
     • Lexical specifications
       – SKIP, TOKEN, SPECIAL_TOKEN, MORE
     • List of productions
       – non-terminal declared like a Java method header, followed by :
       – declarations and statements within {}
       – lexical tokens as strings or regular expressions
       – non-terminals used like method calls, e.g. Term(); optional parts in [...]
       – actions
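The Java below shows how one BNF production from the file maps to Java. It assumes a NUM token is defined. The production, shown in the comment, has a declarations block, token assignments (t = <NUM>) and actions, and JavaCC turns such a production into a parser method of the same name. The code is a hand-written approximation that uses strings in place of Token objects. It is not JavaCC's literal output, and SumProduction is a hypothetical name.

```java
import java.util.List;

// Hand-written approximation of the parser method for this production:
//
//   int Sum() :
//   { int total; Token t; }                                  // declarations
//   {
//     t = <NUM> { total = Integer.parseInt(t.image); }       // assignment + action
//     ( "+" t = <NUM> { total += Integer.parseInt(t.image); } )*
//     { return total; }                                       // final action
//   }
//
// Assumption: tokens arrive pre-split as strings instead of Token objects.
class SumProduction {
    private final List<String> tokens;
    private int pos = 0;

    SumProduction(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private String next() { String t = peek(); pos++; return t; }

    // Plays the role of matching <NUM>: consume a token that must be a number.
    private String expectNum() {
        String t = next();
        if (!t.matches("[0-9]+")) throw new IllegalStateException("expected <NUM>, got " + t);
        return t;
    }

    // The method keeps the production's name, as JavaCC does.
    int Sum() {
        int total;                      // declarations block -> local variables
        String t;
        t = expectNum();                // t = <NUM>
        total = Integer.parseInt(t);    // action
        while (peek().equals("+")) {    // ( ... )* -> loop guarded by the next token
            next();                     // "+"
            t = expectNum();
            total += Integer.parseInt(t);
        }
        return total;                   // final action
    }
}
```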
  3. Generated files
     • <MyParser>.java: the generated parser.
     • <MyParser>TokenManager.java: the generated token manager (scanner, lexical analyzer).
     • <MyParser>Constants.java: a bunch of useful constants (token kinds, lexical states, token images).
     • Also some boilerplate support classes: Token.java, ParseException.java (plus others such as TokenMgrError.java and SimpleCharStream.java).
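The sketch below gives a rough idea of what the constants and token classes contain, for a hypothetical parser named MyParser with NUM and PLUS tokens. Token kinds are int constants, tokenImage gives a printable name per kind, and each Token carries its kind, the matched text and its position. Only a subset of the real members is shown. The GeneratedFilesDemo class and its describe helper are hypothetical additions for illustration.

```java
// Simplified stand-in for <MyParser>Constants.java (real file has more members).
interface MyParserConstants {
    int EOF = 0;
    int NUM = 1;
    int PLUS = 2;
    int DEFAULT = 0;                               // the default lexical state
    String[] tokenImage = { "<EOF>", "<NUM>", "\"+\"" };
}

// Simplified stand-in for Token.java: what the token manager hands the parser.
class Token {
    public int kind;                               // one of the constants above
    public int beginLine, beginColumn, endLine, endColumn;
    public String image;                           // the matched text
    public Token next;                             // next token (used for lookahead)
}

class GeneratedFilesDemo implements MyParserConstants {
    // Hypothetical helper: formats a token using the constants' tokenImage table.
    static String describe(Token t) {
        return "Encountered " + tokenImage[t.kind] + " (\"" + t.image + "\") at line "
            + t.beginLine + ", column " + t.beginColumn;
    }
}
```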
  4. Options
     • LOOKAHEAD
     • CHOICE_AMBIGUITY_CHECK
     • OTHER_AMBIGUITY_CHECK
     • STATIC
     • SUPPORT_CLASS_VISIBILITY_PUBLIC
     • DEBUG_PARSER
     • DEBUG_LOOKAHEAD
     • DEBUG_TOKEN_MANAGER
     • ERROR_REPORTING
     • JAVA_UNICODE_ESCAPE
     • UNICODE_INPUT
     • IGNORE_CASE
     • USER_TOKEN_MANAGER
     • USER_CHAR_STREAM
     • BUILD_PARSER
     • BUILD_TOKEN_MANAGER
     • TOKEN_EXTENDS
     • TOKEN_FACTORY
     • TOKEN_MANAGER_USES_PARSER
     • SANITY_CHECK
     • FORCE_LA_CHECK
     • COMMON_TOKEN_ACTION
     • CACHE_TOKENS
     • OUTPUT_DIRECTORY
  5. Productions
     • javacode_production
       – Java code instead of EBNF, for productions that are not context-free or grammars that are hard to express in general
       – a black box: JavaCC cannot look inside it when deciding between choices
     • bnf_production, built from these expansion units:
       – local_lookahead
       – java_block
       – "(" expansion_choices ")" [ "+" | "*" | "?" ]
       – "[" expansion_choices "]"
       – [ java_assignment_lhs "=" ] regular_expression
       – [ java_assignment_lhs "=" ] java_identifier "(" java_expression_list ")"
     • regular_expr_production
     • token_manager_decls
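A javacode_production is plain Java inside the grammar. The sketch below mimics a typical use: error recovery that skips tokens up to the matching closing brace. This is awkward to write as an expansion because it has to accept arbitrary tokens. JavacodeDemo and its string-based getToken/getNextToken are hypothetical stand-ins for the generated parser's token-access methods, which work on Token objects rather than strings.

```java
import java.util.List;

// In a .jj file the method body below would sit in a production like:
//   JAVACODE
//   void skipToMatchingBrace() { ...plain Java... }
class JavacodeDemo {
    private final List<String> tokens;
    private int pos = 0;

    JavacodeDemo(List<String> tokens) { this.tokens = tokens; }

    // Stand-in for getToken(i): the i-th token ahead (1-based), not consumed.
    String getToken(int i) {
        int k = pos + i - 1;
        return k < tokens.size() ? tokens.get(k) : "<EOF>";
    }

    // Stand-in for getNextToken(): consume and return the next token.
    String getNextToken() {
        String t = getToken(1);
        pos++;
        return t;
    }

    // Called after a "{" has been consumed: skips everything up to and
    // including the matching "}", tracking nested braces.
    void skipToMatchingBrace() {
        int nesting = 1;
        while (true) {
            String tok = getNextToken();
            if (tok.equals("<EOF>")) return;            // unbalanced input: stop at end
            if (tok.equals("{")) nesting++;
            if (tok.equals("}") && --nesting == 0) return;
        }
    }
}
```

Because JavaCC treats such a method as a black box, it cannot use it to decide between choices. That is why JAVACODE productions are best kept for places like error recovery.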
  6. Regular Expressions in JavaCC
     • < ID: ["a"-"z","A"-"Z","_"] ( ["a"-"z","A"-"Z","_","0"-"9"] )* >
     • ( ... )+
     • ( ... )?
     • ( r1 | r2 | ... )
     • ["a"-"z"]
     • ~[] (any character)
     • ~["\n","\r"] (any character except the newline characters)
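One way to check what these JavaCC regular expressions match is to translate them into java.util.regex syntax. The translation below is an illustration, not something JavaCC produces. The extra LINE_COMMENT pattern is an assumed example built from "//" followed by ~["\n","\r"] repeated, a common way to describe single-line comments.

```java
import java.util.regex.Pattern;

class JavaccRegexDemo {
    // < ID: ["a"-"z","A"-"Z","_"] ( ["a"-"z","A"-"Z","_","0"-"9"] )* >
    static final Pattern ID = Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*");

    // ~[] : any single character, including newlines (hence DOTALL)
    static final Pattern ANY = Pattern.compile(".", Pattern.DOTALL);

    // ~["\n","\r"] : any single character except the newline characters
    static final Pattern NOT_NEWLINE = Pattern.compile("[^\\n\\r]");

    // "//" ( ~["\n","\r"] )* : a single-line comment body (assumed example)
    static final Pattern LINE_COMMENT = Pattern.compile("//[^\\n\\r]*");

    static boolean isId(String s) { return ID.matcher(s).matches(); }
}
```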
  7. Choice conflict (2)
     • Warning: Choice conflict involving two expansions at line 25, column 3 and line 31, column 3 respectively. A common prefix is: <ID> Consider using a lookahead of 2 for earlier expansion.
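The warning means that, with one token of lookahead, the parser cannot tell which alternative to take, because both begin with <ID>. A hypothetical Statement rule that reproduces it chooses between an assignment "x = 1;" and a call "f();". The Java sketch shows what a local LOOKAHEAD(2) at that choice point amounts to: inspecting the second token before committing to an alternative. The grammar and the Lookahead2Demo class are assumptions for illustration.

```java
import java.util.List;

// Hypothetical rule:
//   void Statement() : {}
//   {
//       LOOKAHEAD(2) <ID> "=" <NUM> ";"     // assignment
//     | <ID> "(" ")" ";"                     // call
//   }
class Lookahead2Demo {
    private static final String ID = "[a-zA-Z_][a-zA-Z_0-9]*";
    private static final String NUM = "[0-9]+";

    private final List<String> toks;
    private int pos = 0;

    Lookahead2Demo(List<String> toks) { this.toks = toks; }

    // The i-th token ahead (1-based), like getToken(i) in a JavaCC parser.
    private String getToken(int i) {
        int k = pos + i - 1;
        return k < toks.size() ? toks.get(k) : "<EOF>";
    }

    // Consume the next token if it matches the regex; otherwise fail.
    private void expect(String regex) {
        String t = getToken(1);
        if (!t.matches(regex)) throw new IllegalStateException("unexpected token: " + t);
        pos++;
    }

    String statement() {
        // LOOKAHEAD(2): look at two tokens before choosing the first alternative
        if (getToken(1).matches(ID) && getToken(2).equals("=")) {
            expect(ID); expect("="); expect(NUM); expect(";");
            return "assignment";
        }
        expect(ID); expect("\\("); expect("\\)"); expect(";");
        return "call";
    }
}
```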
  8. Choice conflict (3)
     • Turn the grammar into LL(1) (e.g. by left-factoring the common prefix)
     • Make use of LOOKAHEAD
       – global lookahead via the LOOKAHEAD option (do not increase it without good reason!)
       – local lookahead at the choice point
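The first fix, rewriting the grammar as LL(1), can be sketched on a hypothetical Statement rule whose two alternatives share the prefix <ID>: an assignment "x = 1;" versus a call "f();". Factoring the prefix out leaves a choice that a single token decides, so no LOOKAHEAD is needed. The rule and the LeftFactoredDemo class are illustrative assumptions.

```java
import java.util.List;

// Left-factored hypothetical rule:
//   void Statement() : {}
//   {
//       <ID> ( "=" <NUM> | "(" ")" ) ";"
//   }
class LeftFactoredDemo {
    private static final String ID = "[a-zA-Z_][a-zA-Z_0-9]*";
    private static final String NUM = "[0-9]+";

    private final List<String> toks;
    private int pos = 0;

    LeftFactoredDemo(List<String> toks) { this.toks = toks; }

    // The i-th token ahead (1-based), like getToken(i) in a JavaCC parser.
    private String getToken(int i) {
        int k = pos + i - 1;
        return k < toks.size() ? toks.get(k) : "<EOF>";
    }

    // Consume the next token if it matches the regex; otherwise fail.
    private void expect(String regex) {
        String t = getToken(1);
        if (!t.matches(regex)) throw new IllegalStateException("unexpected token: " + t);
        pos++;
    }

    String statement() {
        expect(ID);                          // shared prefix, consumed once
        String kind;
        if (getToken(1).equals("=")) {       // one token now decides: LL(1)
            expect("="); expect(NUM);
            kind = "assignment";
        } else {
            expect("\\("); expect("\\)");
            kind = "call";
        }
        expect(";");
        return kind;
    }
}
```

Left-factoring keeps the parser LL(1) everywhere, at the cost of a grammar that mirrors the language's structure less directly than the two separate alternatives.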
  9. Java Tree Builder (JTB)
     • Consumes a .jj grammar file and generates
       – syntax tree classes based on the productions in the grammar
       – the Visitor design pattern:
         – Visitor and GJVisitor interfaces
         – two depth-first visitors: DepthFirstVisitor (simply for visiting) and GJDepthFirst (with generic return type and argument)
       – a JavaCC grammar (output: jtb.out.jj) that builds the tree during parsing
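The following is a much-reduced sketch of the shape of JTB output, for a hypothetical one-production grammar Goal ::= <NUM> "+" <NUM>. It has one class per production, with fields f0, f1, ... for the RHS elements. It also has accept methods for both visitor interfaces, a DepthFirstVisitor that only walks the children, and user visitors built on top. These classes are simplified stand-ins written for this sketch, not JTB's real output, which contains more classes and methods. In practice you would usually extend DepthFirstVisitor or GJDepthFirst rather than implement the interfaces directly.

```java
interface Node {
    void accept(Visitor v);
    <R, A> R accept(GJVisitor<R, A> v, A argu);
}

interface Visitor {
    void visit(NodeToken n);
    void visit(Goal n);
}

interface GJVisitor<R, A> {
    R visit(NodeToken n, A argu);
    R visit(Goal n, A argu);
}

class NodeToken implements Node {
    final String tokenImage;
    NodeToken(String image) { tokenImage = image; }
    public void accept(Visitor v) { v.visit(this); }
    public <R, A> R accept(GJVisitor<R, A> v, A argu) { return v.visit(this, argu); }
}

// One class per production, one field per RHS element.
class Goal implements Node {
    final NodeToken f0, f1, f2;
    Goal(NodeToken f0, NodeToken f1, NodeToken f2) { this.f0 = f0; this.f1 = f1; this.f2 = f2; }
    public void accept(Visitor v) { v.visit(this); }
    public <R, A> R accept(GJVisitor<R, A> v, A argu) { return v.visit(this, argu); }
}

// Default traversal: visit every child, do nothing else.
class DepthFirstVisitor implements Visitor {
    public void visit(NodeToken n) { }
    public void visit(Goal n) { n.f0.accept(this); n.f1.accept(this); n.f2.accept(this); }
}

// User visitor on top of the plain Visitor: collects token images in order.
class TokenCollector extends DepthFirstVisitor {
    final StringBuilder seen = new StringBuilder();
    @Override public void visit(NodeToken n) { seen.append(n.tokenImage); }
}

// User visitor on top of GJVisitor: generic return type, here evaluating the sum.
class EvalVisitor implements GJVisitor<Integer, Void> {
    public Integer visit(NodeToken n, Void argu) { return Integer.parseInt(n.tokenImage); }
    public Integer visit(Goal n, Void argu) { return n.f0.accept(this, argu) + n.f2.accept(this, argu); }
}
```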
  10. How to use (from the JTB example)
      • jtb subscheme.jj
      • Code a Visitor that is used in the .jj (FreeVarFinderVisitor in the example)
      • javacc jtb.out.jj
      • javac SubScheme.java
      • java SubScheme < inputfile
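The visitor step is where the analysis lives. Below is a hypothetical free-variable finder in the spirit of FreeVarFinderVisitor. It is written against a tiny hand-made AST (Var, Lambda, App), not the JTB-generated classes of the SubScheme example, which are not reproduced here. It follows the GJ style: the return type is the set of free variables, and the argument is the set of names bound by enclosing lambdas.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical AST: variable references, (lambda (param) body), and applications.
interface Expr {
    <R, A> R accept(ExprVisitor<R, A> v, A argu);
}

interface ExprVisitor<R, A> {
    R visitVar(Var e, A argu);
    R visitLambda(Lambda e, A argu);
    R visitApp(App e, A argu);
}

class Var implements Expr {
    final String name;
    Var(String name) { this.name = name; }
    public <R, A> R accept(ExprVisitor<R, A> v, A argu) { return v.visitVar(this, argu); }
}

class Lambda implements Expr {
    final String param;
    final Expr body;
    Lambda(String param, Expr body) { this.param = param; this.body = body; }
    public <R, A> R accept(ExprVisitor<R, A> v, A argu) { return v.visitLambda(this, argu); }
}

class App implements Expr {
    final Expr fn, arg;
    App(Expr fn, Expr arg) { this.fn = fn; this.arg = arg; }
    public <R, A> R accept(ExprVisitor<R, A> v, A argu) { return v.visitApp(this, argu); }
}

// R = free variables found, A = variables bound by enclosing lambdas.
class FreeVarFinder implements ExprVisitor<Set<String>, Set<String>> {
    public Set<String> visitVar(Var e, Set<String> bound) {
        Set<String> out = new HashSet<>();
        if (!bound.contains(e.name)) out.add(e.name);   // free unless bound above
        return out;
    }

    public Set<String> visitLambda(Lambda e, Set<String> bound) {
        Set<String> inner = new HashSet<>(bound);        // copy: binding is scoped
        inner.add(e.param);
        return e.body.accept(this, inner);
    }

    public Set<String> visitApp(App e, Set<String> bound) {
        Set<String> out = e.fn.accept(this, bound);      // each result is a fresh set
        out.addAll(e.arg.accept(this, bound));
        return out;
    }
}
```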
  11. Tree Node Interface and Classes
      • Based on the RHS of the production, the generated public fields have the following types:
        – Node
        – NodeListInterface
        – NodeChoice
        – NodeList
        – NodeListOptional
        – NodeOptional
        – NodeSequence
        – NodeToken
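To see how the RHS decides the field types, take a hypothetical production Decl made of a choice, a token, an optional part and a final token. The classes below are simplified stand-ins written for this sketch, not JTB's actual node classes, which also carry visitor support. They follow the same idea: a NodeChoice records which alternative matched, a NodeOptional may be empty, and a NodeSequence groups the elements of a bracketed part.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical production:
//   void Decl() : {} { ( "int" | "bool" ) <ID> [ "=" <NUM> ] ";" }
// giving one field per RHS element:
//   f0: NodeChoice    ( "int" | "bool" )
//   f1: NodeToken     <ID>
//   f2: NodeOptional  [ "=" <NUM> ], holding a NodeSequence when present
//   f3: NodeToken     ";"
interface Node { }

class NodeToken implements Node {
    final String tokenImage;
    NodeToken(String s) { tokenImage = s; }
}

class NodeChoice implements Node {
    final Node choice;   // the alternative that matched
    final int which;     // its index among the alternatives
    NodeChoice(Node choice, int which) { this.choice = choice; this.which = which; }
}

class NodeOptional implements Node {
    final Node node;     // null when the [ ... ] part was absent
    NodeOptional(Node node) { this.node = node; }
    boolean present() { return node != null; }
}

class NodeSequence implements Node {
    final List<Node> nodes = new ArrayList<>();
    NodeSequence add(Node n) { nodes.add(n); return this; }
    Node elementAt(int i) { return nodes.get(i); }
}

class Decl implements Node {
    final NodeChoice f0;
    final NodeToken f1;
    final NodeOptional f2;
    final NodeToken f3;

    Decl(NodeChoice f0, NodeToken f1, NodeOptional f2, NodeToken f3) {
        this.f0 = f0; this.f1 = f1; this.f2 = f2; this.f3 = f3;
    }

    // Reading the tree: the initializer's text, or null if there is none.
    String initializer() {
        if (!f2.present()) return null;
        NodeSequence seq = (NodeSequence) f2.node;
        return ((NodeToken) seq.elementAt(1)).tokenImage;
    }
}
```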
  12. JavaCC vs SableCC
      • SableCC supports LALR(1)
      • JavaCC supports LL(1) by default, and LL(k) through LOOKAHEAD
      • SableCC is more object-oriented, JavaCC more imperative
      • Lookahead seems like a hack, but it is more straightforward.