ANTLR

Introduction to the ANTLR recognizer generator.

Honza Javorek

May 19, 2012

Transcript

  1. ANTLR: ANother Tool for Language Recognition. Developed by Terence Parr (University of San Francisco) since 1989; free software (BSD license).
  2. What is it? A parser generator based on LL parsing.
    Input: a context-free (CF) grammar in EBNF (Extended Backus-Naur Form).
    Output: a scanner, a parser and a tree walker, i.e. a recognizer for the CF grammar.
    Targets: C, C++, C#, Java, Objective-C, Python, …
    (Diagram: CF grammar → ANTLR → recognizer.)
  3. Further introduction. ANTLR does a similar job to
    - scanner generators: lex/flex, JFlex
    - parser generators: yacc/Bison, JCup, JavaCC, SableCC, SLADE
    ANTLR generates LL recognizers (FIRST and FOLLOW sets, syntactic conditions, predictive parsing, …).
    ANTLR uses the following terminology:
    - lexer: scanner, lexical analyser, tokenizer
    - parser: syntactic analyser
    - tree parser: tree walker (e.g. for code generation)
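The slide only names FIRST and FOLLOW sets. As a rough illustration of what a FIRST set is, here is a minimal textbook fixed-point computation in Java. It is a sketch, not ANTLR's own grammar analysis (ANTLR 3's LL(*) analysis goes well beyond plain FIRST sets); the class name FirstSets, the grammar encoding (nonterminal → list of alternatives) and the EPSILON marker are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Textbook fixed-point computation of FIRST sets (illustrative sketch, not ANTLR's analysis).
 * Assumed grammar encoding: each key is a nonterminal mapped to its alternatives; any symbol
 * that is not a key is a terminal; an empty alternative derives epsilon.
 */
class FirstSets {
    static final String EPSILON = "EPSILON";

    static Map<String, Set<String>> compute(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> first = new HashMap<>();
        for (String nonterminal : grammar.keySet()) first.put(nonterminal, new HashSet<>());

        boolean changed = true;
        while (changed) {                              // repeat until no set grows any more
            changed = false;
            for (Map.Entry<String, List<List<String>>> rule : grammar.entrySet()) {
                Set<String> target = first.get(rule.getKey());
                for (List<String> alternative : rule.getValue()) {
                    boolean nullableSoFar = true;      // every symbol seen so far can derive epsilon
                    for (String symbol : alternative) {
                        if (!grammar.containsKey(symbol)) {   // terminal: it starts the alternative
                            changed |= target.add(symbol);
                            nullableSoFar = false;
                            break;
                        }
                        // nonterminal: add its FIRST set minus epsilon (iterate over a copy in
                        // case symbol is the rule itself, i.e. left recursion)
                        for (String t : new ArrayList<>(first.get(symbol))) {
                            if (!t.equals(EPSILON)) changed |= target.add(t);
                        }
                        if (!first.get(symbol).contains(EPSILON)) {  // symbol cannot vanish: stop
                            nullableSoFar = false;
                            break;
                        }
                    }
                    if (nullableSoFar) changed |= target.add(EPSILON);
                }
            }
        }
        return first;
    }
}
```

For the rules expr and operand from slide 10, rewritten in plain BNF as expr → operand exprTail, exprTail → PLUS operand exprTail | ε, operand → LPAR expr RPAR | NUMBER, this yields FIRST(expr) = FIRST(operand) = {LPAR, NUMBER} and FIRST(exprTail) = {PLUS, EPSILON}: one token of lookahead is enough to choose every alternative.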
  4. Getting started
    Web: mostly Java tutorials, http://www.antlr.org/wiki/display/ANTLR3/FAQ+-+Getting+Started
    Book: The Definitive ANTLR Reference (by Terence Parr)
    Plugins: ANTLR DL for Eclipse, ANTLRv3 IDE for Eclipse
  5. How to begin? The same resources as on slide 4 (web tutorials, the book, the Eclipse plugins). Ha, antlers again!
  6. Used to version 2? The current version of ANTLR (at the time of this talk) is 3.
    It was completely rewritten over 4 years of research and coding (after 15 years of experience), mostly as a clean-up.
    Changes (http://www.antlr.org/wiki/pages/viewpage.action?pageId=719):
    - LL(k) → LL(*): you no longer have to specify a lookahead depth
    - auto-backtracking mode, faster backtracking
    - ANTLRWorks, an integrated grammar development environment
    - new grammar syntax (backward-compatibility break!)
    - simplified tree building
    - retargetable code generator, so new backends are easy to build
    - improved error reporting
    - integration of the StringTemplate engine (structured text generation)
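  7. Overview (diagram): three EBNF CF grammars (lexer grammar MyLexer.g, parser grammar MyParser.g, tree grammar MyWalker.g) are each processed by ANTLR into target code: the scanner MyLexer.c, the parser MyParser.c and the tree walker MyWalker.c (e.g. a code generator). At run time, a char stream passes through the scanner, then the parser, then the tree walker.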
  8. Running ANTLR. ANTLR is a Java program:
        java org.antlr.Tool file.g
    By default it generates .java files.
    Download the JAR at http://www.antlr.org/download.html; it runs on Win/Linux/Mac/... (BSD-licensed source).
    Currently (11/2010) supported targets: C, C#, ActionScript/JavaScript, Java.
    ANTLRWorks: java -jar antlrworks-1.4.jar. It is an integrated environment, but you can still use the Eclipse plugins.
  9. Sample grammar file
        [lexer|parser|tree] grammar MyGrammar;
        options    { options for the entire grammar file }
        tokens     { token definitions }
        @header    { copied into the generated files (e.g. Java imports) }
        @rulecatch { error handling, exceptions }
        @members   { optional class members: variables, methods }
        rulename : ... ;   (all rules of MyGrammar)
  10. Sample grammar rule
        rulename [args] returns [T val]
        options { local options }
            : alternative 1
            | ...
            | alternative n
            ;
    Example:
        expr    : operand (PLUS operand)* ;
        operand : LPAR expr RPAR
                | NUMBER
                ;
    The parts [args], returns [T val] and options { ... } are optional. Each alternative is a regular-expression-like sequence of rule names, tokens, EBNF operators and (Java) code in braces.
    EBNF operators: A|B = A or B; A* = zero or more; A+ = one or more; A? = optional.
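To give an idea of what an LL recognizer for these two rules does, here is a hand-written recursive-descent sketch in Java (ANTLR's default target). It is not ANTLR's generated code: the class name ExprParser is hypothetical, single characters stand in for the tokens PLUS, LPAR, RPAR and NUMBER so that no separate lexer is needed, and it computes the value of the expression instead of building a tree. Each rule becomes a method, one character of lookahead selects the alternative, and (PLUS operand)* becomes a while loop.

```java
/**
 * Hypothetical hand-written recursive-descent version of
 *   expr    : operand (PLUS operand)* ;
 *   operand : LPAR expr RPAR | NUMBER ;
 * Characters stand in for tokens ('+' = PLUS, '(' = LPAR, ')' = RPAR, digits = NUMBER).
 */
class ExprParser {
    private final String input;
    private int pos = 0;

    ExprParser(String input) { this.input = input; }

    /** Parses the whole input as a single expr and returns its value. */
    int parse() {
        int value = expr();
        if (peek() != '\0') throw new IllegalStateException("unexpected input at " + pos);
        return value;
    }

    // expr : operand (PLUS operand)* ;
    private int expr() {
        int value = operand();
        while (peek() == '+') {        // lookahead is PLUS: take another round of the (...)* loop
            pos++;
            value += operand();
        }
        return value;
    }

    // operand : LPAR expr RPAR | NUMBER ;
    private int operand() {
        char la = peek();              // one character of lookahead picks the alternative
        if (la == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') throw new IllegalStateException("expected ')' at " + pos);
            pos++;
            return value;
        }
        if (isDigit(la)) {
            int start = pos;
            while (pos < input.length() && isDigit(input.charAt(pos))) pos++;
            return Integer.parseInt(input.substring(start, pos));
        }
        throw new IllegalStateException("expected '(' or a number at " + pos);
    }

    /** Skips blanks and returns the next character, or '\0' at the end of the input. */
    private char peek() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    // corresponds to a DIGIT : '0'..'9' lexer fragment
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
}
```

For example, new ExprParser("(1+2)+(3+4)").parse() returns 10, while "1+" is rejected because the loop body expects another operand.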
  11. Example: Calculator
        // calc language example
        var n: integer;
        var x: integer;
        n := 2+4-1;
        x := n+3+7;
        print(x);
    The language has declarations (integers only), assignments and printing.
  12. Calculator EBNF
        program        ::= declarations statements EOF
        declarations   ::= (declaration SEMICOLON)*
        declaration    ::= VAR IDENTIFIER COLON type
        statements     ::= (statement SEMICOLON)+
        statement      ::= assignment | printStatement
        assignment     ::= lvalue BECOMES expr
        printStatement ::= PRINT LPAR expr RPAR
        lvalue         ::= IDENTIFIER
        expr           ::= op ((PLUS|MINUS) op)*
        op             ::= IDENTIFIER | NUMBER | LPAR expr RPAR
        type           ::= INTEGER
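Read as an LL(1) grammar, each EBNF rule of the calculator maps onto one method of a recursive-descent recognizer. The following Java sketch is hand-written, not ANTLR output; the class name CalcRecognizer and the representation of tokens as plain strings holding only the token type are assumptions. A ( ... )* or ( ... )+ loop keeps going while the lookahead token can start its body: VAR for declarations, IDENTIFIER or PRINT for statements, PLUS or MINUS in expr.

```java
import java.util.List;

/**
 * Hypothetical hand-written LL(1) recognizer for the calculator EBNF: one method per rule.
 * Tokens are plain strings holding the token type (e.g. "VAR", "IDENTIFIER") and must end with "EOF".
 */
class CalcRecognizer {
    private final List<String> tokens;
    private int pos = 0;

    CalcRecognizer(List<String> tokens) { this.tokens = tokens; }

    private String la() { return tokens.get(pos); }        // the single token of lookahead

    private boolean laIs(String... types) {
        for (String t : types) if (la().equals(t)) return true;
        return false;
    }

    private void match(String type) {                      // consume the expected token or fail
        if (!la().equals(type)) throw new IllegalStateException("expected " + type + " but found " + la());
        pos++;
    }

    // program ::= declarations statements EOF
    void program() { declarations(); statements(); match("EOF"); }

    // declarations ::= (declaration SEMICOLON)*
    void declarations() { while (laIs("VAR")) { declaration(); match("SEMICOLON"); } }

    // declaration ::= VAR IDENTIFIER COLON type
    void declaration() { match("VAR"); match("IDENTIFIER"); match("COLON"); type(); }

    // statements ::= (statement SEMICOLON)+   (at least one iteration, hence do/while)
    void statements() {
        do { statement(); match("SEMICOLON"); } while (laIs("IDENTIFIER", "PRINT"));
    }

    // statement ::= assignment | printStatement
    void statement() { if (laIs("PRINT")) printStatement(); else assignment(); }

    // assignment ::= lvalue BECOMES expr
    void assignment() { lvalue(); match("BECOMES"); expr(); }

    // printStatement ::= PRINT LPAR expr RPAR
    void printStatement() { match("PRINT"); match("LPAR"); expr(); match("RPAR"); }

    // lvalue ::= IDENTIFIER
    void lvalue() { match("IDENTIFIER"); }

    // expr ::= op ((PLUS|MINUS) op)*
    void expr() {
        op();
        while (laIs("PLUS", "MINUS")) { pos++; op(); }
    }

    // op ::= IDENTIFIER | NUMBER | LPAR expr RPAR
    void op() {
        if (laIs("IDENTIFIER", "NUMBER")) pos++;
        else if (laIs("LPAR")) { match("LPAR"); expr(); match("RPAR"); }
        else throw new IllegalStateException("unexpected " + la());
    }

    // type ::= INTEGER
    void type() { match("INTEGER"); }
}
```

Calling program() on the token types of "var n: integer; n := 2+4-1; print(n);" returns normally, while a statement without its trailing SEMICOLON makes match throw.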
  13. Let's generate the recognizers:
    - CalcLexer (extends Lexer): stream of chars → stream of tokens
    - CalcParser (extends Parser): stream of tokens → stream of tree nodes
    - CalcChecker (extends TreeParser): reads the stream of tree nodes and checks context constraints
    - CalcInterpreter (extends TreeParser): reads the stream of tree nodes and executes the program
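The split between CalcChecker and CalcInterpreter can be sketched by hand as two walkers over the same trees. This is not ANTLR's TreeParser API: the Node class, the tree shapes (for example a BECOMES node whose children are the target identifier and the expression) and the initial value 0 for declared variables are assumptions of this sketch. The checker enforces one context constraint, that every identifier is declared exactly once before it is used; the interpreter evaluates expressions and records printed values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical homogeneous tree node: token type, token text, children. */
class Node {
    final String type;   // e.g. "VAR", "BECOMES", "PRINT", "PLUS", "MINUS", "NUMBER", "IDENTIFIER"
    final String text;
    final List<Node> kids;

    Node(String type, String text, Node... kids) {
        this.type = type;
        this.text = text;
        this.kids = Arrays.asList(kids);
    }
}

/** Hand-written stand-in for CalcChecker: identifiers must be declared once, before use. */
class CalcChecker {
    private final Set<String> declared = new HashSet<>();

    void check(List<Node> program) {
        for (Node statement : program) {
            if (statement.type.equals("VAR")) {            // VAR(IDENTIFIER, type)
                String name = statement.kids.get(0).text;
                if (!declared.add(name)) throw new IllegalStateException("redeclared: " + name);
            } else {
                visit(statement);
            }
        }
    }

    private void visit(Node n) {                           // every IDENTIFIER in the subtree
        if (n.type.equals("IDENTIFIER") && !declared.contains(n.text))
            throw new IllegalStateException("undeclared: " + n.text);
        for (Node kid : n.kids) visit(kid);
    }
}

/** Hand-written stand-in for CalcInterpreter: executes statements, records printed values. */
class CalcInterpreter {
    private final Map<String, Integer> store = new HashMap<>();
    final List<Integer> printed = new ArrayList<>();

    void run(List<Node> program) {
        for (Node n : program) {
            switch (n.type) {
                case "VAR": store.put(n.kids.get(0).text, 0); break;   // assumed initial value 0
                case "BECOMES": store.put(n.kids.get(0).text, eval(n.kids.get(1))); break;
                case "PRINT": printed.add(eval(n.kids.get(0))); break;
                default: throw new IllegalStateException("unknown statement " + n.type);
            }
        }
    }

    private int eval(Node n) {
        switch (n.type) {
            case "NUMBER": return Integer.parseInt(n.text);
            case "IDENTIFIER": return store.get(n.text);
            case "PLUS": return eval(n.kids.get(0)) + eval(n.kids.get(1));
            case "MINUS": return eval(n.kids.get(0)) - eval(n.kids.get(1));
            default: throw new IllegalStateException("unknown expression " + n.type);
        }
    }
}
```

With the trees for the example program on slide 11 (left-associative, so 2+4-1 is MINUS(PLUS(2, 4), 1)), the checker passes and the interpreter records a single printed value, 15: n = 5, then x = 5+3+7.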
  14. Siamese twins: lexer & parser. ANTLR 3 allows you to define a combined (hybrid) grammar that contains both lexer and parser rules.
        grammar Calculator;
        options {
            k = 1;
            language = Java;
            output = AST;
        }
        tokens {
            PLUS  = '+';
            MINUS = '-';
            ...
        }
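  15. Parser-specific and lexer-specific rules. Parser rule names start with a lowercase letter, lexer rule names with an uppercase letter; fragment rules are helpers used by other lexer rules that never produce a token of their own.
        program      : declarations statements EOF ;
        declarations : (declaration SEMICOLON)* ;
        statements   : (statement SEMICOLON)+ ;
        ...
        IDENTIFIER : LETTER (LETTER | DIGIT)* ;
        NUMBER     : DIGIT+ ;
        ...
        fragment DIGIT : ('0'..'9') ;
        fragment LOWER : ('a'..'z') ;
        ...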
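Lexer rules can be imitated by hand as well. In this Java sketch (not ANTLR's generated lexer; the class name CalcLexer and the token encoding are assumptions), the fragment rules DIGIT and LETTER become private helper predicates that never emit a token themselves, while IDENTIFIER and NUMBER become loops mirroring their EBNF. The slide elides LETTER and the whitespace, keyword and punctuation rules, so here LETTER is assumed to be 'a'..'z' | 'A'..'Z', whitespace is simply skipped, and var, integer and print are assumed to be keywords that take precedence over IDENTIFIER. Tokens come back as strings such as "VAR" or "IDENTIFIER:n", ending with "EOF".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written lexer for the calc language; fragments become helper predicates. */
class CalcLexer {
    // fragment DIGIT : ('0'..'9') ;
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // fragment LETTER (elided on the slide; assumed to be 'a'..'z' | 'A'..'Z')
    private static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

    /** Returns token types; IDENTIFIER and NUMBER also carry their text as "TYPE:text". */
    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }     // assumed: whitespace is skipped
            int start = i;
            if (isLetter(c)) {
                // IDENTIFIER : LETTER (LETTER | DIGIT)* ;
                while (i < in.length() && (isLetter(in.charAt(i)) || isDigit(in.charAt(i)))) i++;
                String word = in.substring(start, i);
                switch (word) {                                   // assumed keywords win over IDENTIFIER
                    case "var": out.add("VAR"); break;
                    case "integer": out.add("INTEGER"); break;
                    case "print": out.add("PRINT"); break;
                    default: out.add("IDENTIFIER:" + word);
                }
            } else if (isDigit(c)) {
                // NUMBER : DIGIT+ ;
                while (i < in.length() && isDigit(in.charAt(i))) i++;
                out.add("NUMBER:" + in.substring(start, i));
            } else if (c == ':' && i + 1 < in.length() && in.charAt(i + 1) == '=') {
                out.add("BECOMES");                               // longest match: ":=" before ":"
                i += 2;
            } else {
                i++;
                switch (c) {
                    case ':': out.add("COLON"); break;
                    case ';': out.add("SEMICOLON"); break;
                    case '(': out.add("LPAR"); break;
                    case ')': out.add("RPAR"); break;
                    case '+': out.add("PLUS"); break;
                    case '-': out.add("MINUS"); break;
                    default: throw new IllegalStateException("unexpected character '" + c + "' at " + start);
                }
            }
        }
        out.add("EOF");
        return out;
    }
}
```

Because DIGIT is a fragment, "x1 42" lexes as IDENTIFIER:x1 and NUMBER:42; no standalone DIGIT token ever appears in the output.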
  16. Summary
    - ANTLR does the hard work!
    - The generated code is human-readable.
    - The same grammar syntax for scanners, parsers and tree walkers.
    - Many target languages supported.
    - Under active development, with an active user community.
    - Integrated grammar development environment (ANTLRWorks) and Eclipse plugins.
    - Written in Java, so it runs on many platforms (Linux, Windows, Mac, ...).