Hacking Go Compiler Internals

Moriyoshi Koizumi

November 30, 2014

Transcript

  1. Hacking Go Compiler
    Internals
    Moriyoshi Koizumi

  2. Intended Audience
    • An eccentric Go programmer who happens to want
    to add feature XX to the language, knowing her
    patch will never be merged.
    • A keen-minded programmer who wants to know
    how the compiler works.

  3. Overall Architecture
    (Diagram of the compiler stages: Lexer, Parser, Escape Analysis, Typegen, GCproggen, Codegen)

  4. Phase 1. Lexer

  5. Lexer
    • A lexer scans the source code and cuts it into a
    bunch of meaningful chunks called tokens (the first abstraction).
    • Example:
    a := b + c()
    LNAME LCOLAS LNAME + LNAME ( )

  6. Lexer
    src/cmd/gc/lex.c
    static int32
    _yylex(void)
    {
        ...
    l0:
        c = getc();
        if(yy_isspace(c)) {
            if(c == '\n' && curio.nlsemi) {
                ungetc(c);
                DBG("lex: implicit semi\n");
                return ';';
            }
            goto l0;
        }
        ...

  7. Lexer
        ...
        switch(c) {
        ...
        case '+':
            c1 = getc();
            if(c1 == '+') {
                c = LINC;
                goto lx;
            }
            if(c1 == '=') {
                c = OADD;
                goto asop;
            }
            break;
        ...
        }
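
    The one-character lookahead in the '+' case can be sketched in Go as follows. This is a simplified illustration and not a port of lex.c: the function scanPlus and the string encoding of the tokens are assumptions, and the assumption that the asop label ends up returning an LASOP token carrying OADD is my reading of the snippet above.

```go
package main

import "fmt"

// scanPlus classifies the operator that starts at src[0], which must be '+'.
// It returns an illustrative token name and the number of bytes consumed.
func scanPlus(src string) (tok string, width int) {
	if len(src) == 0 || src[0] != '+' {
		return "", 0
	}
	if len(src) > 1 {
		switch src[1] { // one character of lookahead, like c1 = getc()
		case '+':
			return "LINC", 2 // "++"
		case '=':
			return "LASOP(OADD)", 2 // "+=": assignment operator carrying OADD
		}
	}
	return "+", 1 // plain '+'; the lookahead character is not consumed
}

func main() {
	for _, s := range []string{"++", "+=", "+b"} {
		tok, n := scanPlus(s)
		fmt.Printf("%q -> %s (%d bytes)\n", s, tok, n)
	}
}
```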

  8. When do you want to hack the lexer?
    • To rename a keyword or builtin such as func or make.
    • To change an operator only cosmetically (e.g. != →
    ~=).
    • To change how literals and identifiers are represented.
    • To add a new keyword or operator to the language for
    later use in the parser.

  9. Example: Emojis for identifiers
    • http://moriyoshi.hatenablog.com/entry/2014/06/03/121728
    • Go doesn’t treat emojis as part of identifiers.
    • But I wanted to use the sushi emoji 🍣 (U+1F363) as an identifier in the source:
    ./sushi.go:8: invalid identifier character U+1f363

  10. Example: Emojis for identifiers
    • I patched the following check in the lexer so that it accepts emojis:
    if(c >= Runeself) {
        ungetc(c);
        rune = getr();
        // 0xb7 · is used for internal names
        if(!isalpharune(rune) && !isdigitrune(rune) &&
           (importpkg == nil || rune != 0xb7))
            yyerror("invalid identifier character U+%04x", rune);
        cp += runetochar(cp, &rune);
    } else if(!yy_isalnum(c) && c != '_')
        break;
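
    Why the check above rejects U+1F363 can be seen with Go's unicode package: the sushi emoji is neither a letter nor a digit but belongs to the category So (Symbol, other). The sketch below contrasts a letter-or-digit predicate with a relaxed one. Both function names are hypothetical, and accepting the whole So category is only one possible relaxation, not necessarily what the author's patch does.

```go
package main

import (
	"fmt"
	"unicode"
)

// isIdentRune approximates the stock rule: letters, digits and underscore.
func isIdentRune(r rune) bool {
	return r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r)
}

// isIdentRuneRelaxed additionally accepts runes in the category
// "Symbol, other" (So), which is where most emojis live.
// This relaxation is an assumption made for illustration.
func isIdentRuneRelaxed(r rune) bool {
	return isIdentRune(r) || unicode.Is(unicode.So, r)
}

func main() {
	sushi := rune(0x1F363)
	fmt.Printf("U+%04X strict=%v relaxed=%v\n",
		sushi, isIdentRune(sushi), isIdentRuneRelaxed(sushi))
}
```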

  11. Phase 2. Parser

  12. Parser
    • The parser repeatedly calls the lexer to fetch tokens
    and builds an abstract syntax tree (AST) that
    represents the source code.
    • The AST is then retouched during the type inference and
    assertion phase (the “typecheck” and “walk” sub-phases)
    so that it becomes less verbose and carries information
    helpful for the later stages.
    • src/cmd/gc/go.y, src/cmd/gc/dcl.c,
    src/cmd/gc/typecheck.c,
    src/cmd/gc/walk.c,
    src/cmd/gc/reflect.c

  13. Parser
    Tokens:
    LNAME LCOLAS LNAME + LNAME ( )
    AST:
    OAS
      ONAME (a)
      OADD
        ONAME (b)
        OCALL
          ONAME (c)
          ∅ (no arguments)

  14. Parser
    • src/cmd/gc/go.y

    /*
     * expressions
     */
    expr:
        uexpr
    |   expr LOROR expr
        {
            $$ = nod(OOROR, $1, $3);
        }
    |   expr LANDAND expr
        {
            $$ = nod(OANDAND, $1, $3);
        }

  15. Example: Bracket operator overload!
    • Let the following code (A) expand to (B)
    • https://gist.github.com/moriyoshi/c0e2b2f9be6883e33251
    (A)
    a := &struct{}{}
    fmt.Println(a[1])
    a[1] = "test2"
    (B)
    fmt.Println(a.__getindex(1))
    a.__setindex(1, "test2")

  16. Example: Bracket operator overload!
    • Things to do:
      • Introduce a new AST node type (e.g. OINDEXINTER).
      • Add a branch in “typecheck” to handle the case
      where the indexed target is neither a string, array, slice
      nor map type.
      • Add code in “walk” that specially treats assignments
      and dereferences involving that kind of node. The
      code synthesizes nodes that invoke the special
      functions, then typechecks and walks them
      recursively.
      • Don’t forget to update the evaluation order corrector as well.
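
    The node synthesis step can be imitated at the source level with go/ast. The sketch below rewrites an index expression into a call to __getindex, which mirrors the (A) to (B) expansion for reads only. It operates on the standard library's AST rather than on gc's Node structures, skips typechecking entirely, and the helper name rewriteIndex is an assumption made for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
)

// rewriteIndex turns an index expression such as a[1] into the
// method call a.__getindex(1) and returns the resulting source text.
func rewriteIndex(src string) (string, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return "", err
	}
	idx, ok := expr.(*ast.IndexExpr)
	if !ok {
		return "", fmt.Errorf("not an index expression: %s", src)
	}
	// Synthesize the replacement node: <target>.__getindex(<index>).
	call := &ast.CallExpr{
		Fun:  &ast.SelectorExpr{X: idx.X, Sel: ast.NewIdent("__getindex")},
		Args: []ast.Expr{idx.Index},
	}
	var buf bytes.Buffer
	if err := printer.Fprint(&buf, token.NewFileSet(), call); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := rewriteIndex("a[1]")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

    In the real compiler the decision has to be made in “typecheck”, because only there is it known whether the indexed target is a string, array, slice or map.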

  17. Helpful functions to debug your hack
    • print(const char *, …)
      • This is the Plan 9 libc (lib9) counterpart of printf().
      • It accepts the following extra format specifiers:
        • %N (node)
        • %T (type)
        • %E, %J, %H, %L, %O, %S, %V, %Z, %B, %F

  18. Roll-up
    • Go’s compiler internals may look complex at first
    glance, but they turn out to be pretty straightforward
    and hacker-friendly ;)