Intended Audience
• An eccentric Go programmer who happens to want
to add feature XX to the language, knowing her
patch will never be merged.
• A keen-minded programmer who wants to know
how the compiler works.
Lexer
• A lexer scans over the source code and cuts it into a
bunch of meaningful chunks (the first abstraction).
• Example:
a := b + c()
LNAME LCOLAS LNAME + LNAME ( )
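A similar token stream can be observed without building the compiler. The following is a minimal sketch that uses the go/scanner package from Go's standard library. Note the assumption: go/scanner is a separate scanner written in Go, not the lexer inside gc, so its token names (IDENT, := and so on) differ from gc's LNAME-style names. The helper name tokenize is hypothetical.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
	"strings"
)

// tokenize returns the kinds of all tokens in src, separated by spaces.
// It is an illustrative helper, not part of any compiler.
func tokenize(src string) string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0)

	var kinds []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.SEMICOLON && lit == "\n" {
			continue // skip the automatically inserted semicolon
		}
		kinds = append(kinds, tok.String())
	}
	return strings.Join(kinds, " ")
}

func main() {
	fmt.Println(tokenize("a := b + c()")) // prints: IDENT := IDENT + IDENT ( )
}
```

Each identifier becomes one IDENT token, in the same way that gc reports each name as LNAME.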
When do you want to hack the lexer?
• Modify a keyword or built-in name such as func or make.
• Change how an operator is spelled, purely cosmetically
(e.g. != → ~=)
• Modify how literals and identifiers are represented.
• Add a new keyword or operator to the language to
later use in the parser.
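The cosmetic operator case can be sketched as follows. This is a toy scanner written for illustration only, not code from gc; the function scanOp and the token kind names "NE" and "EQ" are hypothetical. It shows why such a change stays inside the lexer: both spellings produce the same token kind, so whatever consumes the tokens does not need to change.

```go
package main

import "fmt"

// scanOp recognizes a two-character comparison operator at the start of src.
// It returns the token kind and the number of bytes consumed, or ("", 0)
// when src does not start with a known operator.
func scanOp(src string) (kind string, n int) {
	if len(src) >= 2 && src[1] == '=' {
		switch src[0] {
		case '!', '~': // "~=" is accepted as a cosmetic alias of "!="
			return "NE", 2
		case '=':
			return "EQ", 2
		}
	}
	return "", 0
}

func main() {
	for _, src := range []string{"!= b", "~= b", "== b"} {
		kind, n := scanOp(src)
		fmt.Printf("%q -> kind=%s, consumed=%d\n", src, kind, n)
	}
}
```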
Example: Emojis for identifiers
• http://moriyoshi.hatenablog.com/entry/2014/06/03/121728
• Go doesn’t treat emojis as part of identifiers.
• But I wanted to have 🍣 (sushi, U+1F363) in the source:
./sushi.go:8: invalid identifier character U+1f363
Example: Emojis for identifiers
• Patched the following place to let it accept emojis:
    if(c >= Runeself) {
        ungetc(c);
        rune = getr();
        // 0xb7 · is used for internal names
        if(!isalpharune(rune) && !isdigitrune(rune) &&
           (importpkg == nil || rune != 0xb7))
            yyerror("invalid identifier character U+%04x",
                rune);
        cp += runetochar(cp, &rune);
    } else if(!yy_isalnum(c) && c != '_')
        break;
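The check above rejects any non-ASCII rune that is neither a letter nor a digit. The following sketch restates that rule in Go and shows one possible relaxation. It is not the actual patch: the function names are hypothetical, and accepting the Unicode category So (Symbol, other) is only an assumption about how emojis might be let through.

```go
package main

import (
	"fmt"
	"unicode"
)

// isIdentRune approximates the stock rule: letters, digits and '_' only.
func isIdentRune(r rune) bool {
	return r == '_' || unicode.IsLetter(r) || unicode.IsDigit(r)
}

// isIdentRuneRelaxed additionally accepts runes in category So,
// which is where U+1F363 (sushi) lives. This is an assumed relaxation.
func isIdentRuneRelaxed(r rune) bool {
	return isIdentRune(r) || unicode.Is(unicode.So, r)
}

func main() {
	const sushi = '\U0001F363'
	fmt.Printf("%U stock=%v relaxed=%v\n", sushi, isIdentRune(sushi), isIdentRuneRelaxed(sushi))
}
```

Kanji such as 寿 already pass the stock rule because they are letters; only symbol-class runes like the emoji need the relaxed check.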
Phase 2. Parser
Parser
• The parser repeatedly calls the lexer to fetch tokens
and builds an abstract syntax tree (AST) that
represents the source code.
• The AST is then retouched (in the “typecheck” and “walk”
sub-phases) during the type inference and assertion phase
so that it becomes less verbose and carries information
helpful for the later stages.
• src/cmd/gc/go.y, src/cmd/gc/dcl.c,
src/cmd/gc/typecheck.c,
src/cmd/gc/walk.c,
src/cmd/gc/reflect.c
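To get a feel for what an AST looks like, here is a minimal sketch using go/parser and go/ast from the standard library. These packages are an assumption made for illustration: they are a separate Go implementation, and their node types (*ast.BinaryExpr and so on) are not the Node structures that gc builds from go.y. The helper nodeKinds is hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"strings"
)

// nodeKinds parses a single Go expression and lists the dynamic type of
// every AST node in depth-first order.
func nodeKinds(expr string) (string, error) {
	root, err := parser.ParseExpr(expr)
	if err != nil {
		return "", err
	}
	var kinds []string
	ast.Inspect(root, func(n ast.Node) bool {
		if n != nil { // Inspect also calls back with nil after each subtree
			kinds = append(kinds, fmt.Sprintf("%T", n))
		}
		return true
	})
	return strings.Join(kinds, " "), nil
}

func main() {
	kinds, err := nodeKinds("b + c()")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(kinds) // prints: *ast.BinaryExpr *ast.Ident *ast.CallExpr *ast.Ident
}
```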
Example: Bracket operator overload!
• Let the following code (A) expand to (B)
• https://gist.github.com/moriyoshi/c0e2b2f9be6883e33251
(A)
a := &struct{}{}
fmt.Println(a[1])
a[1] = "test2"
(B)
fmt.Println(a.__getindex(1))
a.__setindex(1, "test2")
Example: Bracket operator overload!
• Things to do:
• Introduce a new AST node type (e.g. OINDEXINTER).
• Add a branch in “typecheck” to handle the case
where the indexed target is neither a string, array, slice,
nor map type.
• Add code in “walk” that specially treats assignments
and dereferences involving that kind of node. The
code synthesizes nodes that invoke the special
functions, then runs typecheck and walk over them
recursively.
• Don’t forget to take care of the evaluation order corrector.
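The (A) to (B) expansion can be imitated at the source level, outside the compiler. The sketch below is not the gist's implementation and does not touch gc's typecheck or walk: it uses go/parser, go/ast and go/printer to rewrite index expressions syntactically, with no type information, so it would also rewrite ordinary slices and maps. All helper names are hypothetical, and only a few node kinds are handled.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
	"strings"
)

// methodCall builds the call expression x.method(args...).
func methodCall(x ast.Expr, method string, args ...ast.Expr) *ast.CallExpr {
	return &ast.CallExpr{
		Fun:  &ast.SelectorExpr{X: x, Sel: ast.NewIdent(method)},
		Args: args,
	}
}

// rewriteExpr turns x[i] in read position into x.__getindex(i).
// Only index expressions and call arguments are visited in this sketch.
func rewriteExpr(e ast.Expr) ast.Expr {
	switch n := e.(type) {
	case *ast.IndexExpr:
		return methodCall(rewriteExpr(n.X), "__getindex", rewriteExpr(n.Index))
	case *ast.CallExpr:
		for i, arg := range n.Args {
			n.Args[i] = rewriteExpr(arg)
		}
	}
	return e
}

// rewriteStmt turns x[i] = v into x.__setindex(i, v) and rewrites reads
// inside expression statements.
func rewriteStmt(s ast.Stmt) ast.Stmt {
	switch n := s.(type) {
	case *ast.AssignStmt:
		if n.Tok == token.ASSIGN && len(n.Lhs) == 1 && len(n.Rhs) == 1 {
			if idx, ok := n.Lhs[0].(*ast.IndexExpr); ok {
				call := methodCall(rewriteExpr(idx.X), "__setindex",
					rewriteExpr(idx.Index), rewriteExpr(n.Rhs[0]))
				return &ast.ExprStmt{X: call}
			}
		}
	case *ast.ExprStmt:
		n.X = rewriteExpr(n.X)
	}
	return s
}

// rewrite wraps body in a dummy function, rewrites each top-level
// statement and returns the results, one statement per line.
func rewrite(body string) (string, error) {
	src := "package p\nfunc f() {\n" + body + "\n}\n"
	file, err := parser.ParseFile(token.NewFileSet(), "a.go", src, 0)
	if err != nil {
		return "", err
	}
	fn := file.Decls[0].(*ast.FuncDecl)
	var out []string
	for _, stmt := range fn.Body.List {
		var buf bytes.Buffer
		// An empty FileSet makes the printer ignore source positions.
		if err := printer.Fprint(&buf, token.NewFileSet(), rewriteStmt(stmt)); err != nil {
			return "", err
		}
		out = append(out, strings.TrimSpace(buf.String()))
	}
	return strings.Join(out, "\n"), nil
}

func main() {
	got, err := rewrite("fmt.Println(a[1])\na[1] = \"test2\"")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

The compiler-level version differs in one important way: typecheck knows the type of the indexed operand, so it can take the new branch only when the target is not a string, array, slice, or map.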
Helpful functions to debug your hack
• print(const char *, …)
• This works like printf() from the standard libc.
• Accepts the following extra format specifiers:
• %N (node)
• %T (type)
• %E, %J, %H, %L, %O, %S, %V, %Z, %B, %F
Roll-up
• Go’s compiler internals may look complex at first
glance, but they turn out to be pretty straightforward
and hacker-friendly ;)