Slide 1

Slide 1 text

GopherCon AU Is Go A Good Language to Build Compilers? Sungmin Han Golang Korea | Go GDE

Slide 2

Slide 2 text

Speaker: Sungmin Han
• Google Developer Expert (GDE) for AI/ML and Cloud
• Google Developer Groups (GDG) for Go
• Google Cloud Champion Innovator for Modern Architecture
• F-Lab Python Mentor
• Former Head of Tech at Riiid
• Former Research Engineer at Naver Clova
• Former Software Engineer at IGAWorks
• Former Software Engineer at Simsimi

Slide 3

Slide 3 text

Index
• Project Overview
• Project Structure
• Demo with Code
• LLVM & Optimization
• Conclusion

Slide 4

Slide 4 text

Project Overview

Slide 5

Slide 5 text

This session will cover: the experience of building compilers and interpreters in Go.

Slide 6

Slide 6 text

Why Go?
1. Cross-compile friendly: minimizes build issues and per-environment documentation
2. Easy to write: simpler than Java, more obvious than Kotlin, and easier than Rust
3. Fewer package-resolution errors: few dependency errors or versioning issues
4. Speed: a lightweight runtime with a concurrency scheduler
5. Ecosystem: a tendency to focus on core backend development

Slide 7

Slide 7 text

Why Go?
Balanced across CPU usage, memory usage, and productivity.
Source: https://github.com/kostya/benchmarks

Slide 8

Slide 8 text

Up Language
Up combines Python-like ease of use with Go’s concurrency strengths, allowing developers to leverage threads without the typical complexity. It’s a unique language that invites exploration for those pushing modern programming paradigms.

Slide 9

Slide 9 text

Project Structure

Slide 10

Slide 10 text

Which components does a compiler need?
• Parser / Lexers
• Runtime
• Built-ins
• Grammar
• Optimizer
• Virtual Machine

Slide 11

Slide 11 text

From the perspective of: Frontend and Backend

Slide 12

Slide 12 text

From the perspective of compilation stages, Frontend to Backend (diagram; each stage with its output):
1. Lexical Analysis → Tokens
2. Syntax Analysis → Parse Tree
3. Semantic Analysis → Abstract Syntax Tree (AST)
4. Intermediate Code Gen (IR) → Intermediate Representation
5. Machine-level Code → Assembly or object code
6. Post-Optimizer → Final assembly or object code

Slide 13

Slide 13 text

Demo with Code

Slide 14

Slide 14 text

Repository: https://github.com/KennethanCeyer/up

Slide 15

Slide 15 text

Basic Structure of Compiler (diagram)
Frontend: Lexer → Tokens → Parser → AST
Backend: AST → Runtime Environment → Go Execution

Slide 16

Slide 16 text

Demo

Slide 17

Slide 17 text

func main() {
	// Expect exactly one argument: the source file to run.
	if len(os.Args) != 2 {
		fmt.Println("Usage: go run . ")
		for _, arg := range os.Args[1:] {
			fmt.Println(getAttr(options, arg))
		}
		return
	}

	cwd, err := os.Getwd()
	if err != nil {
		fmt.Println("Error getting current directory:", err)
		return
	}

	// Resolve the source file relative to the current directory and run it.
	filename := os.Args[1]
	absolutePath := filepath.Join(cwd, filename)
	interpreter.Execute(absolutePath, &options)
}

Slide 18

Slide 18 text

func Execute(filepath string, options *Options) {
	data, err := os.ReadFile(filepath)
	if err != nil {
		fmt.Println("Error reading file:", err)
		return
	}

	// 1. Lexer
	tokens, err := interpreter.Lexer(string(data))
	if err != nil {
		fmt.Println("Error in lexical analysis:", err)
		return
	}

	// 2. Parser
	ast, err := interpreter.Parse(tokens)
	if err != nil {
		fmt.Println("Error in parsing:", err)
		return
	}

	// 3. Environment / Execution
	env := interpreter.NewEnvironment()
	interpreter.ExecuteNode(ast, env)
}

Slide 19

Slide 19 text

Basic Structure of Compiler (diagram)
Frontend: Lexer → Tokens → Parser → AST
Backend: AST → Runtime Environment → Go Execution

Slide 20

Slide 20 text

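type TokenType string

// Each constant carries the TokenType type explicitly; an entry written
// as NAME = "..." without a type would be an untyped string constant.
const (
	FUNC       TokenType = "FUNC"
	LBRACE     TokenType = "LBRACE"
	RBRACE     TokenType = "RBRACE"
	LPAREN     TokenType = "LPAREN"
	RPAREN     TokenType = "RPAREN"
	COLON      TokenType = "COLON"
	COMMA      TokenType = "COMMA"
	ARROW      TokenType = "ARROW"
	IDENTIFIER TokenType = "IDENTIFIER"
	INT        TokenType = "INT"
	ADD        TokenType = "ADD"
	SUB        TokenType = "SUB"
	MUL        TokenType = "MUL"
	...
)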

Slide 21

Slide 21 text

func Lexer(input string) ([]Token, error) {
	var tokens []Token
	i := 0
	row, col := 1, 1
	for i < len(input) {
		switch {
		case isAlpha(input[i]) || input[i] == '_':
			start := i
			// The bounds check keeps an identifier at the very end of the
			// input from indexing past the string.
			for i < len(input) && (isAlpha(input[i]) || isDigit(input[i]) || input[i] == '_') {
				i++
			}
			identifier := input[start:i]
			// Keywords get their own token types; anything else is an identifier.
			switch identifier {
			case "func":
				tokens = append(tokens, Token{Type: FUNC, Value: "func", Row: row, Col: col})
			case "return":
				tokens = append(tokens, Token{Type: RETURN, Value: "return", Row: row, Col: col})
			...
			default:
				tokens = append(tokens, Token{Type: IDENTIFIER, Value: identifier, Row: row, Col: col})
			}
		case input[i] == '/':
			if i+1 < len(input) && input[i+1] == '/' {
				i += 2
				...
			} else {

Slide 22

Slide 22 text

		case input[i] == ',':
			tokens = append(tokens, Token{Type: COMMA, Value: ",", Row: row, Col: col})
			i++
			col++
		case input[i] == '{':
			tokens = append(tokens, Token{Type: LBRACE, Value: "{", Row: row, Col: col})
			i++
			col++
		case input[i] == '}':
			tokens = append(tokens, Token{Type: RBRACE, Value: "}", Row: row, Col: col})
			i++
			col++
		case input[i] == '(':
			tokens = append(tokens, Token{Type: LPAREN, Value: "(", Row: row, Col: col})
			i++
			col++
		case input[i] == ')':
			tokens = append(tokens, Token{Type: RPAREN, Value: ")", Row: row, Col: col})
			i++
			col++
		case strings.HasPrefix(input[i:], "+="):
			tokens = append(tokens, Token{Type: ADD_ASSIGN, Value: "+=", Row: row, Col: col})
			i += 2
			col += 2
		case strings.HasPrefix(input[i:], "-="):
			tokens = append(tokens, Token{Type: SUB_ASSIGN, Value: "-=", Row: row, Col: col})
			i += 2
			col += 2
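
The lexer slides show only fragments, so here is a minimal, self-contained sketch of the same hand-written scanning style. It is not the Up lexer: the token set is cut down to a few types, and the helpers `scanWhile` and `emit` and the function name `Lex` are assumptions introduced for illustration. It shows one way to keep `row` and `col` in sync with the input index.

```go
package main

import (
	"fmt"
	"strings"
)

type TokenType string

// A reduced token set for illustration only.
const (
	IDENTIFIER TokenType = "IDENTIFIER"
	INT        TokenType = "INT"
	ADD_ASSIGN TokenType = "ADD_ASSIGN"
	LPAREN     TokenType = "LPAREN"
	RPAREN     TokenType = "RPAREN"
)

type Token struct {
	Type     TokenType
	Value    string
	Row, Col int
}

func isAlpha(c byte) bool { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') }
func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// scanWhile (hypothetical helper) returns the end index of the run
// starting at i for which pred holds, never reading past the input.
func scanWhile(input string, i int, pred func(byte) bool) int {
	for i < len(input) && pred(input[i]) {
		i++
	}
	return i
}

// Lex turns the input into tokens, tracking 1-based row and column.
func Lex(input string) ([]Token, error) {
	var tokens []Token
	i, row, col := 0, 1, 1
	// emit (hypothetical helper) appends input[i:end] as a token and
	// advances both the index and the column by the token length.
	emit := func(t TokenType, end int) {
		tokens = append(tokens, Token{Type: t, Value: input[i:end], Row: row, Col: col})
		col += end - i
		i = end
	}
	for i < len(input) {
		c := input[i]
		switch {
		case c == '\n':
			i++
			row++
			col = 1
		case c == ' ' || c == '\t':
			i++
			col++
		case isAlpha(c) || c == '_':
			emit(IDENTIFIER, scanWhile(input, i, func(b byte) bool {
				return isAlpha(b) || isDigit(b) || b == '_'
			}))
		case isDigit(c):
			emit(INT, scanWhile(input, i, isDigit))
		case strings.HasPrefix(input[i:], "+="):
			emit(ADD_ASSIGN, i+2)
		case c == '(':
			emit(LPAREN, i+1)
		case c == ')':
			emit(RPAREN, i+1)
		default:
			return nil, fmt.Errorf("unexpected character %q at [%d:%d]", c, row, col)
		}
	}
	return tokens, nil
}

func main() {
	tokens, err := Lex("total += 42\nprint(total)")
	if err != nil {
		fmt.Println("lex error:", err)
		return
	}
	for _, t := range tokens {
		fmt.Printf("%-10s %-8q [%d:%d]\n", t.Type, t.Value, t.Row, t.Col)
	}
}
```

Routing every token through one helper keeps the index and column updates in a single place, which avoids repeating `i++` and `col++` in each case.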

Slide 23

Slide 23 text

Why not use an existing parser, such as an LL(*) parser or ANTLR?
● To experiment with performance
● Receivers and type inference are difficult to implement
● To implement a front-end optimizer
● To remove dependencies

Slide 24

Slide 24 text

Basic Structure of Compiler (diagram)
Frontend: Lexer → Tokens → Parser → AST
Backend: AST → Runtime Environment → Go Execution

Slide 25

Slide 25 text

func (p *Parser) parseProgram() *ProgramNode {
	var functions []*FuncDeclarationNode
	for p.current().Type != EOF {
		function := p.parseFunction()
		functions = append(functions, function)
	}
	return &ProgramNode{Functions: functions}
}

func NewParser(tokens []Token) *Parser {
	return &Parser{tokens: tokens, pos: 0}
}

func Parse(tokens []Token) (*ProgramNode, error) {
	parser := NewParser(tokens)
	return parser.parseProgram(), nil
}

Slide 26

Slide 26 text

func (p *Parser) parseExpression() Node {
	switch p.current().Type {
	case IDENTIFIER:
		if p.lookahead(1).Type == LPAREN {
			return p.parseFunctionCall()
		} else if isAssignmentOperator(p.lookahead(1).Type) || p.isTypeAssignment(1) {
			return p.parseAssignment()
		}
		return p.parseIdentifier()
	case INT:
		return p.parseInt()
	case STRING:
		return p.parseString()
	case FOR:
		return p.parseForLoop()
	case ADD, SUB, MUL, DIV:
		return p.parseBinOp()
	case RETURN:
		return p.parseReturn()
	default:
		panic(fmt.Sprintf("Unexpected token %s at [%d:%d]", p.current().Type, p.current().Row, p.current().Col))
	}
}

Slide 27

Slide 27 text

func (p *Parser) parseForLoop() *ForLoopNode {
	p.consume(FOR)
	variable := p.parseIdentifier().Name
	p.consume(IN)
	p.consume(RANGE)
	p.consume(LPAREN)
	rng := p.parseExpression()
	p.consume(RPAREN)
	p.consume(LBRACE)
	var body []Node
	for p.current().Type != RBRACE && p.current().Type != EOF {
		body = append(body, p.parseExpression())
	}
	p.consume(RBRACE)
	return &ForLoopNode{Variable: variable, Range: rng, Body: body}
}
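
The parser slides call `current`, `lookahead`, and `consume` without showing them. The sketch below is one plausible shape for those helpers, written under assumptions: the bodies, the synthetic EOF token, and the `parseSafely` wrapper are not taken from the Up repository. The wrapper shows how a panic-based parser could still report failures through an `error` return, as the `Parse` signature on the slides suggests.

```go
package main

import "fmt"

type TokenType string

const (
	FOR        TokenType = "FOR"
	IN         TokenType = "IN"
	IDENTIFIER TokenType = "IDENTIFIER"
	EOF        TokenType = "EOF"
)

type Token struct {
	Type     TokenType
	Value    string
	Row, Col int
}

type Parser struct {
	tokens []Token
	pos    int
}

// lookahead returns the token n positions ahead (n >= 0), or a
// synthetic EOF token once the position runs past the input.
func (p *Parser) lookahead(n int) Token {
	if p.pos+n >= len(p.tokens) {
		return Token{Type: EOF}
	}
	return p.tokens[p.pos+n]
}

// current returns the token at the parser's position.
func (p *Parser) current() Token { return p.lookahead(0) }

// consume verifies the current token type and advances past it.
// It panics on a mismatch, in the style of parseExpression.
func (p *Parser) consume(expected TokenType) Token {
	tok := p.current()
	if tok.Type != expected {
		panic(fmt.Sprintf("Expected %s but got %s at [%d:%d]", expected, tok.Type, tok.Row, tok.Col))
	}
	p.pos++
	return tok
}

// parseSafely (hypothetical helper) runs parse and converts a parser
// panic into an ordinary error value.
func parseSafely(parse func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("parse error: %v", r)
		}
	}()
	parse()
	return nil
}

func main() {
	p := &Parser{tokens: []Token{
		{Type: FOR, Value: "for", Row: 1, Col: 1},
		{Type: IDENTIFIER, Value: "i", Row: 1, Col: 5},
	}}
	err := parseSafely(func() {
		p.consume(FOR)
		fmt.Println("loop variable:", p.consume(IDENTIFIER).Value)
		p.consume(IN) // the input ends here, so this call panics
	})
	fmt.Println(err)
}
```

Returning a synthetic EOF from `lookahead` means callers such as `parseForLoop` can test `p.current().Type != EOF` without a separate bounds check.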

Slide 28

Slide 28 text

Basic Structure of Compiler (diagram)
Frontend: Lexer → Tokens → Parser → AST
Backend: AST → Runtime Environment → Go Execution

Slide 29

Slide 29 text

type Environment struct {
	store map[string]interface{}
	outer *Environment
}

func NewEnvironment() *Environment {
	s := make(map[string]interface{})
	env := &Environment{store: s, outer: nil}

	// add built-in functions
	env.store["print"] = BuiltinFunction(func(args []interface{}) interface{} {
		for _, arg := range args {
			fmt.Print(arg)
		}
		fmt.Println() // newline after print
		return nil
	})
	return env
}

Slide 30

Slide 30 text

func (e *Environment) Get(name string) (interface{}, bool) {
	obj, ok := e.store[name]
	if !ok && e.outer != nil {
		obj, ok = e.outer.Get(name)
	}
	return obj, ok
}

func (e *Environment) Set(name string, val interface{}) {
	e.store[name] = val
}
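
To make the scope chain concrete, the sketch below reuses the `Get` and `Set` methods shown on the slide and exercises them with a nested scope. The built-in `print` registration is left out for brevity, and `NewEnclosedEnvironment` is a hypothetical helper that wraps the `newEnv.outer = env` pattern used elsewhere in the deck. It illustrates that `Get` walks outward while `Set` always writes to the innermost scope, so an inner assignment shadows the outer variable instead of updating it.

```go
package main

import "fmt"

type Environment struct {
	store map[string]interface{}
	outer *Environment
}

// NewEnvironment creates an empty top-level scope (built-ins omitted).
func NewEnvironment() *Environment {
	return &Environment{store: make(map[string]interface{})}
}

// NewEnclosedEnvironment (hypothetical helper) creates a scope whose
// lookups fall back to outer.
func NewEnclosedEnvironment(outer *Environment) *Environment {
	env := NewEnvironment()
	env.outer = outer
	return env
}

// Get looks in the current scope first, then in enclosing scopes.
func (e *Environment) Get(name string) (interface{}, bool) {
	obj, ok := e.store[name]
	if !ok && e.outer != nil {
		obj, ok = e.outer.Get(name)
	}
	return obj, ok
}

// Set always writes to the current scope.
func (e *Environment) Set(name string, val interface{}) {
	e.store[name] = val
}

func main() {
	global := NewEnvironment()
	global.Set("x", 1)

	local := NewEnclosedEnvironment(global)
	v, _ := local.Get("x")
	fmt.Println("local reads outer x:", v)

	local.Set("x", 2) // shadows x in the inner scope only
	inner, _ := local.Get("x")
	outer, _ := global.Get("x")
	fmt.Println("inner x:", inner, "outer x:", outer)
}
```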

Slide 31

Slide 31 text

Basic Structure of Compiler (diagram)
Frontend: Lexer → Tokens → Parser → AST
Backend: AST → Runtime Environment → Go Execution

Slide 32

Slide 32 text

func ExecuteNode(node Node, env *Environment) interface{} {
	switch n := node.(type) {
	case *ProgramNode:
		var result interface{}
		for _, function := range n.Functions {
			env.Set(function.Name, function)
		}
		if mainFunc, ok := env.Get("main"); ok {
			if mainFuncObj, isFunc := mainFunc.(*FuncDeclarationNode); isFunc {
				newEnv := NewEnvironment()
				newEnv.outer = env
				for _, stmt := range mainFuncObj.Body {
					result = ExecuteNode(stmt, newEnv)
				}
			}
		}
		return result

Slide 33

Slide 33 text

	case *FuncDeclarationNode:
		return n
	case *FunctionCallNode:
		if function, ok := env.Get(n.FunctionName); ok {
			if funcObj, isUserDefined := function.(*FuncDeclarationNode); isUserDefined {
				newEnv := NewEnvironment()
				newEnv.outer = env
				if len(n.Arguments) != len(funcObj.Parameters) {
					panic(fmt.Sprintf("Expected %d arguments but got %d", len(funcObj.Parameters), len(n.Arguments)))
				}
				for i, param := range funcObj.Parameters {
					newEnv.Set(param.Name, ExecuteNode(n.Arguments[i], env))
				}
				var result interface{}
				for _, stmt := range funcObj.Body {
					result = ExecuteNode(stmt, newEnv)
				}
				return result
			...

Slide 34

Slide 34 text

	case *AssignmentNode:
		val := ExecuteNode(n.Value, env)
		env.Set(n.VarName, val)
		return val
	case *BinOpNode:
		left := ExecuteNode(n.Left, env)
		right := ExecuteNode(n.Right, env)
		if lInt, lOk := left.(int); lOk {
			if rInt, rOk := right.(int); rOk {
				switch n.Op {
				case "+":
					return lInt + rInt
				...
				case "/":
					if rInt == 0 {
						panic("Division by zero.")
					}
					return lInt / rInt
				case "%":
					return lInt % rInt
				default:
					panic("Unknown operator: " + n.Op)
				}
			}
		}
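
The tree-walking evaluation above can be tried in isolation with the reduced sketch below. It is an illustration under assumptions, not the Up runtime: the `IntNode` type and the `Eval` function are hypothetical, only integers are supported, and errors are returned instead of raised with `panic`. It also guards `%` against a zero divisor, which the slide code does not show.

```go
package main

import "fmt"

// Node is any AST node; the evaluator dispatches on the concrete type.
type Node interface{}

// IntNode (hypothetical) is an integer literal.
type IntNode struct{ Value int }

// BinOpNode is a binary operation over two sub-expressions.
type BinOpNode struct {
	Op          string
	Left, Right Node
}

// Eval walks the tree bottom-up and returns the integer result.
func Eval(node Node) (int, error) {
	switch n := node.(type) {
	case *IntNode:
		return n.Value, nil
	case *BinOpNode:
		left, err := Eval(n.Left)
		if err != nil {
			return 0, err
		}
		right, err := Eval(n.Right)
		if err != nil {
			return 0, err
		}
		switch n.Op {
		case "+":
			return left + right, nil
		case "-":
			return left - right, nil
		case "*":
			return left * right, nil
		case "/", "%":
			// Both operators fail on a zero divisor in Go.
			if right == 0 {
				return 0, fmt.Errorf("division by zero")
			}
			if n.Op == "/" {
				return left / right, nil
			}
			return left % right, nil
		default:
			return 0, fmt.Errorf("unknown operator: %s", n.Op)
		}
	default:
		return 0, fmt.Errorf("unsupported node type %T", node)
	}
}

func main() {
	// (1 + 2) * 3
	expr := &BinOpNode{
		Op:    "*",
		Left:  &BinOpNode{Op: "+", Left: &IntNode{Value: 1}, Right: &IntNode{Value: 2}},
		Right: &IntNode{Value: 3},
	}
	result, err := Eval(expr)
	if err != nil {
		fmt.Println("eval error:", err)
		return
	}
	fmt.Println(result)
}
```

Returning errors keeps a bad program from crashing the host process; the panic-based style on the slides is shorter but needs a `recover` somewhere above it to achieve the same.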

Slide 35

Slide 35 text

LLVM

Slide 36

Slide 36 text

The previous compiler looked like this (diagram)
Frontend: Lexer → Tokens → Parser → AST
Backend: AST → Runtime Environment → Go Execution

Slide 37

Slide 37 text

The problems of the previous design
1. Runtimes are difficult to optimize per machine (e.g., kernel APIs, custom runtimes).
2. Cross-compilation takes extensive time to adapt to each machine's instruction set.
3. Non-standard configurations demand frequent updates as new instruction sets appear.
4. The extra compiler knowledge required may deter contributors.

Slide 38

Slide 38 text

Change to this (diagram)
Frontend: Lexer → Tokens → Parser → AST
Backend: AST → Compiler → LLVM IR → Virtual Machine

Slide 39

Slide 39 text

Introduction to LLVM
LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes.

Slide 40

Slide 40 text

How does LLVM work? (diagram)
LLVM IR → x86, ARM

Slide 41

Slide 41 text

brew install llvm@17

# ~/.zshrc or ~/.bashrc
export LDFLAGS="-L/opt/homebrew/opt/llvm@17/lib"
export CPPFLAGS="-I/opt/homebrew/opt/llvm@17/include"

Slide 42

Slide 42 text

package main

// Assumption: the slide does not name the binding package. The llvm17
// build tag used to run this demo matches the TinyGo-maintained bindings.
import (
	"fmt"

	"tinygo.org/x/go-llvm"
)

func main() {
	ctx := llvm.NewContext()
	defer ctx.Dispose()

	module := ctx.NewModule("example")
	defer module.Dispose()

	// Declare: i32 add(i32, i32)
	intType := ctx.Int32Type()
	funcType := llvm.FunctionType(intType, []llvm.Type{intType, intType}, false)
	function := llvm.AddFunction(module, "add", funcType)

	// Emit the function body into a single "entry" block.
	block := ctx.AddBasicBlock(function, "entry")
	builder := ctx.NewBuilder()
	defer builder.Dispose()
	builder.SetInsertPointAtEnd(block)

	arg1 := function.Param(0)
	arg2 := function.Param(1)
	result := builder.CreateAdd(arg1, arg2, "result")
	builder.CreateRet(result)

	module.Dump()
	fmt.Println("LLVM IR generated successfully.")
}

Slide 43

Slide 43 text

go run -tags=llvm17 compiler_demo.go

; ModuleID = 'example'
source_filename = "example"

define i32 @add(i32 %0, i32 %1) {
entry:
  %result = add i32 %0, %1
  ret i32 %result
}
LLVM IR generated successfully.

Slide 44

Slide 44 text

Conclusion

Slide 45

Slide 45 text

In conclusion…
1. Go is a suitable language for building programs such as compilers.
2. Using LLVM bindings with Go is a clear challenge because official documentation on the subject is lacking.
3. Developing a programming language broadens your perspective on other languages, so it’s worth a try.

Slide 46

Slide 46 text

Future work
• As an experimental language
• Lightweight threading
• Native AsyncIO support (backend: io_uring)
• Gradually typed language
• Using a GC (garbage collector)

Slide 47

Slide 47 text

Q&A

Slide 48

Slide 48 text

Thank you!