Is Go A Good Language to Build Compilers?

Sungmin Han
November 19, 2024

This presentation explores the journey of creating an open-source compiler using Go, delving into topics such as syntax analysis, internal optimizations, and integration with the LLVM backend. It highlights Go's unique strengths for compiler development, including its simplicity and efficiency. The session also outlines the process of compiler bootstrapping and emphasizes making compiler development approachable for newcomers to the field.

Transcript

  1. GopherCon AU: Is Go A Good Language to Build Compilers?

    Sungmin Han, Golang Korea | Go GDE
  2. Sungmin Han (Speaker)

    • Google Developer Experts (GDE) for AI/ML and Cloud
    • Google Developer Groups (GDG) for Go
    • Google Cloud Champion Innovator for Modern Architecture
    • F-Lab Python Mentor
    • Former Head of Tech at Riiid
    • Former Research Engineer at Naver Clova
    • Former Software Engineer at IGAWorks
    • Former Software Engineer at Simsimi
  3. Index

    • Project Overview
    • Project Structure
    • Demo with Code
    • LLVM & Optimization
    • Conclusion
  4. Why Go?

    1 Cross-compile friendly: Minimizes compile issues and documentation differences across environments
    2 Easy to write: Simpler than Java, more obvious than Kotlin, and easier than Rust
    3 Fewer package-resolution errors: Few dependency errors or versioning issues
    4 Speed: A lightweight runtime with a concurrency scheduler
    5 Ecosystem: A tendency to focus on backend core development
  5. Up Language

    Up combines Python-like ease of use with Go’s concurrency strengths, allowing developers to leverage threads without the typical complexity. It’s a unique language that invites exploration for those pushing modern programming paradigms.
  6. Which components are needed for a compiler?

    • Parser / Lexers
    • Runtime
    • Built-ins
    • Grammar
    • Optimizer
    • Virtual Machine
  7. By the perspective of

    [Diagram: compiler pipeline running from Frontend to Backend, each stage shown with its output]
    Lexical Analysis → Tokens
    Syntax Analysis → Parse Tree
    Semantic Analysis → Abstract Syntax Tree (AST)
    Intermediate Code Gen (IR) → Intermediate Representation
    Machine-level Code → Assembly or Object code
    Post-Optimizer → Final Assembly or Object code
  8. Basic Structure of Compiler

    [Diagram] Frontend: Lexer, Parser. Backend: Runtime, Environment. Edge labels: Tokens, AST, AST, Go Execution.
  9.

    func main() {
        // Expect exactly one argument: the .up source file.
        if len(os.Args) != 2 {
            fmt.Println("Usage: go run . <filename.up>")
            for _, arg := range os.Args[1:] {
                fmt.Println(getAttr(options, arg))
            }
            return
        }

        cwd, err := os.Getwd()
        if err != nil {
            fmt.Println("Error getting current directory:", err)
            return
        }

        // Resolve the file relative to the working directory and run it.
        filename := os.Args[1]
        absolutePath := filepath.Join(cwd, filename)
        interpreter.Execute(absolutePath, &options)
    }
  10.

    func Execute(filepath string, options *Options) {
        data, err := os.ReadFile(filepath)
        if err != nil {
            fmt.Println("Error reading file:", err)
            return
        }

        // 1 Lexer
        tokens, err := interpreter.Lexer(string(data))
        if err != nil {
            fmt.Println("Error in lexical analysis:", err)
            return
        }

        // 2 Parser
        ast, err := interpreter.Parse(tokens)
        if err != nil {
            fmt.Println("Error in parsing:", err)
            return
        }

        // 3 Environment / Execution
        env := interpreter.NewEnvironment()
        interpreter.ExecuteNode(ast, env)
    }
  11. Basic Structure of Compiler

    [Diagram] Frontend: Lexer, Parser. Backend: Runtime, Environment. Edge labels: Tokens, AST, AST, Go Execution.
  12.

    type TokenType string

    const (
        FUNC TokenType = "FUNC"
        LBRACE         = "LBRACE"
        RBRACE         = "RBRACE"
        LPAREN         = "LPAREN"
        RPAREN         = "RPAREN"
        COLON          = "COLON"
        COMMA          = "COMMA"
        ARROW          = "ARROW"
        IDENTIFIER     = "IDENTIFIER"
        INT            = "INT"
        ADD            = "ADD"
        SUB            = "SUB"
        MUL            = "MUL"
        ...
    )
  13.

    func Lexer(input string) ([]Token, error) {
        var tokens []Token
        i := 0
        row, col := 1, 1
        for i < len(input) {
            switch {
            case isAlpha(input[i]) || input[i] == '_':
                start := i
                // Bounds check added: without it, an identifier at the very end
                // of the input would index past the end of the string.
                for i < len(input) && (isAlpha(input[i]) || isDigit(input[i]) || input[i] == '_') {
                    i++
                }
                identifier := input[start:i]
                // Keywords are recognized by comparing the scanned word.
                switch identifier {
                case "func":
                    tokens = append(tokens, Token{Type: FUNC, Value: "func", Row: row, Col: col})
                case "return":
                    tokens = append(tokens, Token{Type: RETURN, Value: "return", Row: row, Col: col})
                ...
                default:
                    tokens = append(tokens, Token{Type: IDENTIFIER, Value: identifier, Row: row, Col: col})
                }
            case input[i] == '/':
                if i+1 < len(input) && input[i+1] == '/' {
                    i += 2
                    ...
                } else {
  14.

            case input[i] == ',':
                tokens = append(tokens, Token{Type: COMMA, Value: ",", Row: row, Col: col})
                i++
                col++
            case input[i] == '{':
                tokens = append(tokens, Token{Type: LBRACE, Value: "{", Row: row, Col: col})
                i++
                col++
            case input[i] == '}':
                tokens = append(tokens, Token{Type: RBRACE, Value: "}", Row: row, Col: col})
                i++
                col++
            case input[i] == '(':
                tokens = append(tokens, Token{Type: LPAREN, Value: "(", Row: row, Col: col})
                i++
                col++
            case input[i] == ')':
                tokens = append(tokens, Token{Type: RPAREN, Value: ")", Row: row, Col: col})
                i++
                col++
            case strings.HasPrefix(input[i:], "+="):
                tokens = append(tokens, Token{Type: ADD_ASSIGN, Value: "+=", Row: row, Col: col})
                i += 2
                col += 2
            case strings.HasPrefix(input[i:], "-="):
                tokens = append(tokens, Token{Type: SUB_ASSIGN, Value: "-=", Row: row, Col: col})
                i += 2
                col += 2
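
The lexer slides above can be condensed into a small runnable sketch. This is not the talk's actual source: it assumes a `Token` struct with the `Type`, `Value`, `Row` and `Col` fields used on the slides, declares every constant with an explicit `TokenType`, and covers only identifiers, integers, `+` and `+=`. The name `Lex` and the whitespace and error handling are illustrative assumptions. It shows the one ordering rule the slides rely on: a two-character operator such as `+=` has to be tested before its one-character prefix.

```go
package main

import (
	"fmt"
	"strings"
)

type TokenType string

const (
	IDENTIFIER TokenType = "IDENTIFIER"
	INT        TokenType = "INT"
	ADD        TokenType = "ADD"
	ADD_ASSIGN TokenType = "ADD_ASSIGN"
)

// Token is an assumption reconstructed from the fields used on the slides.
type Token struct {
	Type     TokenType
	Value    string
	Row, Col int
}

func isAlpha(c byte) bool { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') }
func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// Lex is a reduced, hypothetical stand-in for the Lexer function on the slides.
func Lex(input string) ([]Token, error) {
	var tokens []Token
	i, row, col := 0, 1, 1
	for i < len(input) {
		c := input[i]
		switch {
		case c == '\n':
			i++
			row++
			col = 1
		case c == ' ' || c == '\t':
			i++
			col++
		case isAlpha(c) || c == '_':
			start := i
			for i < len(input) && (isAlpha(input[i]) || isDigit(input[i]) || input[i] == '_') {
				i++
			}
			tokens = append(tokens, Token{IDENTIFIER, input[start:i], row, col})
			col += i - start
		case isDigit(c):
			start := i
			for i < len(input) && isDigit(input[i]) {
				i++
			}
			tokens = append(tokens, Token{INT, input[start:i], row, col})
			col += i - start
		// The two-character operator is tested before the one-character one.
		case strings.HasPrefix(input[i:], "+="):
			tokens = append(tokens, Token{ADD_ASSIGN, "+=", row, col})
			i += 2
			col += 2
		case c == '+':
			tokens = append(tokens, Token{ADD, "+", row, col})
			i++
			col++
		default:
			return nil, fmt.Errorf("unexpected character %q at [%d:%d]", c, row, col)
		}
	}
	return tokens, nil
}

func main() {
	tokens, err := Lex("total += x1 + 2")
	if err != nil {
		fmt.Println("Error in lexical analysis:", err)
		return
	}
	for _, t := range tokens {
		fmt.Printf("%-10s %-8q [%d:%d]\n", t.Type, t.Value, t.Row, t.Col)
	}
}
```

If the `+` case were listed before the `+=` case, the input `+=` would be split into an `ADD` token followed by an unexpected `=`.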
  15. Why don’t you use an open parser like LL(*) or ANTLR?

    • To experiment with performance
    • Difficulty of implementing receivers and type inference
    • To implement a front-end optimizer
    • To remove dependencies
  16. Basic Structure of Compiler

    [Diagram] Frontend: Lexer, Parser. Backend: Runtime, Environment. Edge labels: Tokens, AST, AST, Go Execution.
  17.

    // parseProgram reads function declarations until EOF.
    func (p *Parser) parseProgram() *ProgramNode {
        var functions []*FuncDeclarationNode
        for p.current().Type != EOF {
            function := p.parseFunction()
            functions = append(functions, function)
        }
        return &ProgramNode{Functions: functions}
    }

    func NewParser(tokens []Token) *Parser {
        return &Parser{tokens: tokens, pos: 0}
    }

    func Parse(tokens []Token) (*ProgramNode, error) {
        parser := NewParser(tokens)
        return parser.parseProgram(), nil
    }
  18.

    func (p *Parser) parseExpression() Node {
        switch p.current().Type {
        case IDENTIFIER:
            // One token of lookahead separates calls, assignments and plain identifiers.
            if p.lookahead(1).Type == LPAREN {
                return p.parseFunctionCall()
            } else if isAssignmentOperator(p.lookahead(1).Type) || p.isTypeAssignment(1) {
                return p.parseAssignment()
            }
            return p.parseIdentifier()
        case INT:
            return p.parseInt()
        case STRING:
            return p.parseString()
        case FOR:
            return p.parseForLoop()
        case ADD, SUB, MUL, DIV:
            return p.parseBinOp()
        case RETURN:
            return p.parseReturn()
        default:
            panic(fmt.Sprintf("Unexpected token %s at [%d:%d]", p.current().Type, p.current().Row, p.current().Col))
        }
    }
  19.

    // parseForLoop parses: for <identifier> in range(<expression>) { <body> }
    func (p *Parser) parseForLoop() *ForLoopNode {
        p.consume(FOR)
        variable := p.parseIdentifier().Name
        p.consume(IN)
        p.consume(RANGE)
        p.consume(LPAREN)
        rng := p.parseExpression()
        p.consume(RPAREN)
        p.consume(LBRACE)
        var body []Node
        for p.current().Type != RBRACE && p.current().Type != EOF {
            body = append(body, p.parseExpression())
        }
        p.consume(RBRACE)
        return &ForLoopNode{Variable: variable, Range: rng, Body: body}
    }
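
The parser slides call `p.current()`, `p.lookahead(n)` and `p.consume(t)` without showing them. The following is a minimal sketch of how such helpers might look, not the talk's implementation: the method bodies, the synthetic EOF token, and the `parseCallName` and `safeParse` functions are assumptions made for illustration. `safeParse` also shows one possible way to turn the parser's panics into the `error` value that `Parse` on the slides always returns as `nil`.

```go
package main

import "fmt"

type TokenType string

const (
	IDENTIFIER TokenType = "IDENTIFIER"
	LPAREN     TokenType = "LPAREN"
	RPAREN     TokenType = "RPAREN"
	EOF        TokenType = "EOF"
)

type Token struct {
	Type     TokenType
	Value    string
	Row, Col int
}

type Parser struct {
	tokens []Token
	pos    int
}

// current returns the token at the cursor.
func (p *Parser) current() Token { return p.lookahead(0) }

// lookahead peeks n tokens ahead without moving the cursor.
// Past the end of input it returns a synthetic EOF token (an assumption).
func (p *Parser) lookahead(n int) Token {
	if p.pos+n >= len(p.tokens) {
		return Token{Type: EOF}
	}
	return p.tokens[p.pos+n]
}

// consume checks the current token's type, advances the cursor and returns
// the token. It panics on a mismatch, in the style of the slides.
func (p *Parser) consume(expected TokenType) Token {
	tok := p.current()
	if tok.Type != expected {
		panic(fmt.Sprintf("Expected %s but got %s at [%d:%d]", expected, tok.Type, tok.Row, tok.Col))
	}
	p.pos++
	return tok
}

// parseCallName parses `name ( )` and returns the name (hypothetical rule).
func (p *Parser) parseCallName() string {
	name := p.consume(IDENTIFIER).Value
	p.consume(LPAREN)
	p.consume(RPAREN)
	return name
}

// safeParse recovers from a parser panic and reports it as an error.
func safeParse(tokens []Token) (name string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	p := &Parser{tokens: tokens}
	return p.parseCallName(), nil
}

func main() {
	call := []Token{{IDENTIFIER, "print", 1, 1}, {LPAREN, "(", 1, 6}, {RPAREN, ")", 1, 7}}
	fmt.Println(safeParse(call))     // complete call
	fmt.Println(safeParse(call[:2])) // closing parenthesis missing
}
```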
  20. Basic Structure of Compiler

    [Diagram] Frontend: Lexer, Parser. Backend: Runtime, Environment. Edge labels: Tokens, AST, AST, Go Execution.
  21.

    // Environment is one scope: a name-to-value map plus a link to the enclosing scope.
    type Environment struct {
        store map[string]interface{}
        outer *Environment
    }

    func NewEnvironment() *Environment {
        s := make(map[string]interface{})
        env := &Environment{store: s, outer: nil}

        // add built-in functions
        env.store["print"] = BuiltinFunction(func(args []interface{}) interface{} {
            for _, arg := range args {
                fmt.Print(arg)
            }
            fmt.Println() // newline after print
            return nil
        })

        return env
    }
  22.

    // Get looks in the current scope first, then walks outward.
    func (e *Environment) Get(name string) (interface{}, bool) {
        obj, ok := e.store[name]
        if !ok && e.outer != nil {
            obj, ok = e.outer.Get(name)
        }
        return obj, ok
    }

    // Set always writes to the current scope.
    func (e *Environment) Set(name string, val interface{}) {
        e.store[name] = val
    }
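
A short usage sketch may make the scope chain concrete. It reuses the `Environment`, `Get` and `Set` definitions from the slides, minus the built-in `print`; `NewEnclosedEnvironment` is a hypothetical helper that wraps the `newEnv.outer = env` step the execution slides perform by hand.

```go
package main

import "fmt"

type Environment struct {
	store map[string]interface{}
	outer *Environment
}

func NewEnvironment() *Environment {
	return &Environment{store: make(map[string]interface{})}
}

// NewEnclosedEnvironment creates a child scope (hypothetical helper).
func NewEnclosedEnvironment(outer *Environment) *Environment {
	env := NewEnvironment()
	env.outer = outer
	return env
}

// Get looks in the current scope first, then walks outward.
func (e *Environment) Get(name string) (interface{}, bool) {
	obj, ok := e.store[name]
	if !ok && e.outer != nil {
		obj, ok = e.outer.Get(name)
	}
	return obj, ok
}

// Set always writes to the current scope.
func (e *Environment) Set(name string, val interface{}) {
	e.store[name] = val
}

func main() {
	global := NewEnvironment()
	global.Set("x", 1)
	global.Set("y", 2)

	local := NewEnclosedEnvironment(global)
	local.Set("x", 10) // shadows the outer x; the outer binding is untouched

	fmt.Println(local.Get("x"))  // 10 true
	fmt.Println(local.Get("y"))  // 2 true (found in the outer scope)
	fmt.Println(global.Get("x")) // 1 true
	fmt.Println(local.Get("z"))  // <nil> false
}
```

Because `Set` only ever writes to the innermost scope, an assignment inside a function body creates or updates a local binding and never modifies a variable of the same name in an enclosing scope.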
  23. Basic Structure of Compiler

    [Diagram] Frontend: Lexer, Parser. Backend: Runtime, Environment. Edge labels: Tokens, AST, AST, Go Execution.
  24.

    func ExecuteNode(node Node, env *Environment) interface{} {
        switch n := node.(type) {
        case *ProgramNode:
            var result interface{}
            // Register every function, then run main in a fresh child scope.
            for _, function := range n.Functions {
                env.Set(function.Name, function)
            }
            if mainFunc, ok := env.Get("main"); ok {
                if mainFuncObj, isFunc := mainFunc.(*FuncDeclarationNode); isFunc {
                    newEnv := NewEnvironment()
                    newEnv.outer = env
                    for _, stmt := range mainFuncObj.Body {
                        result = ExecuteNode(stmt, newEnv)
                    }
                }
            }
            return result
  25.

        case *FuncDeclarationNode:
            return n
        case *FunctionCallNode:
            if function, ok := env.Get(n.FunctionName); ok {
                if funcObj, isUserDefined := function.(*FuncDeclarationNode); isUserDefined {
                    newEnv := NewEnvironment()
                    newEnv.outer = env
                    if len(n.Arguments) != len(funcObj.Parameters) {
                        panic(fmt.Sprintf("Expected %d arguments but got %d", len(funcObj.Parameters), len(n.Arguments)))
                    }
                    // Arguments are evaluated in the caller's scope and bound in the callee's.
                    for i, param := range funcObj.Parameters {
                        newEnv.Set(param.Name, ExecuteNode(n.Arguments[i], env))
                    }
                    var result interface{}
                    for _, stmt := range funcObj.Body {
                        result = ExecuteNode(stmt, newEnv)
                    }
                    return result
                    ...
  26.

        case *AssignmentNode:
            val := ExecuteNode(n.Value, env)
            env.Set(n.VarName, val)
            return val
        case *BinOpNode:
            left := ExecuteNode(n.Left, env)
            right := ExecuteNode(n.Right, env)
            // Arithmetic is applied only when both operands are ints.
            if lInt, lOk := left.(int); lOk {
                if rInt, rOk := right.(int); rOk {
                    switch n.Op {
                    case "+":
                        return lInt + rInt
                    ...
                    case "/":
                        if rInt == 0 {
                            panic("Division by zero.")
                        }
                        return lInt / rInt
                    case "%":
                        return lInt % rInt
                    default:
                        panic("Unknown operator: " + n.Op)
                    }
                }
            }
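
The tree-walking evaluation on the slides above can be tried in isolation with a reduced sketch. It is not the talk's code: `IntNode`, the `Eval` name, and returning an `error` instead of panicking are assumptions made for illustration, and only integer arithmetic is covered. One deliberate difference: the slide guards `/` against a zero divisor but shows no guard for `%`, which in Go also panics on a zero divisor, so this sketch checks both.

```go
package main

import "fmt"

// Node is any AST node.
type Node interface{}

// IntNode is a hypothetical integer literal node.
type IntNode struct{ Value int }

// BinOpNode mirrors the Left/Op/Right fields used on the slide.
type BinOpNode struct {
	Left  Node
	Op    string
	Right Node
}

// Eval walks the tree bottom-up: operands first, then the operator.
func Eval(node Node) (int, error) {
	switch n := node.(type) {
	case *IntNode:
		return n.Value, nil
	case *BinOpNode:
		left, err := Eval(n.Left)
		if err != nil {
			return 0, err
		}
		right, err := Eval(n.Right)
		if err != nil {
			return 0, err
		}
		switch n.Op {
		case "+":
			return left + right, nil
		case "-":
			return left - right, nil
		case "*":
			return left * right, nil
		case "/", "%":
			// Both operators fail on a zero divisor.
			if right == 0 {
				return 0, fmt.Errorf("division by zero")
			}
			if n.Op == "/" {
				return left / right, nil
			}
			return left % right, nil
		default:
			return 0, fmt.Errorf("unknown operator: %s", n.Op)
		}
	default:
		return 0, fmt.Errorf("unsupported node type %T", node)
	}
}

func main() {
	// (1 + 2) * 3
	expr := &BinOpNode{
		Left:  &BinOpNode{Left: &IntNode{1}, Op: "+", Right: &IntNode{2}},
		Op:    "*",
		Right: &IntNode{3},
	}
	fmt.Println(Eval(expr)) // 9 <nil>
	fmt.Println(Eval(&BinOpNode{Left: &IntNode{1}, Op: "%", Right: &IntNode{0}}))
}
```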
  27. What the previous compiler looks like

    [Diagram] Frontend: Lexer, Parser. Backend: Runtime, Environment. Edge labels: Tokens, AST, AST, Go Execution.
  28. The problems of the previous design

    1 Difficulty in optimizing runtimes per machine (e.g., kernel API, custom runtimes).
    2 Cross-compilation requires extensive time to adapt to machine instruction sets.
    3 Non-standard configurations demand frequent updates for new instruction sets.
    4 The extra compiler knowledge required may deter contributors.
  29. Introduction to LLVM

    LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes.
  30.

    brew install llvm@17

    # ~/.zshrc or ~/.bashrc
    export LDFLAGS="-L/opt/homebrew/opt/llvm@17/lib"
    export CPPFLAGS="-I/opt/homebrew/opt/llvm@17/include"
  31.

    func main() {
        ctx := llvm.NewContext()
        defer ctx.Dispose()

        module := ctx.NewModule("example")
        defer module.Dispose()

        // Declare: i32 add(i32, i32)
        intType := ctx.Int32Type()
        funcType := llvm.FunctionType(intType, []llvm.Type{intType, intType}, false)
        function := llvm.AddFunction(module, "add", funcType)

        // Emit the body into a single "entry" block.
        block := ctx.AddBasicBlock(function, "entry")
        builder := ctx.NewBuilder()
        defer builder.Dispose()
        builder.SetInsertPointAtEnd(block)

        arg1 := function.Param(0)
        arg2 := function.Param(1)
        result := builder.CreateAdd(arg1, arg2, "result")
        builder.CreateRet(result)

        module.Dump()
        fmt.Println("LLVM IR generated successfully.")
    }
  32.

    go run -tags=llvm17 compiler_demo.go

    ; ModuleID = 'example'
    source_filename = "example"

    define i32 @add(i32 %0, i32 %1) {
    entry:
      %result = add i32 %0, %1
      ret i32 %result
    }
    LLVM IR generated successfully.
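
For experimenting without the cgo-based bindings, the function definition in the output above can also be written out as plain text. The sketch below swaps the LLVM builder API for string formatting, so it is not the approach used in the talk and it performs no validation of the IR; `emitBinaryFunc` and its parameters are hypothetical names. The resulting text could then be handed to LLVM's command-line tools such as `llc`.

```go
package main

import (
	"fmt"
	"strings"
)

// emitBinaryFunc returns textual LLVM IR for a function `i32 name(i32, i32)`
// that applies one integer instruction (for example "add", "sub" or "mul")
// to its two parameters. Hypothetical helper; nothing here is checked by LLVM.
func emitBinaryFunc(name, op string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "define i32 @%s(i32 %%0, i32 %%1) {\n", name)
	b.WriteString("entry:\n")
	fmt.Fprintf(&b, "  %%result = %s i32 %%0, %%1\n", op)
	b.WriteString("  ret i32 %result\n")
	b.WriteString("}\n")
	return b.String()
}

func main() {
	fmt.Print(emitBinaryFunc("add", "add"))
}
```

Text emission avoids the native LLVM dependency but gives up what the bindings provide, such as type checking while building and direct access to optimization passes.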
  33. To the conclusion…

    1 Go is a suitable language for building programs such as compilers.
    2 Using LLVM bindings with Go is a clear challenge because official documentation on the subject is lacking.
    3 Developing a programming language broadens your perspective on various languages, so it’s recommended to give it a try.
  34. Future works

    • As an experimental language
    • Lightweight threading
    • Native AsyncIO (backend: io_uring) support
    • Gradually typed language
    • Using GC (Garbage Collector)