Slide 1

Slide 1 text

Writing a Tiny Compiler Samuel Giddins

Slide 2

Slide 2 text

Follow Along https://github.com/segiddins/Sipquick

Slide 3

Slide 3 text

Follow Along
$ cloc Sipquick
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Swift                            6             43              0            310
-------------------------------------------------------------------------------
SUM:                             6             43              0            310
-------------------------------------------------------------------------------

Slide 4

Slide 4 text

Why write a compiler?

Slide 5

Slide 5 text

Why write a compiler?
→ Better idea of how compilers work
→ It's a well-known problem domain
→ It's doable in any language
→ I've never written one before

Slide 6

Slide 6 text

What is a compiler? let compiler: (String) -> Data /* Executable */

Slide 7

Slide 7 text

What is a compiler? A compiler transforms source into an executable

Slide 8

Slide 8 text

What is a compiler? A compiler transforms a program written in one language into another language

Slide 9

Slide 9 text

How do I write a compiler?

Slide 10

Slide 10 text

How do I write a compiler?
→ Parse
→ Lex
→ Semantic Analysis
→ Optimize
→ Optimize
→ Generate Code

Slide 11

Slide 11 text

Compilers are highly functional
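One way to read "highly functional": each stage is a pure function, and the compiler is their composition. The sketch below is only an illustration under stated assumptions. The `Sexp` and `Expression` structs are hypothetical stand-ins, the stage bodies are placeholders, and `codeGen` returns a `String` here rather than the `Data` used on the slides.

```swift
// Hypothetical stand-ins for the real Sipquick types.
struct Sexp { let text: String }
struct Expression { let source: String }

// Each stage is a plain function value. The bodies are placeholders.
let parse: (String) -> [Sexp] = { source in
    source.split(separator: "\n").map { Sexp(text: String($0)) }
}
let sema: ([Sexp]) -> [Expression] = { sexps in
    sexps.map { Expression(source: $0.text) }
}
// An empty list of passes, matching a compiler that does no optimization.
let optimizations: [([Expression]) -> [Expression]] = []
let codeGen: ([Expression]) -> String = { expressions in
    expressions.map { $0.source }.joined(separator: "\n")
}

// The compiler is the composition of its stages.
func compile(_ source: String) -> String {
    let analyzed = sema(parse(source))
    let optimized = optimizations.reduce(analyzed) { exprs, pass in pass(exprs) }
    return codeGen(optimized)
}
```

Because every stage is a value of function type, swapping or adding a stage is a one-line change.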

Slide 12

Slide 12 text

Sipquick noun a small, energetic bird in The Wise Man's Fear

Slide 13

Slide 13 text

Sipquick
A small language invented for the purpose of this talk.
→ S-expressions
→ Dynamic
→ Functional-ish

Slide 14

Slide 14 text

Sipquick
(def hello name (+ "Hello, " name))
(print (hello "world!"))

Slide 15

Slide 15 text

Parser A parser-combinator, taking several ideas from Yasuhiro Inami's try! Swift Tokyo talk.

Slide 16

Slide 16 text

Parser
let schemeParser: Parser = {
    return ignoreSecond(
        sexp().
            then(unichar("\n").maybe()).
            maybeMany().
            fmap { $0.map { $0.0 } }
            .then(dropEndingMetadata().or(empty().andThatsAll())))
}
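For readers new to parser combinators, here is a minimal sketch of the idea. It is not Sipquick's parser: the `Parser<A>` struct, `char`, and the bodies of `fmap`, `then`, `or`, and `maybeMany` are assumptions written for illustration, reusing the combinator names from the slide.

```swift
// A parser is a function from input to (value, remaining input), or nil on failure.
struct Parser<A> {
    let run: (Substring) -> (A, Substring)?
}

// Matches exactly one given character.
func char(_ c: Character) -> Parser<Character> {
    return Parser<Character> { input in
        guard let first = input.first, first == c else { return nil }
        return (first, input.dropFirst())
    }
}

extension Parser {
    // Transforms a successful result without consuming more input.
    func fmap<B>(_ f: @escaping (A) -> B) -> Parser<B> {
        return Parser<B> { input in
            guard let (value, rest) = self.run(input) else { return nil }
            return (f(value), rest)
        }
    }

    // Runs self, then `next` on whatever input remains.
    func then<B>(_ next: Parser<B>) -> Parser<(A, B)> {
        return Parser<(A, B)> { input in
            guard let (a, rest1) = self.run(input),
                  let (b, rest2) = next.run(rest1) else { return nil }
            return ((a, b), rest2)
        }
    }

    // Tries self; on failure, tries `other` on the same input.
    func or(_ other: Parser<A>) -> Parser<A> {
        return Parser<A> { input in self.run(input) ?? other.run(input) }
    }

    // Applies self zero or more times, collecting the results.
    func maybeMany() -> Parser<[A]> {
        return Parser<[A]> { input in
            var results: [A] = []
            var rest = input
            while let (value, next) = self.run(rest) {
                results.append(value)
                // Stop if nothing was consumed, otherwise the loop would never end.
                if next.startIndex == rest.startIndex { break }
                rest = next
            }
            return (results, rest)
        }
    }
}
```

With these pieces, `char("a").then(char("b")).maybeMany()` parses zero or more `ab` pairs and hands back whatever input is left over.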

Slide 17

Slide 17 text

Lexer We turn this mess of Parser generics into our AST (in the form of sexps) inline with the parsing.

Slide 18

Slide 18 text

Lexer
_.fmap { Sexp.single($0) }
_.fmap { Sexp.many($0) }
_.fmap { _ in Sexp.none }

Slide 19

Slide 19 text

Semantic Analysis let sema: ([Sexp]) -> [Expression]

Slide 20

Slide 20 text

Semantic Analysis
struct Expression {
    init(sexp: Sexp) {
        switch sexp {
        ...
        }
    }
}

Slide 21

Slide 21 text

Semantic Analysis Figuring out what our expressions actually express. In most compilers, this stage would guarantee the semantics of your program are somehow "well-formed".
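A rough sketch of what "figuring out what our expressions express" could look like: the head atom of each S-expression decides the kind of expression. The `kind`, `args`, and `children` names mirror the code generation slides, but the `Kind` enum, the classification rules, and the treatment of `def` (every atom before the final item is a name, the final item is the body) are assumptions, not the talk's actual code.

```swift
// Assumed shape of the AST produced by the parser.
enum Sexp {
    case single(String)
    case many([Sexp])
    case none
}

struct Expression {
    enum Kind { case bare, call, functionDefinition, conditional, empty }
    let kind: Kind
    let args: [String]
    let children: [Expression]

    init(kind: Kind, args: [String] = [], children: [Expression] = []) {
        self.kind = kind
        self.args = args
        self.children = children
    }

    init(sexp: Sexp) {
        switch sexp {
        case .none:
            self.init(kind: .empty)
        case .single(let atom):
            self.init(kind: .bare, args: [atom])
        case .many(let items):
            // The head atom decides what the form means.
            guard let first = items.first, case .single(let head) = first else {
                self.init(kind: .empty)
                return
            }
            let rest = Array(items.dropFirst())
            switch head {
            case "def":
                // Assumption: atoms before the last item are names, the last item is the body.
                let names = rest.dropLast().compactMap { item -> String? in
                    if case .single(let name) = item { return name }
                    return nil
                }
                self.init(kind: .functionDefinition, args: names,
                          children: rest.suffix(1).map(Expression.init(sexp:)))
            case "condition":
                self.init(kind: .conditional, children: rest.map(Expression.init(sexp:)))
            default:
                self.init(kind: .call, args: [head], children: rest.map(Expression.init(sexp:)))
            }
        }
    }
}
```

A stricter version would report an error instead of silently producing `.empty` for forms it does not understand.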

Slide 22

Slide 22 text

Optimization let optimizations: Array<([Expression]) -> [Expression]>

Slide 23

Slide 23 text

Optimization The Sipquick compiler does none.
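For a sense of what a pass with that signature might do, here is a hypothetical constant-folding sketch. Sipquick contains nothing like it: the `Expr` enum, `fold`, and `optimize` are invented for illustration and use a simplified expression type rather than Sipquick's `Expression` struct.

```swift
// Hypothetical, simplified expression type used only for this sketch.
indirect enum Expr: Equatable {
    case number(Int)
    case name(String)
    case call(String, [Expr])
}

typealias Pass = ([Expr]) -> [Expr]

// Folds calls to + and * whose arguments are all literal numbers.
// Note: Swift Int arithmetic traps on overflow; a real pass would have to
// match the target language's integer semantics.
func fold(_ expr: Expr) -> Expr {
    guard case .call(let op, let rawArgs) = expr else { return expr }
    let args = rawArgs.map(fold)
    let numbers = args.compactMap { arg -> Int? in
        if case .number(let n) = arg { return n }
        return nil
    }
    guard numbers.count == args.count, !numbers.isEmpty else { return .call(op, args) }
    switch op {
    case "+": return .number(numbers.reduce(0, +))
    case "*": return .number(numbers.reduce(1, *))
    default: return .call(op, args)
    }
}

let constantFolding: Pass = { $0.map(fold) }
let optimizations: [Pass] = [constantFolding]

// Runs every pass in order, feeding each one the previous result.
func optimize(_ program: [Expr]) -> [Expr] {
    return optimizations.reduce(program) { exprs, pass in pass(exprs) }
}
```

Keeping passes as an array of functions means an empty array is a valid, do-nothing optimizer.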

Slide 24

Slide 24 text

Code Generation let codeGen: ([Expression]) -> Data

Slide 25

Slide 25 text

Code Generation Transforming expressions into something that is loosely "executable".

Slide 26

Slide 26 text

Code Generation In my opinion, the hardest part of a compiler. Especially if you're not compiling something like C.

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

Code Generation I chose to target Ruby, since it's dynamic, has a robust standard library, and I know it better than is healthy.

Slide 29

Slide 29 text

Code Generation
extension Expression {
    func asRuby() -> String {
        switch self.type {
        ...
        }
    }
}

Slide 30

Slide 30 text

Code Generation The gist of our code gen step is that we map Expressions into Ruby code snippets. After joining them together, adding a shebang, and running chmod, we have a file we can run.
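The join, shebang, and chmod steps can be sketched as follows. This is an assumption about how one might do it with Foundation, not the talk's code; `writeExecutable`, `snippets`, and `path` are hypothetical names.

```swift
import Foundation

// Joins generated Ruby snippets, prepends a shebang, and marks the file executable.
func writeExecutable(snippets: [String], to path: String) throws {
    let script = "#!/usr/bin/env ruby\n" + snippets.joined(separator: "\n") + "\n"
    try script.write(toFile: path, atomically: true, encoding: .utf8)
    // Equivalent to running `chmod 755` on the file.
    try FileManager.default.setAttributes([.posixPermissions: 0o755], ofItemAtPath: path)
}
```

The output is a Ruby script, so it only runs on a machine where `ruby` is on the `PATH`.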

Slide 31

Slide 31 text

Code Generation
; fibonacci.spq
(def fib x
  (condition (<= x 1)
    x
    (+ (fib (- x 1)) (fib (- x 2)))))
(print (fib 10))

#!/usr/bin/env ruby
def fib(x)
  if (x.<=(1))
    x
  else
    (fib((x.-(1))).+(fib((x.-(2)))))
  end
end
print(fib(10))

Slide 32

Slide 32 text

Code Generation
extension Expression {
    func parenthesize(_ s: String) -> String {
        return "(\(s))"
    }

    var isOperatorCall: Bool {
        switch kind {
        case .call:
            guard let funcName = args.first else { return false }
            switch funcName {
            case "+", "-", "*", "%", "/", "||", "&&", "&", "|", "==", ">", "<", ">=", "<=", "[]", "..":
                return true
            default:
                return false
            }
        default:
            return false
        }
    }

    func asRuby(depth: Int = 0) -> String {
        let indent = String(repeating: " ", count: 2 * depth)
        switch kind {
        case .bare:
            return args.joined(separator: " ")
        case .call:
            if isOperatorCall {
                let op = args.first!
                let rec = children.first!
                let opArgs = children.dropFirst()
                return parenthesize(rec.asRuby(depth: depth + 1)) + ".\(op)" +
                    parenthesize(opArgs.map { $0.asRuby(depth: depth + 1) }.joined(separator: ", "))
            } else {
                return args.joined(separator: " ") + "(" +
                    children.map { "(\($0.asRuby(depth: depth + 1)))" }.joined(separator: ",\n\(indent)") + ")"
            }
        case .functionDefinition:
            let name = args.first!
            let argNames = args.dropFirst()
            return "def \(name)(\(argNames.joined(separator: ", ")))\n" +
                children.map { indent + $0.asRuby(depth: depth + 1) }.joined(separator: "\n") + "\nend"
        case .empty:
            return ""
        case .variableDeclaration:
            let varName = args.joined(separator: " ")
            return "\(varName) = (\(children.map {$0.asRuby(depth: depth + 1)}.joined(separator: ", ")))"
        case .conditional:
            guard children.count == 3 else { fatalError("a conditional must have exactly three arguments") }
            let conditional = children[0]
            let positive = children[1]
            let negative = children[2]
            return "if \(conditional.asRuby(depth: depth + 1))\n\(indent) \(positive.asRuby(depth: depth + 1))\nelse\n\(indent) \(negative.asRuby(depth: depth + 1))\nend"
        }
    }
}

Slide 33

Slide 33 text

Code Generation
extension Expression {
    func parenthesize(_ s: String) -> String
    var isOperatorCall: Bool { get }
    func asRuby(depth: Int = 0) -> String {
        switch kind {
        case .bare: { }
        case .call:
            if isOperatorCall { } else { }
        case .functionDefinition: { }
        case .empty: { }
        case .variableDeclaration: { }
        case .conditional:
            guard children.count == 3 else { }
            let conditional = children[0]
            let positive = children[1]
            let negative = children[2]
            { }
        }
    }
}

Slide 34

Slide 34 text

Using The Compiler
$ cat fibonacci.spq
(def fib x
  (condition (<= x 1)
    x
    (+ (fib (- x 1)) (fib (- x 2)))))
(puts (fib ([] ARGV 0)))
$ sipquick fibonacci.spq fibonacci
$ ./fibonacci 10
55

Slide 35

Slide 35 text

Testing The Compiler

Slide 36

Slide 36 text

Testing The Compiler
→ Integration tests
→ Compile && Run && Verify
→ Only testing positive cases
→ 0 unit test coverage

Slide 37

Slide 37 text

Testing The Compiler
(def print_even x
  (condition (== (% x 2) 0)
    (print "even")
    (print "odd")))
(print_even 17)
(print_even 12)
(print_even -1)
(print_even 1)
(print_even (* 0 1))
/////
it allows branching
0
oddevenoddoddeven

Slide 38

Slide 38 text

Testing The Compiler
import Foundation.NSString

struct Test {
    let script: String
    let name: String
    let expectedOutput: String
    let expectedExit: Int

    init(script: String) {
        self.script = script
        let contents = try! String.init(contentsOfFile: script)
        let metadata = contents.components(separatedBy: "/////\n")[1].components(separatedBy: "\n")
        self.name = metadata[0]
        self.expectedExit = Int(metadata[1])!
        self.expectedOutput = metadata.dropFirst(2).joined(separator: "\n").trimmingCharacters(in: .whitespacesAndNewlines)
    }

    func run() -> (Bool, String) {
        let (compileOutput, compileStatus) = sipquick_test
            .run(path: sipquick_path, arguments: [script, "/private/var/tmp/sipquick-test \(name).exe"])
        guard compileStatus == 0 else { return (false, "failed to compile \(name):\n\(compileOutput)") }
        let (output, status) = sipquick_test
            .run(path: "/private/var/tmp/sipquick-test \(name).exe", arguments: [])
        let success = output == expectedOutput && status == expectedExit
        let errorMessage = "failed \(name): got \(output.debugDescription) (\(status)), expected \(expectedOutput.debugDescription) (\(expectedExit))"
        return (success, errorMessage)
    }
}

Slide 39

Slide 39 text

Testing The Compiler
let sipquick_path = String(CommandLine.arguments[0].characters.dropLast(5))
let specDirectory = "/Users/segiddins/Desktop/Sipquick/sipquick-spec/"
let specFiles = try! FileManager().contentsOfDirectory(atPath: specDirectory).filter { $0.hasSuffix(".spq") }.map { specDirectory + $0 }
let tests = specFiles.map(Test.init)
let failures = tests.map { $0.run() }.filter { $0.0 == false }
if failures.isEmpty { exit(EXIT_SUCCESS) }
failures.map { $0.1 }.forEach { print($0) }
exit(EXIT_FAILURE)

Slide 40

Slide 40 text

Testing The Compiler
(def fizzbuzz x
  (condition (<= x 0)
    (return "")
    (condition (== 0 (% x 15))
      (+ (fizzbuzz (- x 1)) "fizzbuzz")
      (condition (== 0 (% x 3))
        (+ (fizzbuzz (- x 1)) "fizz")
        (condition (== 0 (% x 5))
          (+ (fizzbuzz (- x 1)) "buzz")
          (+ (fizzbuzz (- x 1)) (String x)))))))
(def fetch_arg position default
  (condition (!= nil ([] ARGV position))
    ([] ARGV position)
    (return default)))
(print (fizzbuzz (fetch_arg 0 100)))
/////
it computes fizzbuzz
0
12fizz4buzzfizz78fizzbuzz11fizz1314fizzbuzz1617fizz19buzzfizz2223fizzbuzz26fizz2829fizzbuzz3132fizz34buzzfizz3738fizzbuzz41fizz4344fizzbuzz4647fizz49buzzfizz5253fizzbuzz56fizz5859fizzbuzz6162fizz64buzzfizz6768fizzbuzz71fizz7374fizzbuzz7677fizz79buzzfizz8283fizzbuzz86fizz8889fizzbuzz9192fizz94buzzfizz9798fizzbuzz

Slide 41

Slide 41 text

TODO
→ Refactor Expression to be an enum
→ Add optimizations
→ Allow defining new variables in function scope
→ Add parsing error messages
→ Add semantic analysis errors
→ Compile to machine code
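As a rough idea of where the first item could lead, here is a hypothetical enum-based `Expression`. The case names, payloads, and this simplified `asRuby` are assumptions for illustration; operator calls, indentation, and variable declarations are left out.

```swift
// Hypothetical enum form of Expression; each case carries exactly the data it needs.
indirect enum Expression {
    case bare(String)
    case call(name: String, arguments: [Expression])
    case functionDefinition(name: String, parameters: [String], body: [Expression])
    case conditional(test: Expression, positive: Expression, negative: Expression)
    case empty

    // Simplified code generation: no operator handling, fixed two-space indent.
    func asRuby() -> String {
        switch self {
        case .bare(let text):
            return text
        case .call(let name, let arguments):
            return name + "(" + arguments.map { $0.asRuby() }.joined(separator: ", ") + ")"
        case .functionDefinition(let name, let parameters, let body):
            return "def \(name)(\(parameters.joined(separator: ", ")))\n" +
                body.map { "  " + $0.asRuby() }.joined(separator: "\n") + "\nend"
        case .conditional(let test, let positive, let negative):
            return "if \(test.asRuby())\n  \(positive.asRuby())\nelse\n  \(negative.asRuby())\nend"
        case .empty:
            return ""
        }
    }
}
```

One benefit of this shape is that a conditional cannot be built with the wrong number of branches, so the runtime `fatalError` check becomes unnecessary.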

Slide 42

Slide 42 text

TODO
→ Implement a proper standard library
→ Actually implement comment parsing
→ Add lambdas
→ Multiple-expression expressions

Slide 43

Slide 43 text

Lessons Learned

Slide 44

Slide 44 text

Lessons Learned
→ Writing a compiler is hard
→ Testing a compiler is really, really, really necessary
→ String parsing needs a better interface
→ Error messages are hard
→ The LLVM API is meant for typed languages
→ Implementing your own language is super rewarding

Slide 45

Slide 45 text

Lessons Learned "Real" programming languages are far superior to anything I can write in a weekend. They take expertise and time and care and discipline. I'll keep that in mind next time I want to complain about swiftc.

Slide 46

Slide 46 text

Thank You! @segiddins