Writing a Tiny Compiler

Writing a Tiny Compiler Samuel Giddins

Follow Along https://github.com/segiddins/Sipquick

Follow Along
$ cloc Sipquick
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Swift                            6             43              0            310
-------------------------------------------------------------------------------
SUM:                             6             43              0            310
-------------------------------------------------------------------------------

Why write a compiler?

Why write a compiler?
→ Better idea of how compilers work
→ It's a well-known problem domain
→ It's doable in any language
→ I've never written one before

What is a compiler?
let compiler: (String) -> Data /* Executable */

What is a compiler? A compiler transforms source into an
executable

What is a compiler? A compiler transforms a program written
in one language into another language

How do I write a compiler?

How do I write a compiler?
→ Parse
→ Lex
→ Semantic Analysis
→ Optimize
→ Optimize
→ Generate Code

Compilers are highly functional
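
One way to read "highly functional" is that every stage is a plain function and the compiler is their composition. The sketch below illustrates that shape only: the `Sexp` and `Expression` structs and the stage bodies are toy stand-ins invented for this example, not Sipquick's implementation.

```swift
import Foundation

// Hypothetical stand-in types; Sipquick's real Sexp and Expression differ.
struct Sexp { let text: String }
struct Expression { let ruby: String }

// Each stage is a function from one representation to the next.
// The bodies are placeholders that pass each source line through unchanged.
let parse: (String) -> [Sexp] = { source in
    source.split(separator: "\n").map { Sexp(text: String($0)) }
}
let sema: ([Sexp]) -> [Expression] = { sexps in
    sexps.map { Expression(ruby: $0.text) }
}
let optimizations: [([Expression]) -> [Expression]] = []  // no passes
let codeGen: ([Expression]) -> Data = { expressions in
    let body = expressions.map { $0.ruby }.joined(separator: "\n")
    return Data(("#!/usr/bin/env ruby\n" + body + "\n").utf8)
}

// The compiler is the composition of its stages.
let compiler: (String) -> Data = { source in
    let analyzed = sema(parse(source))
    let optimized = optimizations.reduce(analyzed) { exprs, pass in pass(exprs) }
    return codeGen(optimized)
}
```

Because every stage is a value of function type, adding an optimization pass would amount to appending one more function to `optimizations`.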

Sipquick noun a small, energetic bird in The Wise Man's
Fear

Sipquick
A small language invented for the purpose of this talk.
→ S-expressions
→ Dynamic
→ Functional-ish

Sipquick
(def hello name (+ "Hello, " name))
(print (hello "world!"))

Parser A parser-combinator, taking several ideas from Yasuhiro Inami's try!
Swift Tokyo talk.

Parser
let schemeParser: Parser<String, [Sexp]> = {
    return ignoreSecond(
        sexp()
            .then(unichar("\n").maybe())
            .maybeMany()
            .fmap { $0.map { $0.0 } }
            .then(dropEndingMetadata().or(empty().andThatsAll())))
}
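
For readers new to parser combinators, here is a minimal sketch of how combinators such as `then`, `fmap`, `maybeMany` and `unichar` could be built. The `Parser<T>` shape below (a function from input to an optional value-plus-remainder) is an assumption made for illustration; Sipquick's `Parser<String, [Sexp]>` has two type parameters and may work differently.

```swift
// A parser consumes a prefix of the input and returns a value plus the rest,
// or nil on failure. This shape is assumed, not taken from Sipquick.
struct Parser<T> {
    let run: (Substring) -> (T, Substring)?

    // Transform the parsed value without touching the remaining input.
    func fmap<U>(_ transform: @escaping (T) -> U) -> Parser<U> {
        return Parser<U> { input in
            guard let (value, rest) = self.run(input) else { return nil }
            return (transform(value), rest)
        }
    }

    // Run this parser, then `next` on whatever input is left.
    func then<U>(_ next: Parser<U>) -> Parser<(T, U)> {
        return Parser<(T, U)> { input in
            guard let (first, rest) = self.run(input),
                  let (second, remaining) = next.run(rest) else { return nil }
            return ((first, second), remaining)
        }
    }

    // Apply this parser zero or more times, collecting the results.
    // Stops if the parser succeeds without consuming input.
    func maybeMany() -> Parser<[T]> {
        return Parser<[T]> { input in
            var values: [T] = []
            var rest = input
            while let (value, next) = self.run(rest), next.count < rest.count {
                values.append(value)
                rest = next
            }
            return (values, rest)
        }
    }
}

// Match exactly one given character.
func unichar(_ expected: Character) -> Parser<Character> {
    return Parser { input in
        guard let first = input.first, first == expected else { return nil }
        return (first, input.dropFirst())
    }
}

// Usage: parse "a" followed by "b" and glue the two characters together.
let ab = unichar("a").then(unichar("b")).fmap { "\($0.0)\($0.1)" }
```

Small parsers are ordinary values, so larger grammars are assembled by chaining method calls, which is the style the `schemeParser` slide uses.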

Lexer We turn this mess of Parser generics into our
AST (in the form of sexps) inline with the parsing.

Lexer
_.fmap { Sexp.single($0) }
_.fmap { Sexp.many($0) }
_.fmap { _ in Sexp.none }

Semantic Analysis let sema: ([Sexp]) -> [Expression]

Semantic Analysis
struct Expression {
    init(sexp: Sexp) {
        switch sexp { ... }
    }
}

Semantic Analysis Figuring out what our expressions actually express. In most compilers, this stage would guarantee the semantics of your program are somehow "well-formed".
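
A rough sketch of what "figuring out what our expressions express" can look like: inspect the head of each S-expression and classify it. The case names `single`, `many`, `none` and the fields `kind`, `args`, `children` appear in the slides; the payload types, the single-body assumption for `def`, and the omission of variable declarations are simplifications made for this example.

```swift
// Payload types are assumptions; the slides only show the case names.
enum Sexp {
    case single(String)
    case many([Sexp])
    case none
}

struct Expression {
    enum Kind { case bare, call, functionDefinition, conditional, empty }
    let kind: Kind
    let args: [String]          // names: a callee, or a function name and its parameters
    let children: [Expression]  // nested expressions

    init(sexp: Sexp) {
        switch sexp {
        case .none:
            kind = .empty; args = []; children = []
        case .single(let atom):
            kind = .bare; args = [atom]; children = []
        case .many(let items):
            // The first atom decides what kind of expression this is.
            guard case .single(let head)? = items.first else {
                kind = .empty; args = []; children = []
                return
            }
            let rest = Array(items.dropFirst())
            switch head {
            case "def":
                // Assumes exactly one body expression: (def name param... body).
                kind = .functionDefinition
                args = rest.dropLast().compactMap { item -> String? in
                    if case .single(let name) = item { return name }
                    return nil
                }
                children = rest.suffix(1).map { Expression(sexp: $0) }
            case "condition":
                kind = .conditional
                args = []
                children = rest.map { Expression(sexp: $0) }
            default:
                kind = .call
                args = [head]
                children = rest.map { Expression(sexp: $0) }
            }
        }
    }
}
```

A stricter compiler would reject malformed input at this point, for example a `condition` without exactly three children, instead of deferring the check to code generation.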

Optimization let optimizations: Array<([Expression]) -> [Expression]>

Optimization The Sipquick compiler does none.

Code Generation let codeGen: ([Expression]) -> Data

Code Generation Transforming expressions into something that is loosely "executable".

Code Generation In my opinion, the hardest part of a
compiler. Especially if you're not compiling something like C.

Code Generation I chose to target Ruby, since it's dynamic,
has a robust standard library, and I know it better than is healthy.

Code Generation
extension Expression {
    func asRuby() -> String {
        switch self.type { ... }
    }
}

Code Generation The gist of our code gen step is that we map Expressions into Ruby code snippets. After joining them together, adding a shebang, and running chmod, we have a file we can run.
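
The join, shebang and chmod steps can be sketched with Foundation. The helper name `writeExecutable(snippets:to:)` is hypothetical, and the slides do not show how Sipquick actually writes its output; this is only one plausible way to do it.

```swift
import Foundation

// Hypothetical helper: joins Ruby snippets, prepends a shebang, writes the
// file, and marks it executable (the equivalent of `chmod 755`).
func writeExecutable(snippets: [String], to path: String) throws {
    let script = "#!/usr/bin/env ruby\n" + snippets.joined(separator: "\n") + "\n"
    try script.write(toFile: path, atomically: true, encoding: .utf8)
    try FileManager.default.setAttributes([.posixPermissions: 0o755],
                                          ofItemAtPath: path)
}
```

Setting the permission bits through `FileManager` avoids shelling out to `chmod`, though launching `chmod` as a subprocess would work equally well.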

Code Generation
; fibonacci.spq
(def fib x
  (condition (<= x 1)
    x
    (+ (fib (- x 1)) (fib (- x 2)))))
(print (fib 10))

#!/usr/bin/env ruby
def fib(x)
  if (x.<=(1))
    x
  else
    (fib((x.-(1))).+(fib((x.-(2)))))
  end
end
print(fib(10))

Code Generation
extension Expression {
    func parenthesize(_ s: String) -> String {
        return "(\(s))"
    }

    var isOperatorCall: Bool {
        switch kind {
        case .call:
            guard let funcName = args.first else { return false }
            switch funcName {
            case "+", "-", "*", "%", "/", "||", "&&", "&", "|", "==", ">", "<", ">=", "<=", "[]", "..":
                return true
            default:
                return false
            }
        default:
            return false
        }
    }

    func asRuby(depth: Int = 0) -> String {
        let indent = String(repeating: " ", count: 2 * depth)
        switch kind {
        case .bare:
            return args.joined(separator: " ")
        case .call:
            if isOperatorCall {
                let op = args.first!
                let rec = children.first!
                let opArgs = children.dropFirst()
                return parenthesize(rec.asRuby(depth: depth + 1))
                    + ".\(op)"
                    + parenthesize(opArgs.map { $0.asRuby(depth: depth + 1) }.joined(separator: ", "))
            } else {
                return args.joined(separator: " ")
                    + "("
                    + children.map { "(\($0.asRuby(depth: depth + 1)))" }.joined(separator: ",\n\(indent)")
                    + ")"
            }
        case .functionDefinition:
            let name = args.first!
            let argNames = args.dropFirst()
            return "def \(name)(\(argNames.joined(separator: ", ")))\n"
                + children.map { indent + $0.asRuby(depth: depth + 1) }.joined(separator: "\n")
                + "\nend"
        case .empty:
            return ""
        case .variableDeclaration:
            let varName = args.joined(separator: " ")
            return "\(varName) = (\(children.map {$0.asRuby(depth: depth + 1)}.joined(separator: ", ")))"
        case .conditional:
            guard children.count == 3 else {
                fatalError("a conditional must have exactly three arguments")
            }
            let conditional = children[0]
            let positive = children[1]
            let negative = children[2]
            return "if \(conditional.asRuby(depth: depth + 1))\n\(indent) \(positive.asRuby(depth: depth + 1))\nelse\n\(indent) \(negative.asRuby(depth: depth + 1))\nend"
        }
    }
}

Code Generation
extension Expression {
    func parenthesize(_ s: String) -> String
    var isOperatorCall: Bool { get }
    func asRuby(depth: Int = 0) -> String {
        switch kind {
        case .bare: { }
        case .call:
            if isOperatorCall { } else { }
        case .functionDefinition: { }
        case .empty: { }
        case .variableDeclaration: { }
        case .conditional:
            guard children.count == 3 else { }
            let conditional = children[0]
            let positive = children[1]
            let negative = children[2]
            { }
        }
    }
}

Using The Compiler
$ cat fibonacci.spq
(def fib x
  (condition (<= x 1)
    x
    (+ (fib (- x 1)) (fib (- x 2)))))
(puts (fib ([] ARGV 0)))
$ sipquick fibonacci.spq fibonacci
$ ./fibonacci 10
55

Testing The Compiler

Testing The Compiler
→ Integration tests
→ Compile && Run && Verify
→ Only testing positive cases
→ 0 unit test coverage

Testing The Compiler
(def print_even x
  (condition (== (% x 2) 0)
    (print "even")
    (print "odd")))
(print_even 17)
(print_even 12)
(print_even -1)
(print_even 1)
(print_even (* 0 1))
/////
it allows branching
0
oddevenoddoddeven

Testing The Compiler
import Foundation.NSString

struct Test {
    let script: String
    let name: String
    let expectedOutput: String
    let expectedExit: Int

    init(script: String) {
        self.script = script
        let contents = try! String.init(contentsOfFile: script)
        let metadata = contents.components(separatedBy: "/////\n")[1].components(separatedBy: "\n")
        self.name = metadata[0]
        self.expectedExit = Int(metadata[1])!
        self.expectedOutput = metadata.dropFirst(2).joined(separator: "\n").trimmingCharacters(in: .whitespacesAndNewlines)
    }

    func run() -> (Bool, String) {
        let (compileOutput, compileStatus) = sipquick_test
            .run(path: sipquick_path, arguments: [script, "/private/var/tmp/sipquick-test \(name).exe"])
        guard compileStatus == 0 else {
            return (false, "failed to compile \(name):\n\(compileOutput)")
        }
        let (output, status) = sipquick_test
            .run(path: "/private/var/tmp/sipquick-test \(name).exe", arguments: [])
        let success = output == expectedOutput && status == expectedExit
        let errorMessage = "failed \(name): got \(output.debugDescription) (\(status)), expected \(expectedOutput.debugDescription) (\(expectedExit))"
        return (success, errorMessage)
    }
}
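
The slides do not show the `run(path:arguments:)` helper that `Test.run()` calls. A possible stand-in, assuming it launches a subprocess and returns its output together with its exit status, could look like the sketch below. Trimming the output is an assumption, chosen so it compares cleanly with the trimmed `expectedOutput`.

```swift
import Foundation

// Hypothetical stand-in for the run(path:arguments:) helper: launches an
// executable and returns its combined output and exit status.
func run(path: String, arguments: [String]) -> (String, Int) {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: path)
    process.arguments = arguments

    // Capture stdout and stderr through one pipe.
    let pipe = Pipe()
    process.standardOutput = pipe
    process.standardError = pipe

    do {
        try process.run()
    } catch {
        return ("could not launch \(path): \(error)", -1)
    }
    // Read until EOF before waiting, so a chatty child cannot fill the pipe and stall.
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()

    // Assumption: surrounding whitespace is not significant to the tests.
    let output = String(decoding: data, as: UTF8.self)
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return (output, Int(process.terminationStatus))
}
```

Returning a plain tuple keeps the call sites short, which suits a test harness that only compares output and exit status.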

Testing The Compiler
let sipquick_path = String(CommandLine.arguments[0].characters.dropLast(5))
let specDirectory = "/Users/segiddins/Desktop/Sipquick/sipquick-spec/"
let specFiles = try! FileManager().contentsOfDirectory(atPath: specDirectory)
    .filter { $0.hasSuffix(".spq") }
    .map { specDirectory + $0 }
let tests = specFiles.map(Test.init)
let failures = tests.map { $0.run() }.filter { $0.0 == false }
if failures.isEmpty { exit(EXIT_SUCCESS) }
failures.map { $0.1 }.forEach { print($0) }
exit(EXIT_FAILURE)

Testing the Compiler
(def fizzbuzz x
  (condition (<= x 0)
    (return "")
    (condition (== 0 (% x 15))
      (+ (fizzbuzz (- x 1)) "fizzbuzz")
      (condition (== 0 (% x 3))
        (+ (fizzbuzz (- x 1)) "fizz")
        (condition (== 0 (% x 5))
          (+ (fizzbuzz (- x 1)) "buzz")
          (+ (fizzbuzz (- x 1)) (String x)))))))
(def fetch_arg position default
  (condition (!= nil ([] ARGV position))
    ([] ARGV position)
    (return default)))
(print (fizzbuzz (fetch_arg 0 100)))
/////
it computes fizzbuzz
0
12fizz4buzzfizz78fizzbuzz11fizz1314fizzbuzz1617fizz19buzzfizz2223fizzbuzz26fizz2829fizzbuzz3132fizz34buzzfizz3738fizzbuzz41fizz4344fizzbuzz4647fizz49buzzfizz5253fizzbuzz56fizz5859fizzbuzz6162fizz64buzzfizz6768fizzbuzz71fizz7374fizzbuzz7677fizz79buzzfizz8283fizzbuzz86fizz8889fizzbuzz9192fizz94buzzfizz9798fizzbuzz

TODO
→ Refactor Expression to be an enum
→ Add optimizations
→ Allow defining new variables in function scope
→ Add parsing error messages
→ Add semantic analysis errors
→ Compile to machine code

TODO
→ Implement a proper standard library
→ Actually implement comment parsing
→ Add lambdas
→ Multiple-expression expressions

Lessons Learned

Lessons Learned
→ Writing a compiler is hard
→ Testing a compiler is really, really, really necessary
→ String parsing needs a better interface
→ Error messages are hard
→ The LLVM API is meant for typed languages
→ Implementing your own language is super rewarding

Lessons Learned "Real" programming languages are far superior to anything
I can write in a weekend. They take expertise and time and care and discipline. I'll keep that in mind next time I want to complain about swiftc.

Thank You! @segiddins
