Writing a Tiny Compiler

Given at try! Swift NYC on September 1, 2016.
http://www.tryswiftnyc.com
Code at https://github.com/segiddins/Sipquick

Samuel E. Giddins

September 01, 2016

Transcript

  1. Follow Along

    $ cloc Sipquick
    -------------------------------------------------------------------------------
    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    Swift                            6             43              0            310
    -------------------------------------------------------------------------------
    SUM:                             6             43              0            310
    -------------------------------------------------------------------------------
  2. Why write a compiler?

    → Better idea of how compilers work
    → It's a well-known problem domain
    → It's doable in any language
    → I've never written one before
  3. What is a compiler?

    A compiler transforms a program written in one language into another language.
  4. How do I write a compiler?

    → Parse
    → Lex
    → Semantic Analysis
    → Optimize
    → Optimize
    → Generate Code
  5. Sipquick

    A small language invented for the purpose of this talk.

    → S-expressions
    → Dynamic
    → Functional-ish
  6. Parser

    let schemeParser: Parser<String, [Sexp]> = {
        return ignoreSecond(
            sexp()
                .then(unichar("\n").maybe())
                .maybeMany()
                .fmap { $0.map { $0.0 } }
                .then(dropEndingMetadata().or(empty().andThatsAll())))
    }
  7. Lexer

    We turn this mess of Parser generics into our AST (in the form of sexps) inline with the parsing.
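To make the "text in, sexps out" step concrete, here is a minimal sketch of an S-expression reader. It is not the parser-combinator code from the slides: it swaps in a hand-written tokenizer plus recursive descent, and the `Sexp` enum, `tokenize`, `parseSexp`, and `parseProgram` are hypothetical names invented for illustration. It assumes no string literals or comments in the input.

```swift
// Hypothetical, simplified S-expression type; the talk's real `Sexp` may differ.
enum Sexp: Equatable {
    case atom(String)
    case list([Sexp])
}

// Split source text into "(", ")" and atom tokens.
// Assumption: no string literals or comments in the input.
func tokenize(_ source: String) -> [String] {
    var tokens: [String] = []
    var current = ""
    for ch in source {
        if ch == "(" || ch == ")" {
            if !current.isEmpty { tokens.append(current); current = "" }
            tokens.append(String(ch))
        } else if ch.isWhitespace {
            if !current.isEmpty { tokens.append(current); current = "" }
        } else {
            current.append(ch)
        }
    }
    if !current.isEmpty { tokens.append(current) }
    return tokens
}

// Parse one expression starting at `index`, advancing it past what was consumed.
// Returns nil on unbalanced parentheses or unexpected end of input.
func parseSexp(_ tokens: [String], _ index: inout Int) -> Sexp? {
    guard index < tokens.count else { return nil }
    let token = tokens[index]
    index += 1
    switch token {
    case "(":
        var items: [Sexp] = []
        while index < tokens.count, tokens[index] != ")" {
            guard let item = parseSexp(tokens, &index) else { return nil }
            items.append(item)
        }
        guard index < tokens.count else { return nil } // missing ")"
        index += 1 // consume ")"
        return .list(items)
    case ")":
        return nil // stray closing parenthesis
    default:
        return .atom(token)
    }
}

// Parse every top-level expression in a source string.
func parseProgram(_ source: String) -> [Sexp]? {
    let tokens = tokenize(source)
    var index = 0
    var result: [Sexp] = []
    while index < tokens.count {
        guard let sexp = parseSexp(tokens, &index) else { return nil }
        result.append(sexp)
    }
    return result
}
```

With this sketch, `parseProgram("(print (fib 10))")` yields a single list whose first element is the atom `print` and whose second is the nested list for `(fib 10)`.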
  8. Semantic Analysis

    Figuring out what our expressions actually express. In most compilers, this stage would guarantee the semantics of your program are somehow "well-formed".
  9. Code Generation

    In my opinion, the hardest part of a compiler. Especially if you're not compiling something like C.
  10. Code Generation

    I chose to target Ruby, since it's dynamic, has a robust standard library, and I know it better than is healthy.
  11. Code Generation

    The gist of our code gen step is that we map Expressions into Ruby code snippets. After joining them together, adding a shebang, and running chmod, we have a file we can run.
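The join, shebang, and chmod steps can be sketched in a few lines of Foundation code. This is an illustration under assumptions, not the code from the Sipquick repository: `writeExecutable(snippets:to:)` is a hypothetical helper, and it assumes the Ruby snippets have already been generated as strings.

```swift
import Foundation

// Hypothetical helper: joins already-generated Ruby snippets, prepends a
// shebang, writes the result to `path`, and marks the file executable.
func writeExecutable(snippets: [String], to path: String) throws {
    let script = "#!/usr/bin/env ruby\n" + snippets.joined(separator: "\n") + "\n"
    try script.write(toFile: path, atomically: true, encoding: .utf8)
    // 0o755 is the permission set that `chmod 755` would apply.
    try FileManager.default.setAttributes(
        [.posixPermissions: NSNumber(value: Int16(0o755))],
        ofItemAtPath: path)
}
```

Setting the permission bits through `FileManager` avoids shelling out to `chmod`; whether the original compiler does it this way or by running the `chmod` command is not shown on the slides.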
  12. Code Generation

    ; fibonacci.spq
    (def fib x
      (condition (<= x 1)
        x
        (+ (fib (- x 1)) (fib (- x 2)))))
    (print (fib 10))

    #!/usr/bin/env ruby
    def fib(x)
      if (x.<=(1))
        x
      else
        (fib((x.-(1))).+(fib((x.-(2)))))
      end
    end
    print(fib(10))
  13. Code Generation

    extension Expression {
        func parenthesize(_ s: String) -> String {
            return "(\(s))"
        }

        var isOperatorCall: Bool {
            switch kind {
            case .call:
                guard let funcName = args.first else { return false }
                switch funcName {
                case "+", "-", "*", "%", "/", "||", "&&", "&", "|", "==", ">", "<", ">=", "<=", "[]", "..":
                    return true
                default:
                    return false
                }
            default:
                return false
            }
        }

        func asRuby(depth: Int = 0) -> String {
            let indent = String(repeating: " ", count: 2 * depth)
            switch kind {
            case .bare:
                return args.joined(separator: " ")
            case .call:
                if isOperatorCall {
                    let op = args.first!
                    let rec = children.first!
                    let opArgs = children.dropFirst()
                    return parenthesize(rec.asRuby(depth: depth + 1)) + ".\(op)" + parenthesize(opArgs.map { $0.asRuby(depth: depth + 1) }.joined(separator: ", "))
                } else {
                    return args.joined(separator: " ") + "(" + children.map { "(\($0.asRuby(depth: depth + 1)))" }.joined(separator: ",\n\(indent)") + ")"
                }
            case .functionDefinition:
                let name = args.first!
                let argNames = args.dropFirst()
                return "def \(name)(\(argNames.joined(separator: ", ")))\n" + children.map { indent + $0.asRuby(depth: depth + 1) }.joined(separator: "\n") + "\nend"
            case .empty:
                return ""
            case .variableDeclaration:
                let varName = args.joined(separator: " ")
                return "\(varName) = (\(children.map {$0.asRuby(depth: depth + 1)}.joined(separator: ", ")))"
            case .conditional:
                guard children.count == 3 else { fatalError("a conditional must have exactly three arguments") }
                let conditional = children[0]
                let positive = children[1]
                let negative = children[2]
                return "if \(conditional.asRuby(depth: depth + 1))\n\(indent) \(positive.asRuby(depth: depth + 1))\nelse\n\(indent) \(negative.asRuby(depth: depth + 1))\nend"
            }
        }
    }
  14. Code Generation

    extension Expression {
        func parenthesize(_ s: String) -> String
        var isOperatorCall: Bool { get }
        func asRuby(depth: Int = 0) -> String {
            switch kind {
            case .bare: { }
            case .call:
                if isOperatorCall { } else { }
            case .functionDefinition: { }
            case .empty: { }
            case .variableDeclaration: { }
            case .conditional:
                guard children.count == 3 else { }
                let conditional = children[0]
                let positive = children[1]
                let negative = children[2]
                { }
            }
        }
    }
  15. Using The Compiler

    $ cat fibonacci.spq
    (def fib x
      (condition (<= x 1)
        x
        (+ (fib (- x 1)) (fib (- x 2)))))
    (puts (fib ([] ARGV 0)))
    $ sipquick fibonacci.spq fibonacci
    $ ./fibonacci 10
    55
  16. Testing The Compiler

    → Integration tests
    → Compile && Run && Verify
    → Only testing positive cases
    → 0 unit test coverage
  17. Testing The Compiler

    (def print_even x
      (condition (== (% x 2) 0)
        (print "even")
        (print "odd")))
    (print_even 17)
    (print_even 12)
    (print_even -1)
    (print_even 1)
    (print_even (* 0 1))
    /////
    it allows branching
    0
    oddevenoddoddeven
  18. Testing The Compiler

    import Foundation.NSString

    struct Test {
        let script: String
        let name: String
        let expectedOutput: String
        let expectedExit: Int

        init(script: String) {
            self.script = script
            let contents = try! String.init(contentsOfFile: script)
            let metadata = contents.components(separatedBy: "/////\n")[1].components(separatedBy: "\n")
            self.name = metadata[0]
            self.expectedExit = Int(metadata[1])!
            self.expectedOutput = metadata.dropFirst(2).joined(separator: "\n").trimmingCharacters(in: .whitespacesAndNewlines)
        }

        func run() -> (Bool, String) {
            let (compileOutput, compileStatus) = sipquick_test
                .run(path: sipquick_path, arguments: [script, "/private/var/tmp/sipquick-test \(name).exe"])
            guard compileStatus == 0 else { return (false, "failed to compile \(name):\n\(compileOutput)") }
            let (output, status) = sipquick_test
                .run(path: "/private/var/tmp/sipquick-test \(name).exe", arguments: [])
            let success = output == expectedOutput && status == expectedExit
            let errorMessage = "failed \(name): got \(output.debugDescription) (\(status)), expected \(expectedOutput.debugDescription) (\(expectedExit))"
            return (success, errorMessage)
        }
    }
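The `Test` struct above relies on a `run(path:arguments:)` helper that is not shown on the slides. The following is one plausible sketch of such a helper using Foundation's `Process` and `Pipe`; it is a hypothetical stand-in, and the decisions to merge stderr into stdout and to trim trailing whitespace are assumptions made so the result compares cleanly against a trimmed `expectedOutput`.

```swift
import Foundation

// Hypothetical stand-in for the `run(path:arguments:)` helper the test code calls.
// Returns the captured output and the exit status of the child process.
func run(path: String, arguments: [String]) -> (String, Int) {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: path)
    process.arguments = arguments

    // Capture stdout and stderr together through one pipe (an assumption).
    let pipe = Pipe()
    process.standardOutput = pipe
    process.standardError = pipe

    do {
        try process.run()
    } catch {
        return ("could not launch \(path): \(error)", -1)
    }
    // Read before waiting so a chatty child cannot block on a full pipe.
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()

    let output = String(data: data, encoding: .utf8) ?? ""
    return (output.trimmingCharacters(in: .whitespacesAndNewlines),
            Int(process.terminationStatus))
}
```

For example, `run(path: "/bin/sh", arguments: ["-c", "echo hello; exit 3"])` returns the string `hello` together with the status `3`, which is the pair of values the compile, run, and verify steps compare against the spec metadata.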
  19. Testing The Compiler

    let sipquick_path = String(CommandLine.arguments[0].characters.dropLast(5))
    let specDirectory = "/Users/segiddins/Desktop/Sipquick/sipquick-spec/"
    let specFiles = try! FileManager().contentsOfDirectory(atPath: specDirectory).filter { $0.hasSuffix(".spq") }.map { specDirectory + $0 }
    let tests = specFiles.map(Test.init)
    let failures = tests.map { $0.run() }.filter { $0.0 == false }
    if failures.isEmpty { exit(EXIT_SUCCESS) }
    failures.map { $0.1 }.forEach { print($0) }
    exit(EXIT_FAILURE)
  20. Testing the Compiler

    (def fizzbuzz x
      (condition (<= x 0)
        (return "")
        (condition (== 0 (% x 15))
          (+ (fizzbuzz (- x 1)) "fizzbuzz")
          (condition (== 0 (% x 3))
            (+ (fizzbuzz (- x 1)) "fizz")
            (condition (== 0 (% x 5))
              (+ (fizzbuzz (- x 1)) "buzz")
              (+ (fizzbuzz (- x 1)) (String x)))))))
    (def fetch_arg position default
      (condition (!= nil ([] ARGV position))
        ([] ARGV position)
        (return default)))
    (print (fizzbuzz (fetch_arg 0 100)))
    /////
    it computes fizzbuzz
    0
    12fizz4buzzfizz78fizzbuzz11fizz1314fizzbuzz1617fizz19buzzfizz2223fizzbuzz26fizz2829fizzbuzz3132fizz34buzzfizz3738fizzbuzz41fizz4344fizzbuzz4647fizz49buzzfizz5253fizzbuzz56fizz5859fizzbuzz6162fizz64buzzfizz6768fizzbuzz71fizz7374fizzbuzz7677fizz79buzzfizz8283fizzbuzz86fizz8889fizzbuzz9192fizz94buzzfizz9798fizzbuzz
  21. TODO

    → Refactor Expression to be an enum
    → Add optimizations
    → Allow defining new variables in function scope
    → Add parsing error messages
    → Add semantic analysis errors
    → Compile to machine code
  22. TODO

    → Implement a proper standard library
    → Actually implement comment parsing
    → Add lambdas
    → Multiple-expression expressions
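As an illustration of the "Refactor Expression to be an enum" item, here is a rough sketch of what an enum-based AST could look like. The case names mirror the `kind` values used in the code generation slides, but the payloads and the simplified `asRuby` (no indentation, no operator special-casing) are assumptions, not the refactoring the author had in mind.

```swift
// Hypothetical enum-based AST; case names mirror the `kind` values on the
// slides, but every payload here is an assumption.
indirect enum Expression {
    case bare(String)
    case call(name: String, arguments: [Expression])
    case functionDefinition(name: String, parameters: [String], body: [Expression])
    case variableDeclaration(name: String, value: Expression)
    case conditional(test: Expression, positive: Expression, negative: Expression)
    case empty

    // Simplified Ruby emission: no indentation and no operator special-casing.
    var asRuby: String {
        switch self {
        case .bare(let text):
            return text
        case .call(let name, let arguments):
            return name + "(" + arguments.map { $0.asRuby }.joined(separator: ", ") + ")"
        case .functionDefinition(let name, let parameters, let body):
            return "def \(name)(\(parameters.joined(separator: ", ")))\n"
                + body.map { $0.asRuby }.joined(separator: "\n")
                + "\nend"
        case .variableDeclaration(let name, let value):
            return "\(name) = (\(value.asRuby))"
        case .conditional(let test, let positive, let negative):
            return "if \(test.asRuby)\n\(positive.asRuby)\nelse\n\(negative.asRuby)\nend"
        case .empty:
            return ""
        }
    }
}
```

One benefit of this shape is that a conditional carries exactly three sub-expressions by construction, so the runtime `guard children.count == 3` check with its `fatalError` is no longer needed.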
  23. Lessons Learned

    → Writing a compiler is hard
    → Testing a compiler is really, really, really necessary
    → String parsing needs a better interface
    → Error messages are hard
    → The LLVM API is meant for typed languages
    → Implementing your own language is super rewarding
  24. Lessons Learned

    "Real" programming languages are far superior to anything I can write in a weekend. They take expertise and time and care and discipline. I'll keep that in mind next time I want to complain about swiftc.