Writing a Tiny Compiler

Given at try! Swift NYC on September 1, 2016.
http://www.tryswiftnyc.com
Code at https://github.com/segiddins/Sipquick

Samuel E. Giddins

September 01, 2016

Transcript

  1. Writing a Tiny
    Compiler
    Samuel Giddins

  2. Follow Along
    https://github.com/segiddins/Sipquick

  3. Follow Along
    $ cloc Sipquick
    -------------------------------------------------------------------------------
    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    Swift                            6             43              0            310
    -------------------------------------------------------------------------------
    SUM:                             6             43              0            310
    -------------------------------------------------------------------------------

  4. Why write a compiler?

  5. Why write a compiler?
    → Better idea of how compilers work
    → It's a well-known problem domain
    → It's doable in any language
    → I've never written one before

  6. What is a compiler?
    let compiler: (String) -> Data /* Executable */

  7. What is a compiler?
    A compiler transforms source into an executable

  8. What is a compiler?
    A compiler transforms a program written in one
    language into another language

  9. How do I write a compiler?

  10. How do I write a compiler?
    → Parse
    → Lex
    → Semantic Analysis
    → Optimize
    → Generate Code

  11. Compilers are highly
    functional
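
One way to read "highly functional": each stage is a plain function, and the compiler is their composition. The sketch below is illustrative only. `Sexp` and `Expression` are simplified, hypothetical stand-ins (not Sipquick's real types), and the stage bodies are placeholders; only the stage signatures mirror the ones shown elsewhere in this deck.

```swift
import Foundation

// Hypothetical, simplified stand-ins for the compiler's intermediate types.
struct Sexp { let text: String }
struct Expression { let ruby: String }

// Each stage is a function whose output type is the next stage's input type.
let parse: (String) -> [Sexp] = { source in
    source.split(separator: "\n").map { Sexp(text: String($0)) }
}
let sema: ([Sexp]) -> [Expression] = { sexps in
    // Placeholder "analysis": wrap each line in a Ruby comment.
    sexps.map { Expression(ruby: "# \($0.text)") }
}
let codeGen: ([Expression]) -> Data = { expressions in
    Data(expressions.map { $0.ruby }.joined(separator: "\n").utf8)
}

// The whole compiler is the composition of its stages.
let compiler: (String) -> Data = { source in codeGen(sema(parse(source))) }

let output = String(decoding: compiler("(print 1)\n(print 2)"), as: UTF8.self)
print(output)
```

Because no stage keeps state, each one can be tested or swapped out in isolation.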

  12. Sipquick
    noun a small, energetic bird in The Wise Man's Fear

  13. Sipquick
    A small language invented for the purpose of this
    talk.
    → S-expressions
    → Dynamic
    → Functional-ish

  14. Sipquick
    (def hello name (+ "Hello, " name))
    (print (hello "world!"))

  15. Parser
    A parser-combinator, taking several ideas from
    Yasuhiro Inami's try! Swift Tokyo talk.

  16. Parser
    let schemeParser: Parser = {
        return ignoreSecond(
            sexp().
                then(unichar("\n").maybe()).
                maybeMany().
                fmap { $0.map { $0.0 } }
                .then(dropEndingMetadata().or(empty().andThatsAll())))
    }
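
The deck does not show how the combinators themselves are defined. As a rough sketch of the technique, assuming a parser is just a function from input to an optional (value, remaining input) pair, `fmap`, `then`, and `maybeMany` could look like the following. The `Parser<A>` shape and the `char` helper are assumptions made for illustration, not Sipquick's actual definitions.

```swift
// A parser consumes a prefix of the input and returns a value plus the rest,
// or nil when it does not match. (Assumed shape, not Sipquick's real type.)
struct Parser<A> {
    let parse: (Substring) -> (A, Substring)?
}

// Hypothetical primitive: matches exactly one given character.
func char(_ c: Character) -> Parser<Character> {
    return Parser<Character> { input in
        guard let first = input.first, first == c else { return nil }
        return (first, input.dropFirst())
    }
}

extension Parser {
    // Transforms the parsed value, leaving the remaining input untouched.
    func fmap<B>(_ f: @escaping (A) -> B) -> Parser<B> {
        return Parser<B> { input in
            guard let (value, rest) = self.parse(input) else { return nil }
            return (f(value), rest)
        }
    }

    // Runs this parser, then `next` on whatever input is left.
    func then<B>(_ next: Parser<B>) -> Parser<(A, B)> {
        return Parser<(A, B)> { input in
            guard let (a, rest) = self.parse(input),
                  let (b, rest2) = next.parse(rest) else { return nil }
            return ((a, b), rest2)
        }
    }

    // Applies this parser zero or more times, collecting the results.
    // Stops if a match consumes nothing, so the loop always terminates.
    func maybeMany() -> Parser<[A]> {
        return Parser<[A]> { input in
            var results: [A] = []
            var remaining = input
            while let (value, rest) = self.parse(remaining),
                  rest.startIndex != remaining.startIndex {
                results.append(value)
                remaining = rest
            }
            return (results, remaining)
        }
    }
}

// Counts how many times "ab" repeats at the start of the input.
let ab = char("a").then(char("b")).maybeMany().fmap { pairs in pairs.count }
let result = ab.parse(Substring("ababx"))
print(result?.0 ?? -1, result?.1 ?? "")  // prints "2 x"
```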

  17. Lexer
    We turn this mess of Parser generics into our AST (in
    the form of sexps) inline with the parsing.

  18. Lexer
    _.fmap { Sexp.single($0) }
    _.fmap { Sexp.many($0) }
    _.fmap { _ in Sexp.none }
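
The three `fmap` calls imply a `Sexp` type with `single`, `many`, and `none` cases. A minimal sketch of what that type might look like follows; only the case names come from the slide, while the payload types and the `render` helper are assumptions added to show the shape of the tree.

```swift
// Hypothetical reconstruction: only the case names appear in the deck.
enum Sexp {
    case single(String)   // an atom such as `print` or `"world!"`
    case many([Sexp])     // a parenthesized list of sub-expressions
    case none             // nothing parsed
}

// Renders a tree back into S-expression text.
func render(_ sexp: Sexp) -> String {
    switch sexp {
    case .single(let atom):
        return atom
    case .many(let children):
        return "(" + children.map(render).joined(separator: " ") + ")"
    case .none:
        return "()"
    }
}

let call = Sexp.many([
    .single("print"),
    .many([.single("hello"), .single("\"world!\"")]),
])
print(render(call))  // prints (print (hello "world!"))
```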

  19. Semantic Analysis
    let sema: ([Sexp]) -> [Expression]

  20. Semantic Analysis
    struct Expression {
        init(sexp: Sexp) {
            switch sexp {
            ...
            }
        }
    }

  21. Semantic Analysis
    Figuring out what our expressions actually express.
    In most compilers, this stage would guarantee that the
    semantics of your program are somehow "well-formed".
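
To make "figuring out what expressions express" concrete, here is a small sketch of classifying an S-expression by its leading atom. The `kind`, `args`, and `children` names and the kind cases come from the code generation slides, and `def` and `condition` are the keywords used in the Sipquick examples; the `Sexp` payloads, the `atom` helper, and the classification rules themselves are assumptions, and no well-formedness checks are performed.

```swift
// Assumed shape of the parser's output (only the case names are from the deck).
enum Sexp {
    case single(String)
    case many([Sexp])
    case none

    // Hypothetical helper: the text of an atom, or nil for lists.
    var atom: String? {
        if case .single(let text) = self { return text }
        return nil
    }
}

struct Expression {
    enum Kind { case bare, call, functionDefinition, conditional, empty }
    let kind: Kind
    let args: [String]
    let children: [Expression]

    init(sexp: Sexp) {
        switch sexp {
        case .none:
            kind = .empty; args = []; children = []
        case .single(let text):
            kind = .bare; args = [text]; children = []
        case .many(let items):
            let head = items.first?.atom ?? ""
            let rest = Array(items.dropFirst())
            switch head {
            case "def":
                // `(def name params... body)`: everything but the last item is a name.
                kind = .functionDefinition
                args = rest.dropLast().compactMap { $0.atom }
                children = rest.suffix(1).map(Expression.init)
            case "condition":
                kind = .conditional
                args = []
                children = rest.map(Expression.init)
            default:
                // Anything else is treated as a call to `head`.
                kind = .call
                args = [head]
                children = rest.map(Expression.init)
            }
        }
    }
}

// (def hello name (+ "Hello, " name))
let source = Sexp.many([
    .single("def"), .single("hello"), .single("name"),
    .many([.single("+"), .single("\"Hello, \""), .single("name")]),
])
let expression = Expression(sexp: source)
print(expression.args, expression.children.count)
```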

  22. Optimization
    let optimizations: Array<([Expression]) -> [Expression]>

  23. Optimization
    The Sipquick compiler does none.
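
Even with no passes, the array-of-functions shape is easy to drive: fold the expressions through each pass in order. The sketch below assumes a hypothetical one-field `Expression` stand-in, and `dropEmpty` is an invented example pass that is not part of Sipquick.

```swift
// Hypothetical stand-in for the real Expression type.
struct Expression { let ruby: String }

typealias Pass = ([Expression]) -> [Expression]

// Invented example pass: drops expressions that would generate no code.
let dropEmpty: Pass = { expressions in
    expressions.filter { !$0.ruby.isEmpty }
}

// Runs every pass in order, feeding each one the previous pass's output.
func optimize(_ expressions: [Expression], with passes: [Pass]) -> [Expression] {
    return passes.reduce(expressions) { current, pass in pass(current) }
}

let input = [Expression(ruby: "print(1)"), Expression(ruby: "")]
let optimized = optimize(input, with: [dropEmpty])
print(optimized.map { $0.ruby })  // prints ["print(1)"]
```

With an empty list of passes, `optimize` returns its input unchanged, which matches a compiler that does no optimization.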

  24. Code Generation
    let codeGen: ([Expression]) -> Data

  25. Code Generation
    Transforming expressions into something that is
    loosely "executable".

  26. Code Generation
    In my opinion, the hardest part of a compiler.
    Especially if you're not compiling something like C.

  28. Code Generation
    I chose to target Ruby, since it's dynamic, has a
    robust standard library, and I know it better than is
    healthy.

  29. Code Generation
    extension Expression {
        func asRuby() -> String {
            switch kind {
            ...
            }
        }
    }

  30. Code Generation
    The gist of our code gen step is that we map
    Expressions into Ruby code snippets. After joining
    them together, adding a shebang, and running chmod,
    we have a file we can run.
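
The join, shebang, and chmod steps can be sketched with Foundation alone. This is an illustration under assumptions, not Sipquick's actual code: `writeExecutable` and the temporary file name are hypothetical, and setting POSIX permissions through `FileManager` stands in for running `chmod`.

```swift
import Foundation

// Joins generated snippets, prepends a shebang, and marks the file executable.
func writeExecutable(snippets: [String], to path: String) throws {
    let script = "#!/usr/bin/env ruby\n" + snippets.joined(separator: "\n") + "\n"
    try script.write(toFile: path, atomically: true, encoding: .utf8)
    // Same effect as `chmod 755 <path>`.
    try FileManager.default.setAttributes(
        [.posixPermissions: NSNumber(value: 0o755)],
        ofItemAtPath: path)
}

// Hypothetical output location, chosen only for this example.
let path = FileManager.default.temporaryDirectory
    .appendingPathComponent("sipquick-example").path
try writeExecutable(snippets: ["print(1 + 2)"], to: path)
print(FileManager.default.isExecutableFile(atPath: path))
```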

  31. Code Generation
    ; fibonacci.spq
    (def fib x (condition (<= x 1) x (+ (fib (- x 1)) (fib (- x 2)))))
    (print (fib 10))

    #!/usr/bin/env ruby
    def fib(x)
      if (x.<=(1))
        x
      else
        (fib((x.-(1))).+(fib((x.-(2)))))
      end
    end
    print(fib(10))

  32. Code Generation
    extension Expression {
        func parenthesize(_ s: String) -> String { return "(\(s))" }
        var isOperatorCall: Bool {
            switch kind {
            case .call:
                guard let funcName = args.first else { return false }
                switch funcName {
                case "+", "-", "*", "%", "/", "||", "&&", "&", "|", "==", ">", "<", ">=", "<=", "[]", "..":
                    return true
                default:
                    return false
                }
            default:
                return false
            }
        }
        func asRuby(depth: Int = 0) -> String {
            let indent = String(repeating: " ", count: 2 * depth)
            switch kind {
            case .bare:
                return args.joined(separator: " ")
            case .call:
                if isOperatorCall {
                    let op = args.first!
                    let rec = children.first!
                    let opArgs = children.dropFirst()
                    return parenthesize(rec.asRuby(depth: depth + 1)) + ".\(op)" + parenthesize(opArgs.map { $0.asRuby(depth: depth + 1) }.joined(separator: ", "))
                } else {
                    return args.joined(separator: " ") + "(" + children.map { "(\($0.asRuby(depth: depth + 1)))" }.joined(separator: ",\n\(indent)") + ")"
                }
            case .functionDefinition:
                let name = args.first!
                let argNames = args.dropFirst()
                return "def \(name)(\(argNames.joined(separator: ", ")))\n" + children.map { indent + $0.asRuby(depth: depth + 1) }.joined(separator: "\n") + "\nend"
            case .empty:
                return ""
            case .variableDeclaration:
                let varName = args.joined(separator: " ")
                return "\(varName) = (\(children.map {$0.asRuby(depth: depth + 1)}.joined(separator: ", ")))"
            case .conditional:
                guard children.count == 3 else {
                    fatalError("a conditional must have exactly three arguments")
                }
                let conditional = children[0]
                let positive = children[1]
                let negative = children[2]
                return "if \(conditional.asRuby(depth: depth + 1))\n\(indent) \(positive.asRuby(depth: depth + 1))\nelse\n\(indent) \(negative.asRuby(depth: depth + 1))\nend"
            }
        }
    }

  33. Code Generation
    extension Expression {
        func parenthesize(_ s: String) -> String
        var isOperatorCall: Bool { get }
        func asRuby(depth: Int = 0) -> String {
            switch kind {
            case .bare: { }
            case .call:
                if isOperatorCall { }
                else { }
            case .functionDefinition: { }
            case .empty: { }
            case .variableDeclaration: { }
            case .conditional:
                guard children.count == 3 else { }
                let conditional = children[0]
                let positive = children[1]
                let negative = children[2]
                { }
            }
        }
    }

  34. Using The Compiler
    $ cat fibonacci.spq
    (def fib x (condition (<= x 1) x (+ (fib (- x 1)) (fib (- x 2)))))
    (puts (fib ([] ARGV 0)))
    $ sipquick fibonacci.spq fibonacci
    $ ./fibonacci 10
    55

  35. Testing The Compiler

  36. Testing The Compiler
    → Integration tests
    → Compile && Run && Verify
    → Only testing positive cases
    → 0 unit test coverage

  37. Testing The Compiler
    (def print_even x (condition (== (% x 2) 0) (print "even") (print "odd")))
    (print_even 17)
    (print_even 12)
    (print_even -1)
    (print_even 1)
    (print_even (* 0 1))
    /////
    it allows branching
    0
    oddevenoddoddeven

  38. Testing The Compiler
    import Foundation.NSString
    struct Test {
        let script: String
        let name: String
        let expectedOutput: String
        let expectedExit: Int
        init(script: String) {
            self.script = script
            let contents = try! String.init(contentsOfFile: script)
            let metadata = contents.components(separatedBy: "/////\n")[1].components(separatedBy: "\n")
            self.name = metadata[0]
            self.expectedExit = Int(metadata[1])!
            self.expectedOutput = metadata.dropFirst(2).joined(separator: "\n").trimmingCharacters(in: .whitespacesAndNewlines)
        }
        func run() -> (Bool, String) {
            let (compileOutput, compileStatus) = sipquick_test
                .run(path: sipquick_path, arguments: [script, "/private/var/tmp/sipquick-test \(name).exe"])
            guard compileStatus == 0 else {
                return (false, "failed to compile \(name):\n\(compileOutput)")
            }
            let (output, status) = sipquick_test
                .run(path: "/private/var/tmp/sipquick-test \(name).exe", arguments: [])
            let success = output == expectedOutput && status == expectedExit
            let errorMessage = "failed \(name): got \(output.debugDescription) (\(status)), expected \(expectedOutput.debugDescription) (\(expectedExit))"
            return (success, errorMessage)
        }
    }
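
The `Test` struct relies on a `run(path:arguments:)` helper whose definition is not shown in the deck. A minimal sketch of such a helper follows, assuming it returns the child's standard output and exit status. It uses the current `Process` API (`executableURL`, `run()`) rather than whatever the 2016 code used, and trimming the output is an assumption made so the result compares cleanly against `expectedOutput`.

```swift
import Foundation

// Hypothetical helper: launches a program, waits for it to finish, and
// returns its trimmed standard output together with its exit status.
func run(path: String, arguments: [String]) -> (String, Int) {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: path)
    process.arguments = arguments
    let pipe = Pipe()
    process.standardOutput = pipe
    do {
        try process.run()
    } catch {
        return ("failed to launch \(path): \(error)", -1)
    }
    // Read before waiting so a chatty child cannot fill the pipe and block.
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()
    let output = String(decoding: data, as: UTF8.self)
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return (output, Int(process.terminationStatus))
}

let (output, status) = run(path: "/bin/echo", arguments: ["hello"])
print(output, status)  // prints "hello 0"
```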

  39. Testing The Compiler
    let sipquick_path = String(CommandLine.arguments[0].characters.dropLast(5))
    let specDirectory = "/Users/segiddins/Desktop/Sipquick/sipquick-spec/"
    let specFiles = try! FileManager().contentsOfDirectory(atPath: specDirectory).filter { $0.hasSuffix(".spq") }.map { specDirectory + $0 }
    let tests = specFiles.map(Test.init)
    let failures = tests.map { $0.run() }.filter { $0.0 == false }
    if failures.isEmpty { exit(EXIT_SUCCESS) }
    failures.map { $0.1 }.forEach { print($0) }
    exit(EXIT_FAILURE)

  40. Testing the Compiler
    (def fizzbuzz x
      (condition (<= x 0) (return "")
        (condition (== 0 (% x 15)) (+ (fizzbuzz (- x 1)) "fizzbuzz")
          (condition (== 0 (% x 3)) (+ (fizzbuzz (- x 1)) "fizz")
            (condition (== 0 (% x 5)) (+ (fizzbuzz (- x 1)) "buzz")
              (+ (fizzbuzz (- x 1)) (String x)))))))
    (def fetch_arg position default
      (condition (!= nil ([] ARGV position))
        ([] ARGV position)
        (return default)))
    /////
    it computes fizzbuzz
    0
    12fizz4buzzfizz78fizzbuzz11fizz1314fizzbuzz1617fizz19buzzfizz2223fizzbuzz26fizz2
    829fizzbuzz3132fizz34buzzfizz3738fizzbuzz41fizz4344fizzbuzz4647fizz49buzzfizz525
    3fizzbuzz56fizz5859fizzbuzz6162fizz64buzzfizz6768fizzbuzz71fizz7374fizzbuzz7677f
    izz79buzzfizz8283fizzbuzz86fizz8889fizzbuzz9192fizz94buzzfizz9798fizzbuzz

  41. TODO
    → Refactor Expression to be an enum
    → Add optimizations
    → Allow defining new variables in function scope
    → Add parsing error messages
    → Add semantic analysis errors
    → Compile to machine code
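
As an illustration of the first item, here is one way `Expression` might look as an enum. This is a speculative sketch, not the planned design: the case names follow the existing kinds, the associated values and the simplified `asRuby` function are assumptions, and the generated Ruby is deliberately plainer than what the current compiler emits.

```swift
// Speculative enum version of Expression; payloads are assumptions.
indirect enum Expression {
    case bare(String)
    case call(name: String, arguments: [Expression])
    case functionDefinition(name: String, parameters: [String], body: Expression)
    case conditional(test: Expression, positive: Expression, negative: Expression)
    case empty
}

// Simplified code generation, to show how the cases are consumed.
func asRuby(_ expression: Expression) -> String {
    switch expression {
    case .bare(let text):
        return text
    case .call(let name, let arguments):
        return name + "(" + arguments.map(asRuby).joined(separator: ", ") + ")"
    case .functionDefinition(let name, let parameters, let body):
        return "def \(name)(\(parameters.joined(separator: ", ")))\n  \(asRuby(body))\nend"
    case .conditional(let test, let positive, let negative):
        return "(\(asRuby(test)) ? \(asRuby(positive)) : \(asRuby(negative)))"
    case .empty:
        return ""
    }
}

let double = Expression.functionDefinition(
    name: "double",
    parameters: ["x"],
    body: .call(name: "add", arguments: [.bare("x"), .bare("x")]))
print(asRuby(double))
```

With this shape, a conditional cannot be built without exactly three parts, so the `children.count == 3` check and its `fatalError` are no longer needed.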

  42. TODO
    → Implement a proper standard library
    → Actually implement comment parsing
    → Add lambdas
    → Multiple-expression expressions

  43. Lessons Learned

  44. Lessons Learned
    → Writing a compiler is hard
    → Testing a compiler is really, really, really necessary
    → String parsing needs a better interface
    → Error messages are hard
    → The LLVM API is meant for typed languages
    → Implementing your own language is super rewarding

  45. Lessons Learned
    "Real" programming languages are far superior to
    anything I can write in a weekend. They take
    expertise and time and care and discipline. I'll keep
    that in mind next time I want to complain about
    swiftc.

  46. Thank You!
    @segiddins
