Compilers for Free

Tom Stuart
November 09, 2013

Partial evaluation is a powerful tool for timeshifting some aspects of a program's execution from the future into the present. Among other things, it gives us an automatic way to turn a general, abstract program into a faster, more specialized one.

This math-free talk uses Ruby to explain how partial evaluation works, how it can be used to make programs go faster, and how it compares to ideas like currying and partial application from the world of functional programming. It then investigates what happens when you run a partial evaluator on itself, and reveals some surprising results about how these techniques can be used to automatically generate compilers instead of writing them from scratch.

Given at RubyConf 2013 (http://lanyrd.com/2013/rubyconf). A video and transcript are available at http://codon.com/compilers-for-free.

Transcript

  1. COMPILERS for FREE @tomstuart / RubyConf 2013 / 2013-11-09

  2. I’m fascinated by metaprogramming. Let’s talk about metaprogramming.

  3. As Ruby programmers we already intuitively understand the power of metaprogramming: programs that write programs.

  4. But there's another thing that I find even more interesting: programs that manipulate representations of other programs.
  5. I’d like to make you look at programs differently.

  6. EXECUTING PROGRAMS

  7. a program a machine

  8. input 1 input 2 output input 3 ⋮

  9. This only works if the program is written in a language that the machine understands.

  10. If it’s written in an unfamiliar language, we need an interpreter or compiler.
  11. INTERPRETERS

  12. How do interpreters work?

  13. • read some source code
      • build an AST by parsing the source code
      • evaluate the program by walking over the AST and performing instructions

  14. SIMPLE (a simple imperative language)
      a = 19 + 23
      x = 2; y = x * 3
      if (a < 10) { a = 0; b = 0 } else { b = 10 }
      x = 1; while (x < 5) { x = x * 3 }
  15. None
  16. grammar Simple rule statement sequence end rule sequence first:sequenced_statement ';

    ' second:sequence / sequenced_statement end rule sequenced_statement while / assign / if end rule while 'while (' condition:expression ') { ' body:statement ' }' end rule assign name:[a-z]+ ' = ' expression end rule if
  17. rule if 'if (' condition:expression ') { ' consequence:statement '

    } else { ' alternative:statement ' }' end rule expression less_than end rule less_than left:add ' < ' right:less_than / add end rule add left:multiply ' + ' right:add / multiply end rule multiply left:term ' * ' right:multiply /
  18. rule multiply left:term ' * ' right:multiply / term end

    rule term number / boolean / variable end rule number [0-9]+ end rule boolean ('true' / 'false') ![a-z] end rule variable [a-z]+ end end
  19. >> Treetop.load('simple.treetop') => SimpleParser >> SimpleParser.new.parse('x = 2; y =

    x * 3') => SyntaxNode+Sequence1+Sequence0 offset=0, "x = 2; y = x * 3" (first,second): SyntaxNode+Assign1+Assign0 offset=0, "x = 2" (name,expression): SyntaxNode offset=0, "x": SyntaxNode offset=0, "x" SyntaxNode offset=1, " = " SyntaxNode+Number0 offset=4, "2": SyntaxNode offset=4, "2" SyntaxNode offset=5, "; " SyntaxNode+Assign1+Assign0 offset=7, "y = x * 3" (name,expression): SyntaxNode offset=7, "y": SyntaxNode offset=7, "y" SyntaxNode offset=8, " = " SyntaxNode+Multiply1+Multiply0 offset=11, "x * 3" (left,right): SyntaxNode+Variable0 offset=11, "x": SyntaxNode offset=11, "x" SyntaxNode offset=12, " * " SyntaxNode+Number0 offset=15, "3": SyntaxNode offset=15, "3"
  20. Number   = Struct.new :value
      Boolean  = Struct.new :value
      Variable = Struct.new :name
      Add      = Struct.new :left, :right
      Multiply = Struct.new :left, :right
      LessThan = Struct.new :left, :right
      Assign   = Struct.new :name, :expression
      If       = Struct.new :condition, :consequence, :alternative
      Sequence = Struct.new :first, :second
      While    = Struct.new :condition, :body
  21. rule number
        [0-9]+ {
          def to_ast
            Number.new(text_value.to_i)
          end
        }
      end

      rule boolean
        ('true' / 'false') ![a-z] {
          def to_ast
            Boolean.new(text_value == 'true')
          end
        }
      end

      rule variable
        [a-z]+ {
          def to_ast
            Variable.new(text_value.to_sym)
          end
        }
      end

  22. rule while
        'while (' condition:expression ') { ' body:statement ' }' {
          def to_ast
            While.new(condition.to_ast, body.to_ast)
          end
        }
      end

      rule assign
        name:[a-z]+ ' = ' expression {
          def to_ast
            Assign.new(name.text_value.to_sym, expression.to_ast)
          end
        }
      end

      rule if
        'if (' condition:expression ') { ' consequence:statement ' } else { ' alternative:statement ' }' {
          def to_ast
            If.new(condition.to_ast, consequence.to_ast, alternative.to_ast)
          end
        }
      end
  23. >> SimpleParser.new.parse('x = 2; y = x * 3').to_ast =>

    #<struct Sequence first=#<struct Assign name=:x, expression=#<struct Number value=2> >, second=#<struct Assign name=:y, expression=#<struct Multiply left=#<struct Variable name=:x>, right=#<struct Number value=3> > > >
  24. AST for “x = 2; y = x * 3”:
      Sequence
        Assign :x
          Number 2
        Assign :y
          Multiply
            Variable :x
            Number 3
  25. class Number
        def evaluate(environment)
          value
        end
      end

      class Boolean
        def evaluate(environment)
          value
        end
      end

      class Variable
        def evaluate(environment)
          environment[name]
        end
      end

  26. >> Number.new(3).evaluate({})
      => 3
      >> Boolean.new(false).evaluate({})
      => false
      >> Variable.new(:y).evaluate({ x: 7, y: 11 })
      => 11
      >> Variable.new(:y).evaluate({ x: 7, y: true })
      => true
  27. class Add
        def evaluate(environment)
          left.evaluate(environment) + right.evaluate(environment)
        end
      end

      class Multiply
        def evaluate(environment)
          left.evaluate(environment) * right.evaluate(environment)
        end
      end

      class LessThan
        def evaluate(environment)
          left.evaluate(environment) < right.evaluate(environment)
        end
      end

  28. >> Multiply.new(Variable.new(:x), Variable.new(:y)).
           evaluate({ x: 2, y: 3 })
      => 6
      >> LessThan.new(Variable.new(:x), Variable.new(:y)).
           evaluate({ x: 2, y: 3 })
      => true
  29. class Assign
        def evaluate(environment)
          environment.merge({ name => expression.evaluate(environment) })
        end
      end

      class If
        def evaluate(environment)
          if condition.evaluate(environment)
            consequence.evaluate(environment)
          else
            alternative.evaluate(environment)
          end
        end
      end

      class Sequence
        def evaluate(environment)
          second.evaluate(first.evaluate(environment))
        end
      end

      class While
        def evaluate(environment)
          if condition.evaluate(environment)
            evaluate(body.evaluate(environment))
          else
            environment
          end
        end
      end

  30. >> Assign.new(:x, Number.new(1)).evaluate({})
      => {:x=>1}
      >> Sequence.new(
           Assign.new(:x, Number.new(1)),
           Assign.new(:x, Number.new(2))
         ).evaluate({})
      => {:x=>2}
      >> SimpleParser.new.parse('x = 2; y = x * 3').
           to_ast.evaluate({})
      => {:x=>2, :y=>6}
      >> SimpleParser.new.parse('x = 1; while (x < 5) { x = x * 3 }').
           to_ast.evaluate({})
      => {:x=>9}
  31. Interpreters provide single-stage execution.

  32. source program input output interpreter “run time”

  33. COMPILERS

  34. How do compilers work?

  35. • read some source code
      • build an AST by parsing the source code
      • generate target code by walking over the AST and emitting instructions
  36. require 'json'

      class Number
        def to_javascript
          "function (e) { return #{JSON.dump(value)}; }"
        end
      end

      class Boolean
        def to_javascript
          "function (e) { return #{JSON.dump(value)}; }"
        end
      end

      class Variable
        def to_javascript
          "function (e) { return e[#{JSON.dump(name)}]; }"
        end
      end

  37. class Add
        def to_javascript
          "function (e) { return #{left.to_javascript}(e) + #{right.to_javascript}(e); }"
        end
      end

      class Multiply
        def to_javascript
          "function (e) { return #{left.to_javascript}(e) * #{right.to_javascript}(e); }"
        end
      end

      class LessThan
        def to_javascript
          "function (e) { return #{left.to_javascript}(e) < #{right.to_javascript}(e); }"
        end
      end

  38. class Assign
        def to_javascript
          "function (e) { e[#{JSON.dump(name)}] = #{expression.to_javascript}(e); return e; }"
        end
      end

      class If
        def to_javascript
          "function (e) { if (#{condition.to_javascript}(e))" +
            " { return #{consequence.to_javascript}(e); }" +
            " else { return #{alternative.to_javascript}(e); }" +
            ' }'
        end
      end

      class Sequence
        def to_javascript
          "function (e) { return #{second.to_javascript}(#{first.to_javascript}(e)); }"
        end
      end

      class While
        def to_javascript
          'function (e) {' +
            " while (#{condition.to_javascript}(e)) { e = #{body.to_javascript}(e); }" +
            ' return e;' +
            ' }'
        end
      end
  39. >> SimpleParser.new.parse('x = 1; while (x < 5) { x

    = x * 3 }'). to_ast.to_javascript => "function (e) { return function (e) { while (function (e) { return function (e) { return e[\"x\"]; }(e) < function (e) { return 5; }(e); }(e)) { e = function (e) { e[\"x\"] = function (e) { return function (e) { return e[\"x\"]; }(e) * function (e) { return 3; }(e); }(e); return e; }(e); } return e; }(function (e) { e[\"x\"] = function (e) { return 1; }(e); return e; }(e)); }"
  40. > program = function (e) { return function (e) {

    while (function (e) { return function (e) { return e["x"]; }(e) < function (e) { return 5; }(e); }(e)) { e = function (e) { e["x"] = function (e) { return function (e) { return e["x"]; }(e) * function (e) { return 3; }(e); }(e); return e; }(e); } return e; }(function (e) { e["x"] = function (e) { return 1; }(e); return e; }(e)); } [Function] > program({}) { x: 9 }
  41. Compilers provide two-stage execution.

  42. output source program input target program compiler target program “compile

    time” “run time”
  43. The good news: compiled programs are faster.
      • removes interpretive overhead at run time
      • other performance benefits e.g. clever data structures, clever optimizations

  44. The bad news: compilation is harder.
      • have to think about two times instead of one
      • have to implement in two languages instead of one
      • compiling dynamic languages is challenging
  45. Writing an interpreter is easier, but interpreters are slower.

  46. PARTIAL EVALUATORS

  47. A partial evaluator is a cross between an interpreter and a compiler.

  48. interpreters: execute now
      compilers: generate now, execute later
      partial evaluators: execute some now, leave the rest to execute later

  49. You give a partial evaluator your subject program and some of its inputs. It evaluates only the parts of the program that depend on those inputs. What’s left is called the residual program.
  50. input 1 input 2 output subject program input 3 ⋮

  51. input 1 ⋮ output residual program partial evaluator residual program

    subject program input 2 input 3
  52. A partial evaluator splits single-stage execution into two stages by timeshifting the processing of some input from the future to the present.
  53. How does a partial evaluator work?

  54. • read some source code
      • build an AST by parsing the source code
      • read some of the program’s inputs
      • analyse the program by walking over the AST to find the places where known inputs are used
      • partially evaluate the program by walking over the AST, evaluating fragments of code where possible and generating new code to replace them
      • emit a residual program
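
The steps on this slide can be sketched for SIMPLE-style expressions. This is only an illustrative sketch, not the talk's code: the `specialize` method, the `BinaryOp` module and the `to_s` printers are hypothetical names introduced here, and it handles expressions only (no statements, no loops). Given a hash of known variables, it evaluates what it can and rebuilds the rest as a residual expression.

```ruby
# Expression nodes shaped like the talk's Structs. `specialize` and
# `BinaryOp` are hypothetical names used only for this sketch.
Number = Struct.new(:value) do
  # A constant is already fully evaluated.
  def specialize(_known)
    self
  end

  def to_s
    value.to_s
  end
end

Variable = Struct.new(:name) do
  # A known variable becomes a constant; an unknown one stays in the residual.
  def specialize(known)
    known.key?(name) ? Number.new(known[name]) : self
  end

  def to_s
    name.to_s
  end
end

# Shared behaviour for binary operators: fold when both operands became
# constants, otherwise generate a new node over the specialized operands.
module BinaryOp
  def specialize(known)
    l = left.specialize(known)
    r = right.specialize(known)
    if l.is_a?(Number) && r.is_a?(Number)
      Number.new(l.value.public_send(operator, r.value))
    else
      self.class.new(l, r)
    end
  end

  def to_s
    "(#{left} #{operator} #{right})"
  end
end

Add = Struct.new(:left, :right) do
  include BinaryOp

  def operator
    :+
  end
end

Multiply = Struct.new(:left, :right) do
  include BinaryOp

  def operator
    :*
  end
end

# (x * 3) + (y * x), where x is known now and y only arrives later.
expression = Add.new(
  Multiply.new(Variable.new(:x), Number.new(3)),
  Multiply.new(Variable.new(:y), Variable.new(:x))
)

residual = expression.specialize({ x: 2 })
puts residual.to_s # prints (6 + (y * 2))
```

The subexpression that depends only on `x` has been executed "now", while the part that mentions `y` is left as code to execute later. A real partial evaluator also has to deal with control flow and recursion, which this sketch avoids.
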
  55. def power(n, x)
        if n.zero?
          1
        else
          x * power(n - 1, x)
        end
      end

      def power(n, x)
        if n.zero?
          1
        else
          x * power(n - 1, x)
        end
      end

      power(5, …)
  56. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power(n, x) if n.zero? 1 else x * power(n - 1, x) end end power(5, …)
  57. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power_5(x) if false 1 else x * power(5 - 1, x) end end power(5, …)
  58. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power_5(x) x * power(5 - 1, x) end power(5, …)
  59. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power_5(x) x * power(4, x) end power(5, …)
  60. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power_5(x) x * if 4.zero? 1 else x * power(4 - 1, x) end end power(5, …)
  61. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power_5(x) x * if false 1 else x * power(4 - 1, x) end end power(5, …)
  62. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power_5(x) x * x * power(4 - 1, x) end power(5, …)
  63. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power_5(x) x * x * power(3, x) end power(5, …)
  64. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power_5(x) x * x * x * power(2, x) end power(5, …)
  65. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power_5(x) x * x * x * x * power(1, x) end power(5, …)
  66. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power_5(x) x * x * x * x * x * power(0, x) end power(5, …)
  67. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power_5(x) x * x * x * x * x * if 0.zero? 1 else x * power(0 - 1, x) end end power(5, …)
  68. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power_5(x) x * x * x * x * x * if true 1 else x * power(0 - 1, x) end end power(5, …)
  69. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power_5(x) x * x * x * x * x * 1 end power(5, …)
  70. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power_5(x) x * x * x * x * x * 1 end power(5, …)
  71. def power(n, x) if n.zero? 1 else x * power(n

    - 1, x) end end def power_5(x) x * x * x * x * x end power(5, …)
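
The hand-unrolling above can be automated for this one function. The sketch below is a hypothetical specializer written just for `power`, not a general partial evaluator: `residual_power_body` and `residual_power_source` are names invented here, and the residual program is emitted as lambda source (rather than a `def`) only so it can be evaluated into a callable value.

```ruby
# Unroll the recursion on the known n, producing Ruby source for the body.
# The base case emits "1", matching the slide before the final "* 1" cleanup.
def residual_power_body(n)
  if n.zero?
    '1'
  else
    "x * #{residual_power_body(n - 1)}"
  end
end

# Wrap the body in a lambda literal so the generated source can be
# turned into something callable with eval.
def residual_power_source(n)
  "->(x) { #{residual_power_body(n)} }"
end

source = residual_power_source(5)
puts source          # prints ->(x) { x * x * x * x * x * 1 }

power_5 = eval(source)
puts power_5.call(2) # prints 32
```

All the work that depended on `n` (the `zero?` tests, the subtractions, the recursive calls) happened while generating the source; only the multiplications are left for run time.
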
  72. This isn’t the same as partial application:

      def power(n, x)
        if n.zero?
          1
        else
          x * power(n - 1, x)
        end
      end

      def power_5(x)
        power(5, x)
      end

      >> power_5(2)
      => 32

  73. This isn’t the same as partial application:

      def power(n, x)
        if n.zero?
          1
        else
          x * power(n - 1, x)
        end
      end

      power_5 = method(:power).
        to_proc.
        curry.
        call(5)

      >> power_5.call(2)
      => 32
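
One way to see the difference is to count how often the general `power` runs. In this sketch the `$power_calls` counter is an addition made purely for illustration, and the residual `power_5` is written out by hand as on the earlier slides. Partial application remembers `n = 5` but still executes the general recursive code on every call; the partially evaluated version never calls `power` at all.

```ruby
$power_calls = 0 # global counter, only here to make the work visible

def power(n, x)
  $power_calls += 1
  if n.zero?
    1
  else
    x * power(n - 1, x)
  end
end

# Partial application: fixes n = 5 but keeps the general program intact.
curried_power_5 = method(:power).to_proc.curry.call(5)

# Partial evaluation: the residual program, written out by hand here.
residual_power_5 = ->(x) { x * x * x * x * x }

$power_calls = 0
puts curried_power_5.call(2)  # prints 32
puts $power_calls             # prints 6 (one call each for n = 5, 4, 3, 2, 1, 0)

$power_calls = 0
puts residual_power_5.call(2) # prints 32
puts $power_calls             # prints 0
```
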
  74. APPLICATIONS

  75. Partial evaluation can take a general program and specialize it for a fixed input.
      • specialize a web server to a particular config file
      • specialize a general ray tracer to a particular scene
      • specialize OS X OpenGL pipeline to a particular GPU
  76. COOL STORY

  77. In 1971, Yoshihiko Futamura realised something cool.

  78. input 1 input 2 output subject program input 3 ⋮

  79. input 1 ⋮ output residual program partial evaluator residual program

    subject program input 2 input 3
  80. source program input output interpreter

  81. source program output residual program partial evaluator residual program interpreter

    input
  82. source program output residual program partial evaluator residual program interpreter

    input target program target program
  83. source program output residual program partial evaluator residual program interpreter

    input target program target program a compiler!
  84. source, environment = read_source, read_environment
      Treetop.load('simple.treetop')
      ast = SimpleParser.new.parse(source).to_ast
      puts ast.evaluate(environment)

  85. source, environment = 'x = 2; y = x * 3', read_environment
      Treetop.load('simple.treetop')
      ast = SimpleParser.new.parse(source).to_ast
      puts ast.evaluate(environment)

  86. environment = read_environment
      Treetop.load('simple.treetop')
      ast = SimpleParser.new.parse('x = 2; y = x * 3').to_ast
      puts ast.evaluate(environment)

  87. environment = read_environment
      ast = Sequence.new(
        Assign.new(:x, Number.new(2)),
        Assign.new(:y, Multiply.new(Variable.new(:x), Number.new(3)))
      )
      puts ast.evaluate(environment)
  88. Sequence Assign Assign Number Multiply Variable Number :x :y :x

    3 2
  89. def evaluate(environment) second.evaluate( first.evaluate(environment) ) end Assign Assign Number Multiply

    Variable Number :x :y :x 3 2
  90. def evaluate(environment) environment.merge({ name => expression. evaluate(environment) }) end def

    evaluate(environment) environment.merge({ name => expression. evaluate(environment) }) end def evaluate(environment) second.evaluate( first.evaluate(environment) ) end Number Multiply Variable Number :x :y :x 3 2
  91. def evaluate(environment) left.evaluate(environment) * right.evaluate(environment) end def evaluate(environment) environment.merge({ name

    => expression. evaluate(environment) }) end def evaluate(environment) environment.merge({ name => expression. evaluate(environment) }) end def evaluate(environment) second.evaluate( first.evaluate(environment) ) end Number Variable Number :x :y :x 3 2
  92. def evaluate(environment) value end def evaluate(environment) value end def evaluate(environment)

    environment[name] end def evaluate(environment) left.evaluate(environment) * right.evaluate(environment) end def evaluate(environment) environment.merge({ name => expression. evaluate(environment) }) end def evaluate(environment) environment.merge({ name => expression. evaluate(environment) }) end def evaluate(environment) second.evaluate( first.evaluate(environment) ) end :x :y :x 3 2
  93. def evaluate(environment) 2 end def evaluate(environment) 3 end def evaluate(environment)

    environment[:x] end def evaluate(environment) left.evaluate(environment) * right.evaluate(environment) end def evaluate(environment) environment.merge({ name => expression. evaluate(environment) }) end def evaluate(environment) environment.merge({ name => expression. evaluate(environment) }) end def evaluate(environment) second.evaluate( first.evaluate(environment) ) end :x :y
  94. def evaluate(environment) environment[:x] * 3 end def evaluate(environment) environment.merge({ name

    => expression. evaluate(environment) }) end def evaluate(environment) environment.merge({ name => 2 }) end def evaluate(environment) second.evaluate( first.evaluate(environment) ) end :x :y
  95. def evaluate(environment) environment.merge({ :y => environment[:x] * 3 }) end

    def evaluate(environment) environment.merge({ :x => 2 }) end def evaluate(environment) second.evaluate( first.evaluate(environment) ) end
  96. def evaluate(environment) environment.merge({ :y => environment[:x] * 3 }) end

    def evaluate(environment) second.evaluate( environment.merge({ :x => 2 }) ) end
  97. def evaluate(environment) environment = environment.merge({ :x => 2 }) environment.merge({

    :y => environment[:x] * 3 }) end
  98. def evaluate(environment) environment = environment.merge({ :x => 2 }) environment.merge({

    :y => environment[:x] * 3 }) end
  99. environment = read_environment ast = Sequence.new( Assign.new( :x, Number.new(2) ),

    Assign.new( :y, Multiply.new( Variable.new(:x), Number.new(3) ) ) ) puts ast.evaluate(environment)
  100. environment = read_environment
       environment = environment.merge({ :x => 2 })
       puts environment.merge({ :y => environment[:x] * 3 })

  101. “x = 2; y = x * 3”

       environment = read_environment
       environment = environment.merge({ :x => 2 })
       puts environment.merge({ :y => environment[:x] * 3 })
  102. target = partial_evaluator(interpreter, source) FIRST FUTAMURA PROJECTION

  103. source program output partial evaluator interpreter input target program target

    program a compiler!
  104. source program partial evaluator interpreter target program

  105. source program interpreter residual program target program output input partial

    evaluator residual program target program partial evaluator
  106. source program interpreter residual program target program output input partial

    evaluator residual program target program partial evaluator compiler compiler
  107. source program interpreter residual program target program output input partial

    evaluator residual program target program partial evaluator compiler compiler a compiler generator!
  108. compiler = partial_evaluator(partial_evaluator, interpreter) SECOND FUTAMURA PROJECTION

  109. interpreter partial evaluator partial evaluator compiler source program target program

    output input target program compiler a compiler generator!
  110. interpreter partial evaluator partial evaluator compiler

  111. partial evaluator partial evaluator partial evaluator residual program residual program

    interpreter target program compiler source program output target program input compiler
  112. partial evaluator partial evaluator partial evaluator residual program residual program

    interpreter target program compiler source program output target program input compiler compiler generator compiler generator
  113. partial evaluator partial evaluator partial evaluator residual program residual program

    interpreter target program compiler source program output target program input compiler compiler generator compiler generator a compiler generator generator!
  114. compiler_generator = partial_evaluator(partial_evaluator, partial_evaluator) THIRD FUTAMURA PROJECTION
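
The shapes of the three projections can be checked with a deliberately trivial stand-in for the partial evaluator. Everything named here (`MIX`, `INTERPRETER`, `SOURCE`, and the toy instruction format) is an assumption made for the sketch. Note that `MIX` is nothing more than partial application, so it does no real specialization and gives no speedup; it only shows that the three equations fit together and produce the right answers.

```ruby
# Trivial stand-in for a partial evaluator: it just remembers the static
# input and returns a "residual program" that takes the dynamic input.
# A real partial evaluator would generate specialized code instead.
MIX = ->(program, static_input) {
  ->(dynamic_input) { program.call(static_input, dynamic_input) }
}

# Toy interpreter: a source program is a list of [operation, operand]
# steps applied in order to a numeric input.
INTERPRETER = ->(source, input) {
  source.reduce(input) do |accumulator, (operation, operand)|
    case operation
    when :add then accumulator + operand
    when :mul then accumulator * operand
    else raise ArgumentError, "unknown operation #{operation.inspect}"
    end
  end
}

SOURCE = [[:add, 1], [:mul, 3]] # computes (input + 1) * 3

# First projection: interpreter + source program => target program.
target = MIX.call(INTERPRETER, SOURCE)
puts target.call(4) # prints 15

# Second projection: partial evaluator + interpreter => compiler.
compiler = MIX.call(MIX, INTERPRETER)
puts compiler.call(SOURCE).call(4) # prints 15

# Third projection: partial evaluator + partial evaluator => compiler generator.
compiler_generator = MIX.call(MIX, MIX)
puts compiler_generator.call(INTERPRETER).call(SOURCE).call(4) # prints 15
```

With a genuine partial evaluator in place of `MIX`, each of these results would be a specialized program with the interpretive overhead removed, rather than a closure wrapping the original interpreter.
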

  115. This is a fully general technique for generating compilers, so it only removes the interpretive overhead; it doesn’t invent new data structures or exploit anything language- or platform-specific.
  116. MORE‽

  117. • “Partial Evaluation and Automatic Program Generation”: http://codon.io/pebook
       • LLVM’s JIT and the PyPy toolchain (formerly Psyco) use some of these program specialization techniques
       • Rubinius is built on top of LLVM
       • Topaz is a Ruby implementation built on top of the PyPy toolchain (RPython)
       • Rubinius and JRuby have interesting and accessible compilers
  118. THE END
       Understanding Computation: From Simple Machines to Impossible Programs, by Tom Stuart
       http://computationbook.com/
       shop.oreilly.com discount code RUBYCONF (50% off ebook, 40% off print)
       @tomstuart / tom@codon.com