Compilers: Not That Hard

Bryce "BonzoESC" Kerley
October 05, 2012

A quick intro to compiler writing from Magic Ruby.

Transcript

  1. Compilers: Not That Hard. Bryce Kerley. Magic Ruby, Walt Disney World Hollywood Studios, October 5 2012. Friday, October 5, 12
  2. Compilers
    • What is a compiler?
    • What makes them hard?
    • Why should you write one?
    • How can I write one?
  3. What is a Compiler? Input: program in language A. Output: program in language B.
    [Diagram: Language A → Compiler → Language B]
  4. Compiler 2913759: Stack-based language
        3 5 add printi
    Should output “8”.
    You may recognize this as being similar to “PostScript”, “Forth”, HP calculators, or “Factor”.
  5. Compiler 2913759
        $ ruby simple_cambridge.rb < eight.sc > eight.rb
        $ ruby eight.rb
        8
        $

        3 5 add printi
    Here’s how I’d like it to work: read from stdin, output Ruby to stdout.
  6. Compiler 2913759: “Output Ruby?”
    C has lots of ceremony. Java has tons of ceremony. Assembly has more ceremony than Java.
  7. Compiler 2913759: Reads from STDIN. Has a stack. Whitespace delimited.
        program = $stdin.read
        puts "@stack = []"
        split_prog = program.split
    Since we’re outputting to STDOUT I can just use “puts” to output compiled code.
  8. Compiler 2913759 (same code, with a callout on the puts "@stack = []" line): I initialize the stack
  9. Compiler 2913759
        ["3", "5", "add", "printi"]
    These are “tokens” if you will. Each one is atomic: “3”, “add”, and “printi” can’t be split further.
  10. Callout on "3": Push me on to the stack
  11. Callout on "5": Push me too
  12. Callout on "add": Pop two stack entries, add them, push result
  13. Callout on "printi": Pop entry, print it
  14. Compiler 2913759
        ["3", "5", "add", "printi"]

        split_prog.each do |thing|
          case thing
          when /^\d+$/
            puts "@stack.push #{thing.to_i}"
          when 'add'
            puts "@stack.push(@stack.pop + @stack.pop)"
          when 'printi'
            puts "puts @stack.pop"
          end
        end
  15. Compiler 2913759
        ["3", "5", "add", "printi"]

        @stack = []
        @stack.push 3
        @stack.push 5
        @stack.push(@stack.pop + @stack.pop)
        puts @stack.pop
  16. Compiler 2913759
        program = $stdin.read
        puts "@stack = []"
        split_prog = program.split

        split_prog.each do |thing|
          case thing
          when /^\d+$/
            puts "@stack.push #{thing.to_i}"
          when 'add'
            puts "@stack.push(@stack.pop + @stack.pop)"
          when 'printi'
            puts "puts @stack.pop"
          end
        end
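The program on the slide reads STDIN and writes STDOUT directly. For experimenting, the same translation can be wrapped in a method that returns the generated Ruby as a string. This is only a sketch: the names `compile` and `run_compiled` are made up here, and raising on unknown tokens is an addition (the original silently skips them).

```ruby
require "stringio"

# Translate a whitespace-delimited stack program into Ruby source text.
def compile(program)
  lines = ["@stack = []"]
  program.split.each do |token|
    case token
    when /\A\d+\z/ then lines << "@stack.push #{token.to_i}"
    when "add"     then lines << "@stack.push(@stack.pop + @stack.pop)"
    when "printi"  then lines << "puts @stack.pop"
    else raise ArgumentError, "unknown token: #{token.inspect}"
    end
  end
  lines.join("\n") + "\n"
end

# Run generated Ruby and capture what it prints.
# Illustration only: eval executes whatever source it is given.
def run_compiled(ruby_source)
  captured = StringIO.new
  original = $stdout
  $stdout = captured
  eval(ruby_source)
  captured.string
ensure
  $stdout = original
end

puts compile("3 5 add printi")
# prints:
#   @stack = []
#   @stack.push 3
#   @stack.push 5
#   @stack.push(@stack.pop + @stack.pop)
#   puts @stack.pop
```

Calling `run_compiled(compile("3 5 add printi"))` returns the string "8\n", which is convenient to assert against in a test.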
  17. Compilers
    • What is a compiler?
    • What makes them hard?
    • Why should you write one?
    • How can I write one?
  18. Epcot Tips
    • It’s “Soarin’” and not “Sauron”
    • Don’t ride the boat after the German buffet
  19. Epcot Tips (continued)
    • The Italian soda “Beverly” tastes like grapefruit rind
  20. Compilers
    • What is a compiler?
    • What makes them hard?
    • Why should you write one?
    • How can I write one?
  21. Compilers are Hard: “I thought you just claimed to have made that one in two minutes?”
  22. Compilers are Hard
    • Language design
    • Parsing
    • Data structures
    • Optimization
    I passed on the hard parts of this. Seriously, look at it.
  23. Compilers are Hard (same list, with the program shown alongside)
        3 5 add printi
  24. Language Design
        3 5 add printi
        puts 3 + 5
        System.out.println(3 + 5);
        io:format("~w~n", [3 + 5]).
    Different languages let you express things differently.
  25. Language Design
        -module(hello).
        -export([start/0]).

        start() ->
          spawn(fun() -> loop() end).

        loop() ->
          receive
            hello ->
              io:format("Hello, World!~n"),
              loop();
            goodbye ->
              ok
          end.
    Sometimes what’s easy in one language is difficult in another. How would I express this in a stack-based language? Ruby?
  26. Often people, especially computer engineers, focus on the machines. They think, "By doing this, the machine will run faster. By doing this, the machine will run more effectively. By doing this, the machine will something something something." They are focusing on machines. But in fact we need to focus on humans, on how humans care about doing programming or operating the application of the machines.
    Yukihiro Matsumoto, The Philosophy of Ruby, Sept. 29 2003
    Fundamentally, language design is about making the core concepts of your language usable for humans. Ruby does this with object-oriented design very well. Erlang does this with reliable concurrency very well. Cambridge doesn’t do this very well. Crapshoot does this with simple math, which is a shockingly low bar.
  27. Parsing: Application of “Automata Theory”
    You may recognize this name from a computer science course catalog.
  28. Parsing
    • Regular
    • Context-free
    • Context-sensitive
    • Unrestricted
    This is the Chomsky hierarchy of formal languages, ordered from “easy” to “hard”.
  29. Parsing: Regular language. No recursive structures. Parse with a finite state automaton.
        3 5 add printi
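A finite state automaton for a language this small can be written by hand. The sketch below is one possible shape, not something shown in the talk (which uses `String#split` here and Ragel for Crapshoot). The state names, token labels, and the helpers `char_class` and `tokenize` are all illustrative assumptions, and only lowercase words are accepted.

```ruby
# Classify one character into the input alphabet of the automaton.
def char_class(char)
  case char
  when /\d/    then :digit
  when /[a-z]/ then :letter
  when /\s/    then :space
  else raise ArgumentError, "unexpected character #{char.inspect}"
  end
end

# Transition table: current state => { input class => next state }.
# A missing entry means the input is rejected.
TRANSITIONS = {
  start:  { digit: :number, letter: :word, space: :start },
  number: { digit: :number, space: :start },
  word:   { letter: :word, space: :start }
}.freeze

# Walk the input one character at a time, emitting [kind, text] pairs.
def tokenize(source)
  tokens = []
  state = :start
  buffer = +""
  # The appended space flushes the final token.
  (source + " ").each_char do |char|
    next_state = TRANSITIONS.fetch(state)[char_class(char)]
    raise ArgumentError, "unexpected #{char.inspect} in #{state}" if next_state.nil?

    if next_state == :start
      tokens << [state, buffer] unless buffer.empty?
      buffer = +""
    else
      buffer << char
    end
    state = next_state
  end
  tokens
end

p tokenize("3 5 add printi")
# prints [[:number, "3"], [:number, "5"], [:word, "add"], [:word, "printi"]]
```

There is no stack and no recursion anywhere in the loop: the current state is the only memory, which is what makes the language regular.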
  30. Parsing: Context-free language. Recursive structures. Parse with a push down automaton.
        (1 + 2) * (3 + (4 / 5))
    A “push down automaton” has a stack, much like the language in previous slides.
  31. Parsing
    [Diagram: Language A → (Tokenize with Ragel) → List of Tokens → (Analyze with Ruby) → Syntax Tree → (Generate Code with Ruby) → Language B]
  33. Parsing
        tokens = scanner.parse expression
        postfix_tokens = postfixer.postfixify tokens
        result = evaluator.evaluate postfix_tokens
    http://bit.ly/crapshoot-2011
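The scan, postfixify, evaluate pipeline on this slide can be sketched for plain arithmetic. The code below is not Crapshoot's implementation: it assumes the shunting-yard algorithm for the postfix step, uses made-up method names (`scan`, `postfixify`, `evaluate`), and evaluates with `Rational` so that `4 / 5` is not truncated by integer division.

```ruby
PRECEDENCE = { "+" => 1, "-" => 1, "*" => 2, "/" => 2 }.freeze

# Split an infix expression into number, operator, and parenthesis tokens.
# Characters that match nothing (such as spaces) are skipped.
def scan(expression)
  expression.scan(%r{\d+|[-+*/()]})
end

# Shunting-yard: reorder infix tokens into postfix using an operator stack.
def postfixify(tokens)
  output = []
  operators = []
  tokens.each do |token|
    case token
    when /\A\d+\z/
      output << token
    when "("
      operators.push(token)
    when ")"
      output << operators.pop while operators.last && operators.last != "("
      raise ArgumentError, "unbalanced parentheses" if operators.pop.nil?
    else
      # Pop operators of equal or higher precedence (left-associative).
      while PRECEDENCE.key?(operators.last) &&
            PRECEDENCE[operators.last] >= PRECEDENCE.fetch(token)
        output << operators.pop
      end
      operators.push(token)
    end
  end
  until operators.empty?
    operator = operators.pop
    raise ArgumentError, "unbalanced parentheses" if operator == "("
    output << operator
  end
  output
end

# Evaluate postfix tokens with a value stack, like the stack language earlier.
def evaluate(postfix_tokens)
  stack = []
  postfix_tokens.each do |token|
    if token.match?(/\A\d+\z/)
      stack.push(Rational(token.to_i))
    else
      right = stack.pop
      left = stack.pop
      stack.push(left.public_send(token, right))
    end
  end
  stack.pop
end

postfix = postfixify(scan("(1 + 2) * (3 + (4 / 5))"))
puts postfix.join(" ")   # prints 1 2 + 3 4 5 / + *
puts evaluate(postfix)   # prints 57/5
```

The operator stack inside `postfixify` plays the role of the pushdown automaton's stack: it is what lets the parser remember how deeply nested the parentheses are.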
  34. Parsing: Ragel, Treetop
    Again, parsing is huge. The short answer is I like Ragel and haven’t used Treetop much.
  35. Parsing
        Number = digit+ >_number %number;
        Constant = Number %constant;
        Drop = ('^' | 'v') %drop;
        Series = Number 'd' Number Drop? %series;
        Arithmetic = ('+' | '-' | '*' | '/') %arithmetic;
        UnaryExpression = Series | Constant;
        BinaryExpression = UnaryExpression (space* Arithmetic space* UnaryExpression)+;
        Expression = UnaryExpression | BinaryExpression;
    This is a Ragel grammar.
  36. Compilers are Hard
    • Language design
    • Parsing
    • Data structures
    • Optimization
    So language design is hard, and parsing is hard.
  37. Compilers are Hard (same list, with the program shown alongside)
        3 5 add printi
  38. Data Structures: I use objects and lots of Arrays.
    My compilers are scrub-tier, so I don’t use anything complicated.
  39. Data Structures: LLVM uses Hash, Set, Array, String, and Bit containers.
    Or rather, those are the Ruby equivalents of what it uses. They do lots of complicated optimizations that I don’t, and don’t use a language as powerful as Ruby.
  40. Optimization: Can we optimize this?
        3 5 add printi
    Yes. It’s called constant folding.
  41. Optimization: Can we optimize this?
        3 5 add printi
        8 printi
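For the stack language, constant folding can be sketched directly on the token list: whenever `add` is preceded by two integer literals, replace all three tokens with their sum. The method name `fold_constants` is made up for this example, and working on tokens rather than a syntax tree is a simplification.

```ruby
# Fold every "<int> <int> add" sequence into a single integer token.
# A single left-to-right pass is enough, because each folded result is
# pushed back onto `folded` and can take part in the next fold.
def fold_constants(tokens)
  folded = []
  tokens.each do |token|
    literals = folded.last(2)
    if token == "add" && literals.length == 2 &&
       literals.all? { |t| t.match?(/\A\d+\z/) }
      right = folded.pop.to_i
      left = folded.pop.to_i
      folded << (left + right).to_s
    else
      folded << token
    end
  end
  folded
end

puts fold_constants(%w[3 5 add printi]).join(" ")        # prints 8 printi
puts fold_constants(%w[1 2 add 3 add printi]).join(" ")  # prints 6 printi
```

An `add` whose operands are not both literals is left alone, so the folded program still behaves the same at run time.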
  42. Optimization
    [Diagram: Language A → (Tokenize with Ragel) → List of Tokens → (Analyze with Ruby) → Syntax Tree → (Fold Constants with Ruby) → Syntax Tree → (Generate Code with Ruby) → Language B]
    The easy thing to do would be to just add another pass for it.
  43. Optimization
    [Diagram, zoomed in: (Analyze with Ruby) → Syntax Tree → (Fold Constants with Ruby) → Syntax Tree → (Generate Code with Ruby) → Language B]
  44. Optimization
        #include <stdio.h>

        int main() {
          printf("%d\n", 3 + 5);
          return 0;
        }
    For what it’s worth, my 30 second perusal of “man gcc” and compiling sans optimizations doesn’t even show a way to disable constant folding in this case.
  45. Optimization (same program, with the compiled output)
        movl $8, %eax
        xorb %cl, %cl
        leaq L_.str(%rip), %rdx
        movq %rdx, %rdi
        movl %eax, %esi
        movb %cl, %al
        callq _printf
  47. Optimization
    The C# compiler does dozens of passes; many of them are optimizations. Compiling to a high-level language that has an optimizing compiler is easy and you can get optimizations for free. Mirah does this.
  48. Optimization: Lots of optimizations.
  49. Optimization: Lots of opportunities for research.
  50. Optimization: Computers are fast.
  51. Optimization: Compiling to high-level is okay.
  52. Compilers are Hard
    • Language design
    • Parsing
    • Data structures
    • Optimization
    Compilers are hard.
  53. Compilers are Hard (same list, with the program shown alongside)
        3 5 add printi
  54. Compilers
    • What is a compiler?
    • What makes them hard?
    • Why should you write one?
    • How can I write one?
  55. Compilers have a lot of challenging parts. Parsing, optimization, language design: each is the stuff of college courses, PhD theses, or experience. But before I talked about those, I showed off five minutes’ work. It’s not a cliff-face, it’s a gentle learning curve. With the right mentality, you can jump in and do something quick and dirty.
  57. It’s not an entirely academic pursuit. Compilers in Ruby tackle problems many of us deal with every day: haml, sass, coffee-script (originally in Ruby) all help me work faster.
  58.
        rats.each do |r|
          rat = Rat(r)
          cat.play_with rat
        end
    Mirah is a JVM language closely related to Ruby. It’s compiled by Ruby, into Java, which is then compiled into JVM bytecode by javac (which is itself mostly implemented in Java), which is usually compiled into machine code during execution. Incidentally, while JRuby performs pretty badly on Android, Mirah performs identically to Java, because it’s Java.
  59. Have you ever worked on code bases that have grown inexplicably huge, despite all your best efforts to make them modular and object-oriented? Of course you have. What's the solution? You either learn compilers and start writing your own DSLs, or your get yourself a better language.
    Steve Yegge, “Rich Programmer Food”, June 21 2007
    Knowing compilers can help you write better, more succinct code. Even if it’s an internal Ruby DSL where you get parsing for free, the language design is in mapping DSL statements to your codebase.
  60. Learning compilers is worthwhile because they are hard, and writing compilers will make you better at writing software.
  61. We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard, because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one which we intend to win…
    John F. Kennedy, Speech at Rice University, Sept. 12 1962
  62. Compilers
    • What is a compiler?
    • What makes them hard?
    • Why should you write one?
    • How can I write one?
  63. Start
    1. Write a program in your new language.
    2. Write a test for expected output.
    3. Write a Ruby program to pass that test.
    4. Refactor.
    5. Repeat.
    Red. Green. Refactor.
  64. Crapshoot
        class TestCrapshoot < Test::Unit::TestCase
          context 'the expression "4d6 + 200"' do
            setup do
              @expression = "4d6 + 200"
            end

            should 'have a result over 200' do
              assert Crapshoot.roll(@expression) > 200
            end
          end
        end
    This is the first integration test for “Crapshoot,” my dice rolling & arithmetic language.
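To make the "write a Ruby program to pass that test" step concrete, here is about the smallest dice roller that would satisfy an integration test like this one. It is a hypothetical stand-in, not Crapshoot's code: the `TinyDice` module and its methods are invented for this example, tokens must be separated by whitespace, and arithmetic is applied strictly left to right with no operator precedence.

```ruby
# TinyDice is a hypothetical stand-in, not the real Crapshoot implementation.
module TinyDice
  TERM = /\A(?:(\d+)d(\d+)|(\d+))\z/

  # Evaluate whitespace-separated input such as "4d6 + 200",
  # strictly left to right (no operator precedence).
  def self.roll(expression, rng: Random.new)
    first, *rest = expression.split
    total = term(first, rng)
    rest.each_slice(2) do |operator, operand|
      unless %w[+ - * /].include?(operator)
        raise ArgumentError, "unknown operator #{operator.inspect}"
      end
      total = total.public_send(operator, term(operand, rng))
    end
    total
  end

  # A term is either a dice series ("4d6") or a constant ("200").
  def self.term(text, rng)
    match = TERM.match(text.to_s)
    raise ArgumentError, "bad term #{text.inspect}" if match.nil?
    return match[3].to_i if match[3]

    count = match[1].to_i
    sides = match[2].to_i
    raise ArgumentError, "dice need at least one side" if sides < 1
    Array.new(count) { rng.rand(1..sides) }.sum
  end
end

puts TinyDice.roll("4d6 + 200")
```

Four six-sided dice sum to between 4 and 24, so a test in the spirit of the slide can assert that the result lies between 204 and 224. Once a quick version like this is green, it can be refactored toward a real scanner.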
  65. Crapshoot
        grammar CrapshootScanner
          rule expression
            unary_expression / binary_expression
          end
          rule binary_expression
            unary_expression (ows arith ows unary_expression)+
          end
          rule unary_expression
            series / constant
          end
          rule series
            number 'd' number
          end
          rule number
            [\d]+
          end
          rule drop
            'v' / '^'
          end
          rule constant
            number
          end
          rule arith
            '+' / '-' / '*' / '/'
          end
          rule ows
            # optional whitespace
            [\s]*
          end
        end
    This is the first pass, in Treetop.
  66. Crapshoot
        %%{
          machine scanner;

          action _number { @mark_num = p }
          action number { @num_stack.push atos(data[@mark_num..p-1]) }
          action constant { @tokens << Tokens::Constant.new(@num_stack.pop) }
          action series {
            drop = @drop_current
            @drop_current = nil
            sides = @num_stack.pop
            count = @num_stack.pop
            @tokens << Tokens::Series.new(count, sides, drop)
          }
          action arithmetic { @tokens << Tokens::Arithmetic.new(data[p-1].chr) }
          action drop { @drop_current = data[p-1].chr }

          Number = digit+ >_number %number;
          Constant = Number %constant;
          Drop = ('^' | 'v') %drop;
          Series = Number 'd' Number Drop? %series;
          Arithmetic = ('+' | '-' | '*' | '/') %arithmetic;
          UnaryExpression = Series | Constant;
          BinaryExpression = UnaryExpression (space* Arithmetic space* UnaryExpression)+;
          Expression = UnaryExpression | BinaryExpression;

          main := Expression;
        }%%
    This is what it is today, in Ragel.
  67. Crapshoot
        context 'The Crapshoot module' do
          should_roll '4d6 + 200', '>='=>200
        end
    And this is the original integration test, slightly refactored and more stringent.
  68. Crapshoot (same test, with an overlay): Achievement Unlocked: TDD in the Compiler Talk
  70. Compilers
    • What is a compiler?
    • What makes them hard?
    • Why should you write one?
    • How can I write one?
  71. Why should I write one? Write compilers because they are enjoyably challenging and help you grow.
  72. How can I write one? Write a program, make it compile.
  73. Name Dropping
    Compiler 2913759: https://gist.github.com/2913759
    Lambda the Ultimate: http://lambda-the-ultimate.org/
    Philosophy of Ruby: http://www.artima.com/intv/ruby4.html
    Ragel: http://www.complang.org/ragel/
    Treetop: https://github.com/nathansobo/treetop
    LLVM programming: http://llvm.org/docs/ProgrammersManual.html
  74. Name Dropping
    How many passes in C# compiler? http://blogs.msdn.com/b/ericlippert/archive/2010/02/04/how-many-passes.aspx
    Steve Yegge, “Rich Programmer Food”: http://steve-yegge.blogspot.com/2007/06/rich-programmer-food.html
    John F. Kennedy, Speech at Rice University: http://www.jfklibrary.org/Research/Ready-Reference/JFK-Speeches/Address-at-Rice-University-on-the-Nations-Space-Effort-September-12-1962.aspx