Slide 1

Slide 1 text

Compilers: Not That Hard Bryce Kerley Magic Ruby Walt Disney World Hollywood Studios October 5 2012 Friday, October 5, 12

Slide 2

Slide 2 text

Who am I? Bryce Kerley Consulting Engineer at Basho @bonzoesc

Slide 3

Slide 3 text

Compilers
• What is a compiler?
• What makes them hard?
• Why should you write one?
• How can I write one?

Slide 4

Slide 4 text

What is a Compiler?

Slide 5

Slide 5 text

What is a Compiler?
Input: program in language A
Output: program in language B
Language A → Compiler → Language B

Slide 6

Slide 6 text

Compiler 2913759
Stack-based language
    3 5 add printi
Should output “8”

You may recognize this as being similar to “PostScript”, “Forth”, HP calculators, or “Factor”.

Slide 7

Slide 7 text

Compiler 2913759
    3 5 add printi

    $ ruby simple_cambridge.rb < eight.sc > eight.rb
    $ ruby eight.rb
    8
    $

Here’s how I’d like it to work: read from stdin, output Ruby to stdout.

Slide 8

Slide 8 text

Compiler 2913759
“Output Ruby?”

C has lots of ceremony. Java has tons of ceremony. Assembly has more ceremony than Java.

Slide 9

Slide 9 text

Compiler 2913759
Reads from STDIN
Has a stack
Whitespace delimited
    program = $stdin.read
    puts "@stack = []"
    split_prog = program.split

Since we’re outputting to STDOUT I can just use “puts” to output compiled code.

Slide 10

Slide 10 text

Compiler 2913759
Reads from STDIN
Has a stack
Whitespace delimited
    program = $stdin.read
    puts "@stack = []"
    split_prog = program.split
I initialize the stack

Since we’re outputting to STDOUT I can just use “puts” to output compiled code.

Slide 11

Slide 11 text

Compiler 2913759
    ["3", "5", "add", "printi"]

These are “tokens” if you will. Each one is atomic: “3”, “add”, and “printi” can’t be split further.

Slide 12

Slide 12 text

Compiler 2913759
    ["3", "5", "add", "printi"]
Push me on to the stack

These are “tokens” if you will. Each one is atomic: “3”, “add”, and “printi” can’t be split further.

Slide 13

Slide 13 text

Compiler 2913759
    ["3", "5", "add", "printi"]
Push me on to the stack
Push me too

These are “tokens” if you will. Each one is atomic: “3”, “add”, and “printi” can’t be split further.

Slide 14

Slide 14 text

Compiler 2913759
    ["3", "5", "add", "printi"]
Push me on to the stack
Push me too
Pop two stack entries, add them, push result

These are “tokens” if you will. Each one is atomic: “3”, “add”, and “printi” can’t be split further.

Slide 15

Slide 15 text

Compiler 2913759
    ["3", "5", "add", "printi"]
Push me on to the stack
Push me too
Pop two stack entries, add them, push result
Pop entry, print it

These are “tokens” if you will. Each one is atomic: “3”, “add”, and “printi” can’t be split further.

Slide 16

Slide 16 text

Compiler 2913759
    ["3", "5", "add", "printi"]

    split_prog.each do |thing|
      case thing
      when /^\d+$/
        puts "@stack.push #{thing.to_i}"
      when 'add'
        puts "@stack.push(@stack.pop + @stack.pop)"
      when 'printi'
        puts "puts @stack.pop"
      end
    end

Slide 17

Slide 17 text

Compiler 2913759
    ["3", "5", "add", "printi"]

Slide 18

Slide 18 text

Compiler 2913759
    ["3", "5", "add", "printi"]

Slide 19

Slide 19 text

Compiler 2913759
    ["3", "5", "add", "printi"]

    @stack = []
    @stack.push 3
    @stack.push 5
    @stack.push(@stack.pop + @stack.pop)
    puts @stack.pop

Slide 20

Slide 20 text

Compiler 2913759
    program = $stdin.read
    puts "@stack = []"
    split_prog = program.split

    split_prog.each do |thing|
      case thing
      when /^\d+$/
        puts "@stack.push #{thing.to_i}"
      when 'add'
        puts "@stack.push(@stack.pop + @stack.pop)"
      when 'printi'
        puts "puts @stack.pop"
      end
    end
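
The same compiler can be sketched as a method that returns the generated Ruby as a String, which makes it easier to test than a script that writes to STDOUT. The method name `compile` and the error raised on unknown tokens are assumptions added for illustration; the slide's version silently skips tokens it does not recognize.

```ruby
# Hypothetical wrapper around the slide's logic: takes source text in the
# stack language and returns the generated Ruby program as a String.
def compile(source)
  lines = ["@stack = []"] # the generated program starts with an empty stack

  source.split.each do |token|
    case token
    when /\A\d+\z/ # integer literal: push it
      lines << "@stack.push #{token.to_i}"
    when "add"     # pop two entries, push their sum
      lines << "@stack.push(@stack.pop + @stack.pop)"
    when "printi"  # pop one entry and print it
      lines << "puts @stack.pop"
    else
      # Assumption: fail loudly instead of ignoring unknown tokens.
      raise ArgumentError, "unknown token: #{token}"
    end
  end

  lines.join("\n") + "\n"
end

puts compile("3 5 add printi")
```

Writing that output to a file and running it with `ruby` should behave like the `eight.rb` session shown earlier.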

Slide 21

Slide 21 text

Compilers
• What is a compiler?
• What makes them hard?
• Why should you write one?
• How can I write one?

Slide 22

Slide 22 text

Epcot Tips

Slide 23

Slide 23 text

Epcot Tips
• It’s “Soarin’” and not “Sauron”

Slide 24

Slide 24 text

Epcot Tips
• It’s “Soarin’” and not “Sauron”
• Don’t ride the boat after the German buffet

Slide 25

Slide 25 text

Epcot Tips
• It’s “Soarin’” and not “Sauron”
• Don’t ride the boat after the German buffet
• The Italian soda “Beverly” tastes like grapefruit rind

Slide 26

Slide 26 text

Compilers
• What is a compiler?
• What makes them hard?
• Why should you write one?
• How can I write one?

Slide 27

Slide 27 text

Compilers are Hard
“I thought you just claimed to have made that one in two minutes?”

Slide 28

Slide 28 text

Compilers are Hard
• Language design
• Parsing
• Data structures
• Optimization

I passed on the hard parts of this. Seriously, look at it.

Slide 29

Slide 29 text

Compilers are Hard
• Language design
• Parsing
• Data structures
• Optimization
    3 5 add printi

I passed on the hard parts of this. Seriously, look at it.

Slide 30

Slide 30 text

Language Design
    3 5 add printi
    puts 3 + 5
    System.out.println(3 + 5);
    io:format("~w~n", [3 + 5]).

Different languages let you express things differently.

Slide 31

Slide 31 text

Language Design
    -module(hello).
    -export([start/0]).

    start() ->
      spawn(fun() -> loop() end).

    loop() ->
      receive
        hello ->
          io:format("Hello, World!~n"),
          loop();
        goodbye ->
          ok
      end.

Sometimes what’s easy in one language is difficult in another. How would I express this in a stack-based language? Ruby?

Slide 32

Slide 32 text

Often people, especially computer engineers, focus on the machines. They think, "By doing this, the machine will run faster. By doing this, the machine will run more effectively. By doing this, the machine will something something something." They are focusing on machines. But in fact we need to focus on humans, on how humans care about doing programming or operating the application of the machines.
Yukihiro Matsumoto, The Philosophy of Ruby, Sept. 29 2003

Fundamentally, language design is about making the core concepts of your language usable for humans. Ruby does this with object-oriented design very well. Erlang does this with reliable concurrency very well. Cambridge doesn’t do this very well. Crapshoot does this with simple math, which is a shockingly low bar.

Slide 33

Slide 33 text

Language Design
In conclusion, language design is a land of contrast.

Slide 34

Slide 34 text

Parsing
Application of “Automata Theory”

You may recognize this name from a computer science course catalog.

Slide 35

Slide 35 text

Parsing
• Regular
• Context-free
• Context-sensitive
• Unrestricted

This is the Chomsky hierarchy of formal languages, ordered from “easy” to “hard”.

Slide 36

Slide 36 text

Parsing
Regular language
No recursive structures
Parse with a finite state automaton
    3 5 add printi

Slide 37

Slide 37 text

Parsing
Context-free language
Recursive structures
Parse with a push down automaton
    (1 + 2) * (3 + (4 / 5))

A “push down automaton” has a stack, much like the language in previous slides.
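
One way to see why nested parentheses need a stack is a small recursive-descent evaluator, sketched here under some assumptions: the method names are hypothetical, the grammar in the comments is an illustration rather than anything from the talk, and division uses `Rational` so that `4 / 5` is not truncated. Splitting the text into tokens is the regular part and a regular expression handles it; the nesting is handled by recursion, with Ruby's call stack standing in for the automaton's stack.

```ruby
# Tokenizing is regular: numbers, operators, and parentheses.
def tokenize(src)
  src.scan(/\d+|[-+*\/()]/)
end

# expr := term (('+' | '-') term)*
def parse_expr(tokens)
  value = parse_term(tokens)
  while ["+", "-"].include?(tokens.first)
    op = tokens.shift
    rhs = parse_term(tokens)
    value = op == "+" ? value + rhs : value - rhs
  end
  value
end

# term := factor (('*' | '/') factor)*
def parse_term(tokens)
  value = parse_factor(tokens)
  while ["*", "/"].include?(tokens.first)
    op = tokens.shift
    rhs = parse_factor(tokens)
    value = op == "*" ? value * rhs : value / rhs
  end
  value
end

# factor := number | '(' expr ')'
# The recursive call to parse_expr is where the nesting is handled.
def parse_factor(tokens)
  token = tokens.shift
  if token == "("
    value = parse_expr(tokens)
    raise ArgumentError, "expected )" unless tokens.shift == ")"
    value
  elsif token.to_s.match?(/\A\d+\z/)
    Rational(token.to_i)
  else
    raise ArgumentError, "unexpected token: #{token.inspect}"
  end
end

def evaluate_infix(src)
  parse_expr(tokenize(src))
end

puts evaluate_infix("(1 + 2) * (3 + (4 / 5))")
```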

Slide 38

Slide 38 text

Parsing
“You can’t parse non-regular languages with regular expressions.”

This is true.

Slide 39

Slide 39 text

Parsing
Language A → (Tokenize with Ragel) → List of Tokens → (Analyze with Ruby) → Syntax Tree → (Generate Code with Ruby) → Language B

Slide 40

Slide 40 text

Parsing
Language A → (Tokenize with Ragel) → List of Tokens → (Analyze with Ruby) → Syntax Tree → (Generate Code with Ruby) → Language B

Slide 41

Slide 41 text

Parsing
    tokens = scanner.parse expression
    postfix_tokens = postfixer.postfixify tokens
    result = evaluator.evaluate postfix_tokens
http://bit.ly/crapshoot-2011
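
The three stages above (scan, convert to postfix, evaluate) can be sketched for plain integer arithmetic as follows. This is not Crapshoot's implementation: the method names, the `PRECEDENCE` table, and the use of the shunting-yard algorithm for the postfix step are all assumptions made for illustration, and dice series and parentheses are left out.

```ruby
# Assumed operator precedences; higher binds tighter.
PRECEDENCE = { "+" => 1, "-" => 1, "*" => 2, "/" => 2 }.freeze

# Stage 1: split the expression into number and operator tokens.
def scan_tokens(expression)
  expression.scan(/\d+|[-+*\/]/)
end

# Stage 2: reorder infix tokens into postfix (shunting-yard, no parentheses).
def postfixify(tokens)
  output = []
  operators = []
  tokens.each do |token|
    if PRECEDENCE.key?(token)
      # Pop operators that bind at least as tightly (left-associative).
      while operators.any? && PRECEDENCE[operators.last] >= PRECEDENCE[token]
        output << operators.pop
      end
      operators << token
    else
      output << token
    end
  end
  output.concat(operators.reverse)
end

# Stage 3: evaluate postfix tokens with a stack, like the earlier stack language.
def evaluate_postfix(postfix_tokens)
  stack = []
  postfix_tokens.each do |token|
    if PRECEDENCE.key?(token)
      right = stack.pop
      left = stack.pop
      stack.push(left.public_send(token, right))
    else
      stack.push(token.to_i)
    end
  end
  stack.pop
end

tokens = scan_tokens("2 + 3 * 4")
postfix_tokens = postfixify(tokens)
puts postfix_tokens.join(" ")
puts evaluate_postfix(postfix_tokens)
```

Once the expression is in postfix form, evaluating it is the same push-and-pop loop as the stack language from the start of the talk.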

Slide 42

Slide 42 text

Parsing
Ragel
Treetop

Again, parsing is huge. The short answer is I like Ragel and haven’t used Treetop much.

Slide 43

Slide 43 text

Parsing
    Number = digit+ >_number %number;
    Constant = Number %constant;
    Drop = ('^' | 'v') %drop;
    Series = Number 'd' Number Drop? %series;
    Arithmetic = ('+' | '-' | '*' | '/') %arithmetic;
    UnaryExpression = Series | Constant;
    BinaryExpression = UnaryExpression (space* Arithmetic space* UnaryExpression)+;
    Expression = UnaryExpression | BinaryExpression;

This is a Ragel grammar.

Slide 44

Slide 44 text

Cat Break

Slide 45

Slide 45 text

Cat Break

Slide 46

Slide 46 text

Compilers are Hard
• Language design
• Parsing
• Data structures
• Optimization

So language design is hard, and parsing is hard.

Slide 47

Slide 47 text

Compilers are Hard
• Language design
• Parsing
• Data structures
• Optimization
    3 5 add printi

So language design is hard, and parsing is hard.

Slide 48

Slide 48 text

Data Structures
I use objects and lots of Arrays.

My compilers are scrub-tier, so I don’t use anything complicated.

Slide 49

Slide 49 text

Data Structures
LLVM uses Hash, Set, Array, String, and Bit containers.

Or rather, equivalents of the Ruby ones. They do lots of complicated optimizations that I don’t, and don’t use a language as powerful as Ruby.

Slide 50

Slide 50 text

Optimization
Can we optimize this?

Yes. It’s called constant folding.

Slide 51

Slide 51 text

Optimization
Can we optimize this?
    3 5 add printi

Yes. It’s called constant folding.

Slide 52

Slide 52 text

Optimization
Can we optimize this?
    3 5 add printi
    8 printi

Yes. It’s called constant folding.
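
For the stack language, constant folding can be sketched as a pass over the token list that runs before code generation. The method name `fold_constants` is hypothetical, and working on flat tokens rather than a syntax tree is a simplification: whenever `add` directly follows two integer literals, the three tokens collapse into their sum.

```ruby
# Hypothetical folding pass: replaces "<int> <int> add" with the computed sum.
def fold_constants(tokens)
  folded = []
  tokens.each do |token|
    operands = folded.last(2)
    foldable = token == "add" &&
               operands.length == 2 &&
               operands.all? { |t| t.match?(/\A\d+\z/) }

    if foldable
      b = folded.pop.to_i
      a = folded.pop.to_i
      folded.push((a + b).to_s) # the sum is itself a literal, so chains fold too
    else
      folded.push(token)
    end
  end
  folded
end

puts fold_constants("3 5 add printi".split).join(" ")
```

Because the folded sum is pushed back as a literal, a chain such as `1 2 add 3 add` collapses in a single pass.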

Slide 53

Slide 53 text

Optimization
Language A → (Tokenize with Ragel) → List of Tokens → (Analyze with Ruby) → Syntax Tree → (Fold Constants with Ruby) → Syntax Tree → (Generate Code with Ruby) → Language B

The easy thing to do would be to just add another pass for it.

Slide 54

Slide 54 text

Optimization
(Analyze with Ruby) → Syntax Tree → (Fold Constants with Ruby) → Syntax Tree → (Generate Code with Ruby) → Language B

Slide 55

Slide 55 text

Optimization
    #include <stdio.h>

    int main() {
      printf("%d\n", 3 + 5);
      return 0;
    }

For what it’s worth, my 30-second perusal of “man gcc” and compiling sans optimizations doesn’t even show a way to disable constant folding in this case.

Slide 56

Slide 56 text

Optimization
    #include <stdio.h>

    int main() {
      printf("%d\n", 3 + 5);
      return 0;
    }

    movl $8, %eax
    xorb %cl, %cl
    leaq L_.str(%rip), %rdx
    movq %rdx, %rdi
    movl %eax, %esi
    movb %cl, %al
    callq _printf

For what it’s worth, my 30-second perusal of “man gcc” and compiling sans optimizations doesn’t even show a way to disable constant folding in this case.

Slide 57

Slide 57 text

Optimization
    #include <stdio.h>

    int main() {
      printf("%d\n", 3 + 5);
      return 0;
    }

    movl $8, %eax
    xorb %cl, %cl
    leaq L_.str(%rip), %rdx
    movq %rdx, %rdi
    movl %eax, %esi
    movb %cl, %al
    callq _printf

For what it’s worth, my 30-second perusal of “man gcc” and compiling sans optimizations doesn’t even show a way to disable constant folding in this case.

Slide 58

Slide 58 text

Optimization

The C# compiler does dozens of passes, many of which are optimizations. Compiling to a high-level language that has an optimizing compiler is easy, and you can get optimizations for free. Mirah does this.

Slide 59

Slide 59 text

Optimization
Lots of optimizations.

The C# compiler does dozens of passes, many of which are optimizations. Compiling to a high-level language that has an optimizing compiler is easy, and you can get optimizations for free. Mirah does this.

Slide 60

Slide 60 text

Optimization
Lots of optimizations.
Lots of opportunities for research.

The C# compiler does dozens of passes, many of which are optimizations. Compiling to a high-level language that has an optimizing compiler is easy, and you can get optimizations for free. Mirah does this.

Slide 61

Slide 61 text

Optimization
Lots of optimizations.
Lots of opportunities for research.
Computers are fast.

The C# compiler does dozens of passes, many of which are optimizations. Compiling to a high-level language that has an optimizing compiler is easy, and you can get optimizations for free. Mirah does this.

Slide 62

Slide 62 text

Optimization
Lots of optimizations.
Lots of opportunities for research.
Computers are fast.
Compiling to high-level is okay.

The C# compiler does dozens of passes, many of which are optimizations. Compiling to a high-level language that has an optimizing compiler is easy, and you can get optimizations for free. Mirah does this.

Slide 63

Slide 63 text

Compilers are Hard
• Language design
• Parsing
• Data structures
• Optimization

Compilers are hard.

Slide 64

Slide 64 text

Compilers are Hard
• Language design
• Parsing
• Data structures
• Optimization
    3 5 add printi

Compilers are hard.

Slide 65

Slide 65 text

Compilers
• What is a compiler?
• What makes them hard?
• Why should you write one?
• How can I write one?

Slide 66

Slide 66 text

Compilers have a lot of challenging parts. Parsing, optimization, and language design each take college courses, PhD theses, or experience. But before I talked about those, I showed off five minutes’ work. It’s not a cliff face; it’s a gentle learning curve. With the right mentality, you can jump in and do something quick and dirty.

Slide 67

Slide 67 text

Compilers have a lot of challenging parts. Parsing, optimization, and language design each take college courses, PhD theses, or experience. But before I talked about those, I showed off five minutes’ work. It’s not a cliff face; it’s a gentle learning curve. With the right mentality, you can jump in and do something quick and dirty.

Slide 68

Slide 68 text

It’s not an entirely academic pursuit. Compilers in Ruby tackle problems many of us deal with every day: Haml, Sass, and CoffeeScript (originally in Ruby) all help me work faster.

Slide 69

Slide 69 text

    rats.each do |r|
      rat = Rat(r)
      cat.play_with rat
    end

Mirah is a JVM language closely related to Ruby. It’s compiled by Ruby, into Java, which is then compiled into JVM bytecode by javac (which is itself mostly implemented in Java), which is usually compiled into machine code during execution. Incidentally, while JRuby performs pretty badly on Android, Mirah performs identically to Java, because it’s Java.

Slide 70

Slide 70 text

Have you ever worked on code bases that have grown inexplicably huge, despite all your best efforts to make them modular and object-oriented? Of course you have. What's the solution? You either learn compilers and start writing your own DSLs, or your get yourself a better language.
Steve Yegge, “Rich Programmer Food”, June 21 2007

Knowing compilers can help you write better, more succinct code. Even if it’s an internal Ruby DSL where you get parsing for free, the language design is in mapping DSL statements to your codebase.

Slide 71

Slide 71 text

Learning compilers is worthwhile because they are hard, and writing compilers will make you better at writing software. We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard, because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to

Slide 72

Slide 72 text

We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard, because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one which we intend to win…
John F. Kennedy, Speech at Rice University, Sept. 12 1962

Learning compilers is worthwhile because they are hard, and writing compilers will make you better at writing software.

Slide 73

Slide 73 text

Compilers
• What is a compiler?
• What makes them hard?
• Why should you write one?
• How can I write one?

Slide 74

Slide 74 text

Start
1. Write a program in your new language.
2. Write a test for expected output.
3. Write a Ruby program to pass that test.
4. Refactor.
5. Repeat.

Red. Green. Refactor.

Slide 75

Slide 75 text

Crapshoot
    class TestCrapshoot < Test::Unit::TestCase
      context 'the expression "4d6 + 200"' do
        setup do
          @expression = "4d6 + 200"
        end

        should 'have a result over 200' do
          assert Crapshoot.roll(@expression) > 200
        end
      end
    end

This is the first integration test for “Crapshoot,” my dice rolling & arithmetic language.

Slide 76

Slide 76 text

Crapshoot
    grammar CrapshootScanner
      rule expression
        unary_expression /
        binary_expression
      end

      rule binary_expression
        unary_expression (ows arith ows unary_expression)+
      end

      rule unary_expression
        series / constant
      end

      rule series
        number 'd' number
      end

      rule number
        [\d]+
      end

      rule drop
        'v' / '^'
      end

      rule constant
        number
      end

      rule arith
        '+' / '-' / '*' / '/'
      end

      rule ows
        # optional whitespace
        [\s]*
      end
    end

This is the first pass, in Treetop.

Slide 77

Slide 77 text

Crapshoot
    %%{
      machine scanner;

      action _number { @mark_num = p }
      action number { @num_stack.push atos(data[@mark_num..p-1]) }
      action constant { @tokens << Tokens::Constant.new(@num_stack.pop) }
      action series {
        drop = @drop_current
        @drop_current = nil
        sides = @num_stack.pop
        count = @num_stack.pop
        @tokens << Tokens::Series.new(count, sides, drop)
      }
      action arithmetic { @tokens << Tokens::Arithmetic.new(data[p-1].chr) }
      action drop { @drop_current = data[p-1].chr }

      Number = digit+ >_number %number;
      Constant = Number %constant;
      Drop = ('^' | 'v') %drop;
      Series = Number 'd' Number Drop? %series;
      Arithmetic = ('+' | '-' | '*' | '/') %arithmetic;
      UnaryExpression = Series | Constant;
      BinaryExpression = UnaryExpression (space* Arithmetic space* UnaryExpression)+;
      Expression = UnaryExpression | BinaryExpression;

      main := Expression;
    }%%

This is what it is today, in Ragel.

Slide 78

Slide 78 text

Crapshoot
    context 'The Crapshoot module' do
      should_roll '4d6 + 200', '>=' => 200
    end

And this is the original integration test, slightly refactored and more stringent.

Slide 79

Slide 79 text

Crapshoot
    context 'The Crapshoot module' do
      should_roll '4d6 + 200', '>=' => 200
    end
Achievement Unlocked: TDD in the Compiler Talk

And this is the original integration test, slightly refactored and more stringent.

Slide 80

Slide 80 text

Crapshoot
    context 'The Crapshoot module' do
      should_roll '4d6 + 200', '>=' => 200
    end

And this is the original integration test, slightly refactored and more stringent.

Slide 81

Slide 81 text

Compilers
• What is a compiler?
• What makes them hard?
• Why should you write one?
• How can I write one?

Slide 82

Slide 82 text

What’s a Compiler
Compilers translate computer programs between languages.

Slide 83

Slide 83 text

Why are they hard?
Compilers are hard because language is complicated.

Slide 84

Slide 84 text

Why should I write one?
Write compilers because they are enjoyably challenging and help you grow.

Slide 85

Slide 85 text

How can I write one?
Write a program, make it compile.

Slide 86

Slide 86 text

Thanks! Bryce Kerley [email protected] @bonzoesc http://bit.ly/this-bundle

Slide 87

Slide 87 text

Name Dropping
Compiler 2913759: https://gist.github.com/2913759
Lambda the Ultimate: http://lambda-the-ultimate.org/
Philosophy of Ruby: http://www.artima.com/intv/ruby4.html
Ragel: http://www.complang.org/ragel/
Treetop: https://github.com/nathansobo/treetop
LLVM programming: http://llvm.org/docs/ProgrammersManual.html

Slide 88

Slide 88 text

Name Dropping
How many passes in C# compiler? http://blogs.msdn.com/b/ericlippert/archive/2010/02/04/how-many-passes.aspx
Steve Yegge, “Rich Programmer Food”: http://steve-yegge.blogspot.com/2007/06/rich-programmer-food.html
John F. Kennedy, Speech at Rice University: http://www.jfklibrary.org/Research/Ready-Reference/JFK-Speeches/Address-at-Rice-University-on-the-Nations-Space-Effort-September-12-1962.aspx