pragprog/titles/JRUBY discount code: JRubyIanDees Before we get to the talk, let me make a couple of quick announcements. First, we’re updating the JRuby book this summer with a JRuby 1.7-ready PDF. To celebrate that, we’re offering a discount code on the book during the conference. Second, I’m working on a new book with the Cucumber folks, which has some JRuby/JVM stuff in it—if you’d like to be a tech reviewer, please find me after this talk.
(with apologies to Ira Glass) Act I, Meet Thnad, in which we encounter Thnad, a programming language built with JRuby and designed not for programmer happiness, but for implementer happiness. Act II, Enter the Frenemy, in which we meet a new Ruby runtime. Act III, Thnad's Revenge, in which we port Thnad to run on the Rubinius runtime and encounter some surprises along the way.
Ruby is optimized for programmer happiness; Thnad is optimized for implementer happiness. It was designed to be implemented with a minimum of time and effort, and a maximum amount of fun.
{ times(n, factorial(minus(n, 1))) } } print(factorial(4)) Here’s a sample Thnad program demonstrating all the major features. Thnad has integers, functions, conditionals, and... not much else. These minimal features were easy to add, thanks to the great tools available in the JRuby ecosystem (and other ecosystems, as we’ll see).
Conditionals 4. Function Definitions In the next few minutes, we’re going to trace through each of these four language features, from parsing the source all the way to generating the final binary. We won’t show every single grammar rule, but we will hit the high points.
into four main stages in a typical language: finding the tokens or parts of speech of the text, parsing the tokens into an in-memory tree, transforming the tree, and generating the bytecode. We’re going to look at each of Thnad’s major features in the context of these stages.
Parslet handles the first two stages of compilation (tokenizing and parsing) using a Parsing Expression Grammar, or PEG. PEGs are like regular expressions attached to blocks of code. They sound like a hack, but there’s solid compiler theory behind them.
Thnad::Number :value 42 root :number "42" root Now for the third stage, transformation. We could generate the bytecode straight from the original tree, using a bunch of hard-to-test case statements. But it would be nicer to have a specific Ruby class for each Thnad language feature. The rule at the bottom of this slide tells Parslet to transform a Hash with a key called :number into an instance of a Number class we provide.
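To make the transformation step concrete, here is a minimal sketch in plain Ruby rather than Parslet's own transform DSL. The `TransformSketch` module and its `transform` helper are hypothetical names invented for this illustration; only the idea (a Hash keyed by :number becomes a Number instance) comes from the slide.

```ruby
module TransformSketch
  # Stand-in for the talk's Number node; holds an Integer value.
  Number = Struct.new(:value)

  # Walk a parse tree of nested Hashes and Arrays, replacing every
  # {:number => "digits"} leaf with a Number instance.
  def self.transform(tree)
    case tree
    when Hash
      if tree.keys == [:number]
        Number.new(Integer(tree[:number]))
      else
        tree.transform_values { |v| transform(v) }
      end
    when Array
      tree.map { |v| transform(v) }
    else
      tree
    end
  end
end

node = TransformSketch.transform({ number: "42" })
puts node.value + 1  # node.value is an Integer now, not a String
```

Each node class can then be tested on its own, which is the advantage over generating bytecode directly from the raw Hash.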
int, int] returnvoid end Here's an example, just to get an idea of the flavor. To call a method, you just push the arguments onto the stack and then call a specific opcode, in this case invokestatic. The VM you're writing for is aware of classes, interfaces, and so on—you don't have to implement method lookup like you would with plain machine code.
When I first saw BiteScript, I thought it was something you'd only need if you were doing deep JVM hacking. But when I read the slides from Charlie's presentation at Øredev, it clicked. This library takes me way back to my college days, when we'd write assembler programs for a really simple instruction set like MIPS. BiteScript evokes that same kind of feeling. I'd always thought the JVM would have a huge, crufty instruction set, but it's actually quite manageable to keep the most important parts of it in your head.
end end We can generate the bytecode any way we want. One simple way is to give each of our classes an eval() method that takes a BiteScript generator and calls various methods on it to generate JVM instructions.
context[:params] || [] position = param_names.index(name) raise "Unknown parameter #{name}" unless position builder.iload position end end Dealing with passed-in parameters is nearly as easy as dealing with raw integers; we just look up the parameter name by position, and then push the nth parameter onto the stack.
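The eval() pattern for numbers and parameters can be sketched without a JVM. In the sketch below, `RecordingBuilder` is a hypothetical stand-in that records instructions instead of emitting bytecode; it is not part of BiteScript. The use of `ldc` for integer literals is also an assumption, since the slide's code for numbers is cut off.

```ruby
module EmitSketch
  # Hypothetical stand-in for a BiteScript method builder: it records
  # each instruction instead of writing JVM bytecode.
  class RecordingBuilder
    attr_reader :instructions

    def initialize
      @instructions = []
    end

    def ldc(value)
      @instructions << [:ldc, value]
    end

    def iload(index)
      @instructions << [:iload, index]
    end
  end

  # An integer literal pushes its constant value (assumed opcode: ldc).
  Number = Struct.new(:value) do
    def eval(_context, builder)
      builder.ldc value
    end
  end

  # A parameter reference pushes the local at the parameter's position.
  Name = Struct.new(:name) do
    def eval(context, builder)
      param_names = context[:params] || []
      position = param_names.index(name)
      raise "Unknown parameter #{name}" unless position
      builder.iload position
    end
  end
end

builder = EmitSketch::RecordingBuilder.new
EmitSketch::Number.new(42).eval({}, builder)
EmitSketch::Name.new("n").eval({ params: ["m", "n"] }, builder)
p builder.instructions  # [[:ldc, 42], [:iload, 1]]
```

Because "n" is the second parameter, it is loaded from local slot 1.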
{:arg => {:number => '42'}}]}} {:arg => {:name => 'foo'}}]}} :funcall :name :args "baz" :arg :arg :number "42" :name "foo" root We’re going to move a little faster here, to leave time for Rubinius. Here, we want to transform this source code into this Ruby data structure representing a function call.
{ |a| a.eval(context, builder) } types = [builder.int] * (args.length + 1) builder.invokestatic \ builder.class_builder, name, types end end The bytecode for a function call is really simple in BiteScript. All functions in Thnad are static methods on a single class.
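As a rough illustration of the call sequence, the sketch below replays the function-call emitter against a recording stand-in. `RecordingBuilder`, its `int` and `class_builder` return values, and the `ldc` opcode for literals are assumptions made for this example, not BiteScript's real objects.

```ruby
module FuncallSketch
  # Hypothetical stand-in for a BiteScript method builder.
  class RecordingBuilder
    attr_reader :instructions

    def initialize
      @instructions = []
    end

    # Placeholder for the JVM int type.
    def int
      :int
    end

    # Placeholder for the single class that owns all Thnad functions.
    def class_builder
      :thnad_class
    end

    def ldc(value)
      @instructions << [:ldc, value]
    end

    def invokestatic(owner, name, types)
      @instructions << [:invokestatic, owner, name, types]
    end
  end

  Number = Struct.new(:value) do
    def eval(_context, builder)
      builder.ldc value
    end
  end

  Funcall = Struct.new(:name, :args) do
    def eval(context, builder)
      # Push each argument in order, then call the static method.
      args.each { |a| a.eval(context, builder) }
      # One int for the return type plus one per argument.
      types = [builder.int] * (args.length + 1)
      builder.invokestatic builder.class_builder, name, types
    end
  end
end

call = FuncallSketch::Funcall.new(
  "minus", [FuncallSketch::Number.new(5), FuncallSketch::Number.new(3)]
)
call_builder = FuncallSketch::RecordingBuilder.new
call.eval({}, call_builder)
call_builder.instructions.each { |instruction| p instruction }
```

The signature array has three entries for a two-argument call: the return type comes first, followed by one type per argument.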
=> {:number => '0'}, :if_true => {:body => {:number => '42'}}, :if_false => {:body => {:number => '667'}}} :cond :number "0" :if_true :body :number "42" :if_false :body :number "667" root A conditional consists of the “if” keyword, followed by a body of code inside braces, then the “else” keyword, followed by another body of code in braces.
Thnad::Number :value 667 root Thnad::Conditional.new \ Thnad::Number.new(0), Thnad::Number.new(42), Thnad::Number.new(667) Here’s the transformed tree, now made up of instances of our custom Ruby classes.
cond.eval context, builder builder.ifeq :else if_true.eval context, builder builder.goto :endif builder.label :else if_false.eval context, builder builder.label :endif end end The bytecode emitter for conditionals has a new twist. The Conditional struct points to three other Thnad nodes. It needs to eval() them at the right time to emit their bytecode in between all the zero checks and gotos.
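To see the order in which the branches and jumps come out, here is a sketch that runs the same emitter against a recording stand-in. `RecordingBuilder` is hypothetical, and `ldc` for literals is an assumption; the Conditional body follows the slide.

```ruby
module CondSketch
  # Hypothetical stand-in for a BiteScript method builder.
  class RecordingBuilder
    attr_reader :instructions

    def initialize
      @instructions = []
    end

    # Each listed opcode takes one operand and is simply recorded.
    %i[ldc ifeq goto label].each do |op|
      define_method(op) { |arg| @instructions << [op, arg] }
    end
  end

  Number = Struct.new(:value) do
    def eval(_context, builder)
      builder.ldc value
    end
  end

  Conditional = Struct.new(:cond, :if_true, :if_false) do
    def eval(context, builder)
      cond.eval context, builder
      builder.ifeq :else        # zero counts as false: jump to the else branch
      if_true.eval context, builder
      builder.goto :endif       # skip over the else branch
      builder.label :else
      if_false.eval context, builder
      builder.label :endif
    end
  end
end

conditional = CondSketch::Conditional.new(
  CondSketch::Number.new(0),
  CondSketch::Number.new(42),
  CondSketch::Number.new(667)
)
cond_builder = CondSketch::RecordingBuilder.new
conditional.eval({}, cond_builder)
cond_builder.instructions.each { |instruction| p instruction }
```

The JVM's ifeq branches when the value on top of the stack is zero, so this example's condition of 0 would select the 667 branch at run time.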
:params => {:param => {:name => 'x'}}, :body => {:number => '5'}} :func :name "foo" :params :param :name "x" :body :number "5" root A function definition looks a lot like a function call, but with a body attached to it.
param_names = [params].flatten.map(&:name) context[:params] = param_names types = [builder.int] * (param_names.count + 1) builder.public_static_method(self.name, [], *types) do |method| self.body.eval(context, method) method.ireturn end end end Since all Thnad parameters and return types are integers, emitting a function definition is really easy. We count the parameters so that we can give the JVM a correct signature. Then, we just pass a block to the public_static_method helper, a feature of BiteScript that will inspire the Rubinius work later on.
... klass.public_static_method 'main', [], void, string[] do |method| context = Hash.new exprs.each do |e| e.eval(context, method) end method.returnvoid end end end Here’s the core of class generation. We output a standard Java main() function...
...inside which we eval() our Thnad expressions (not counting function definitions) one by one.
1 isub ireturn end Here’s the definition of minus(). It just pushes its two arguments onto the stack and then subtracts them. The rest of the built-ins are nearly identical to this one, so we won’t show them here.
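For readers who have not worked with a stack machine, the following toy interpreter walks through what the body of minus() does. It is a sketch only: `StackSketch.run` and the array encoding of instructions are invented for this example, and it covers just the three opcodes the description mentions.

```ruby
module StackSketch
  # Run a tiny subset of JVM-style integer opcodes against a list of
  # local variables and return the value handed to ireturn.
  def self.run(instructions, locals)
    stack = []
    instructions.each do |op, arg|
      case op
      when :iload
        stack.push(locals.fetch(arg))
      when :isub
        right = stack.pop   # the second operand is on top of the stack
        left = stack.pop
        stack.push(left - right)
      when :ireturn
        return stack.pop
      else
        raise ArgumentError, "unsupported opcode #{op}"
      end
    end
    raise "fell off the end without ireturn"
  end

  # minus(a, b): load both arguments, subtract, return the result.
  MINUS = [[:iload, 0], [:iload, 1], [:isub], [:ireturn]].freeze
end

p StackSketch.run(StackSketch::MINUS, [5, 3])  # prints 2
```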
around half (?) • Core in C++ / LLVM • Tons in Ruby: primitives, parser, bytecode Ruby in Ruby The goal of Rubinius is to implement Ruby in Ruby as much as performance allows. Quite a lot of functionality you’d think would need to be in C is actually in Ruby.
We have Rubinius to thank for the executable Ruby specification that all Rubies are now judged against, and for the excellent foreign-function interface that lets you call C code in a way that’s compatible with at least four Rubies.
Arguments: 2 required, 2 total Locals: 2: a, b Stack size: 4 Lines to IP: 2: -1..-1, 3: 0..6 0000: push_local 0 # a 0002: push_local 1 # b 0004: meta_send_op_plus :+ 0006: ret ---------------------------------------- ...or a dump of the actual bytecode for the Rubinius VM.
Brian Ford was a huge help during this effort, answering tons of my “How do I...?” questions in an awesome Socratic way (“Let’s take a look at the Generator class source code....”).
ideas) Because the Thnad syntax is unchanged, we can reuse the parser and syntax transformation. All we need to change is the bytecode output. And even that’s not drastically different.
context[:params] || [] position = param_names.index(name) raise "Unknown parameter #{name}" unless position builder.push_local position end end ...and for parameter names.
push 42 push 1 send_stack #<CM>, 2 JVM RBX In Rubinius, there are no truly static methods. We are calling the method on a Ruby object, namely an entire Ruby class. So we have to push the name of that class onto the stack first. The other big difference is that in Rubinius, we don’t just push the method name onto the stack; we push a reference to the compiled code itself. Fortunately, there’s a helper method to make this look more BiteScript-like.
:Thnad args.each { |a| a.eval(context, builder) } builder.allow_private builder.send name.to_sym, args.length end end Here’s how that difference affects the bytecode. Notice the allow_private() call? I’m not sure exactly why we need this. It may be an “onion in the varnish,” a reference to a story by Primo Levi in _The Periodic Table_.
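The shape of a Rubinius-style call can be sketched the same way as the JVM one. `RecordingGenerator` below is a hypothetical stand-in, not Rubinius::Generator, and the leading `push_const :Thnad` line plus the use of `push` for literals are assumptions, because the slide's code is cut off before that point.

```ruby
module RbxCallSketch
  # Hypothetical stand-in for a Rubinius bytecode generator.
  class RecordingGenerator
    attr_reader :instructions

    def initialize
      @instructions = []
    end

    def push_const(name)
      @instructions << [:push_const, name]
    end

    def push(value)
      @instructions << [:push, value]
    end

    def allow_private
      @instructions << [:allow_private]
    end

    # Shadows Object#send on purpose, mirroring the call in the talk.
    def send(name, arg_count)
      @instructions << [:send, name, arg_count]
    end
  end

  Number = Struct.new(:value) do
    def eval(_context, builder)
      builder.push value
    end
  end

  Funcall = Struct.new(:name, :args) do
    def eval(context, builder)
      builder.push_const :Thnad   # the receiver: the class holding our functions
      args.each { |a| a.eval(context, builder) }
      builder.allow_private
      builder.send name.to_sym, args.length
    end
  end
end

rbx_call = RbxCallSketch::Funcall.new(
  "minus", [RbxCallSketch::Number.new(42), RbxCallSketch::Number.new(1)]
)
rbx_builder = RbxCallSketch::RecordingGenerator.new
rbx_call.eval({}, rbx_builder)
rbx_builder.instructions.each { |instruction| p instruction }
```

The receiver goes onto the stack before the arguments, which is the main structural difference from the invokestatic version.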
factory wondered why the recipe called for an onion. They couldn’t work out chemically why it would be needed, but it had always been one of the ingredients. It turned out that it was just a crude old-school thermometer: when the onion sizzled, the varnish was ready.
else_label = builder.new_label endif_label = builder.new_label cond.eval context, builder builder.push 0 builder.send :==, 1 builder.goto_if_true else_label if_true.eval context, builder builder.goto endif_label else_label.set! if_false.eval context, builder endif_label.set! end end Labels are a little different in Rubinius, too; here’s what the bytecode for conditionals looks like now.
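One way to picture label objects is as placeholders that are created first and pinned to a position later. The sketch below models that with a hypothetical `Label` class whose `set!` records the index of the next instruction; the real Rubinius label implementation is not shown in the talk and may work differently.

```ruby
module LabelSketch
  # A forward-reference placeholder: created early, positioned by set!.
  class Label
    attr_reader :position

    def initialize(generator)
      @generator = generator
      @position = nil
    end

    # Pin this label to the index of the next instruction to be emitted.
    def set!
      @position = @generator.instructions.length
    end
  end

  # Hypothetical stand-in for a Rubinius bytecode generator.
  class RecordingGenerator
    attr_reader :instructions

    def initialize
      @instructions = []
    end

    def new_label
      Label.new(self)
    end

    def push(value)
      @instructions << [:push, value]
    end

    # Shadows Object#send on purpose, mirroring the call in the talk.
    def send(name, arg_count)
      @instructions << [:send, name, arg_count]
    end

    def goto_if_true(label)
      @instructions << [:goto_if_true, label]
    end

    def goto(label)
      @instructions << [:goto, label]
    end
  end

  Number = Struct.new(:value) do
    def eval(_context, builder)
      builder.push value
    end
  end

  Conditional = Struct.new(:cond, :if_true, :if_false) do
    def eval(context, builder)
      else_label = builder.new_label
      endif_label = builder.new_label
      cond.eval context, builder
      builder.push 0
      builder.send :==, 1               # compare the condition with zero
      builder.goto_if_true else_label   # zero counts as false
      if_true.eval context, builder
      builder.goto endif_label
      else_label.set!
      if_false.eval context, builder
      endif_label.set!
    end
  end
end

label_gen = LabelSketch::RecordingGenerator.new
LabelSketch::Conditional.new(
  LabelSketch::Number.new(0),
  LabelSketch::Number.new(42),
  LabelSketch::Number.new(667)
).eval({}, label_gen)

else_target = label_gen.instructions[3][1].position
endif_target = label_gen.instructions[5][1].position
puts "else branch starts at #{else_target}, endif at #{endif_target}"
```

Because each conditional asks for fresh label objects, nested conditionals do not share jump targets.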
:add push #<CM> push_scope push_self push_const :Thnad send :attach_method, 4 JVM RBX Remember that in Ruby, there’s no compile-time representation of a class. So rather than emitting a class definition, we emit code that creates a class at runtime.
param_names = [params].flatten.map(&:name) context[:params] = param_names # create a new Rubinius::Generator builder.begin_method name.to_sym, params.count self.body.eval(context, builder.current_method) builder.current_method.ret builder.end_method end end The code to define a method in Rubinius requires spinning up a completely separate bytecode generator. I stuck all this hairy logic in a set of helpers to make it more BiteScript-like.
push_rubinius push_literal inner.name push_literal cm push_scope push_const :Thnad send :attach_method, 4 pop end end Here’s the most interesting part of those helpers. After the function definition is compiled, we push it onto the stack and tell Rubinius to attach it to our class.
= Rubinius::CodeLoader.new(ARGV.first) method = loader.load_compiled_file( ARGV.first, Rubinius::Signature, 18) result = Rubinius.run_script(method) Here’s the entirety of the code to load and run a compiled Rubinius file.
a standard library! Doing the Rubinius implementation helped me improve the JRuby version. I was able to go back and rip out most of the built-in functions from that implementation.
Ryan Davis and Aja Hammerly for Graph Brian Ford for guidance Our tireless conference organizers! ...and to the makers of JRuby, Rubinius, Parslet, BiteScript, and everything else that made this project possible. Cheers!