Slide 1

Slide 1 text

The Hard Parts

Slide 2

Slide 2 text

Subverting the JVM All the tricks, hacks, and kludges we’ve used to make JRuby the best off-JVM language impl around.

Slide 3

Slide 3 text

Intro • Charles Oliver Nutter • Principal Software Engineer • Red Hat, JBoss Polyglot Group • @headius • [email protected]

Slide 4

Slide 4 text

Welcome! • My favorite event of the year • I’ve only missed one! • I will quickly talk through JRuby challenges • Not a comprehensive list. Buy me a beer. • Rest of you can help solve them

Slide 5

Slide 5 text

Ruby • Dynamic, object-oriented language • Created in 90s by Yukihiro Matsumoto • “matz” • Matz’s Ruby Interpreter (MRI) • Inspired by Python, Perl, Lisp, Smalltalk • Memes: TMTOWTDI, MINASWAN, CoC,

Slide 6

Slide 6 text

# Output "I love Ruby"
say = "I love Ruby"
puts say

# Output "I *LOVE* RUBY"
say['love'] = "*love*"
puts say.upcase

# Output "I *love* Ruby"
# five times
5.times { puts say }

Slide 7

Slide 7 text

JRuby • Ruby for the JVM and JVM for the Ruby • Started in 2001, dozens of contribs • Usually the fastest Ruby • At least 20 paid full-time man years in it • Sun Microsystems, Engine Yard, Red Hat

Slide 8

Slide 8 text

Ruby is Hard to Implement!

Slide 9

Slide 9 text

Making It Go (Fast) • Parser-generator hacks • Multiple interpreters • Multiple compilers • JVM-specific tricks

Slide 10

Slide 10 text

Parsing Ruby • Yacc/Bison-based parse.y, almost 12kloc • Very complex, not context-free • No known 100% correct parser that is not YACC-based

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

JRuby’s Parser • Jay parser generator • Maybe 5 projects in the world use it • Our version of parse.y = 4kloc • Two pieces, one is for offline parsing • Works ok, but…

Slide 15

Slide 15 text

Parser Problems! • Array initialization > 65k bytecode • Giant switch won’t JIT • Outlining the case bodies: better • Case bodies as runnables in machine: best • org/jruby/parser/RubyParser$445.class • Slow at startup (most important time!)
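
The progression from one giant switch to per-case runnables can be sketched in miniature. This is an illustration only, not Jay's generated output: the state numbers, the grammar actions, and the names run_action_switch, run_action_table, and ACTIONS are all invented here.

```ruby
# Hypothetical grammar actions, keyed by parser state number.

# Style 1: one giant dispatch method, as a parser generator emits naively.
def run_action_switch(state, stack)
  case state
  when 0 then stack.push(stack.pop + stack.pop)   # expr: expr '+' expr
  when 1 then stack.push(stack.pop * stack.pop)   # expr: expr '*' expr
  when 2 then stack.push(-stack.pop)              # expr: '-' expr
  else raise ArgumentError, "unknown state #{state}"
  end
end

# Style 2: each case body is its own small callable in a table, so no
# single method grows without bound and each body stays small.
ACTIONS = [
  ->(stack) { stack.push(stack.pop + stack.pop) },
  ->(stack) { stack.push(stack.pop * stack.pop) },
  ->(stack) { stack.push(-stack.pop) }
].freeze

def run_action_table(state, stack)
  action = ACTIONS.fetch(state) { raise ArgumentError, "unknown state #{state}" }
  action.call(stack)
end

stack = [2, 3]
run_action_table(0, stack)   # 2 + 3
run_action_table(2, stack)   # negate the result
p stack
```

On the JVM the table entries would be small classes, which is presumably where names like RubyParser$445 come from.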

Slide 16

Slide 16 text

Interpreter • At least four interpreters we’ve tried • Original: visitor-based • Modified: big switch rather than visitor • Experimental: stackless instr-based • Current: direct execution of AST • Execution state on artificial stack
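
"Direct execution of AST" means each node carries its own interpret method, with no visitor and no central switch. A minimal sketch, assuming a made-up three-node AST (Lit, Var, Add) and a plain Hash standing in for the execution state:

```ruby
# Hypothetical AST nodes; each one knows how to interpret itself.
class Lit
  def initialize(value)
    @value = value
  end

  def interpret(_env)
    @value
  end
end

class Var
  def initialize(name)
    @name = name
  end

  def interpret(env)
    env.fetch(@name)   # look the variable up in the execution state
  end
end

class Add
  def initialize(left, right)
    @left = left
    @right = right
  end

  def interpret(env)
    @left.interpret(env) + @right.interpret(env)
  end
end

# x + (1 + 2)
tree = Add.new(Var.new(:x), Add.new(Lit.new(1), Lit.new(2)))
puts tree.interpret({ x: 39 })
```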

Slide 17

Slide 17 text

The New Way • JRuby 9000 introduces a new IR • Traditional-style compiler IR • Register-based • CFG, semantic analysis, type and constant propagation, all that jazz • Interpreter has proven it out…JIT next
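
As a rough picture of what a register-based IR with constant propagation looks like, here is a toy pass over one basic block. The instruction layout ([dest, op, operands...]) and the names PROGRAM, OPS, and propagate_constants are assumptions made for this sketch; JRuby's real IR is far richer.

```ruby
# Hypothetical register-based IR: each instruction is [dest, op, *operands].
# Operands are Symbols (registers) or Integers (constants).
PROGRAM = [
  [:a, :copy, 2],
  [:b, :copy, 3],
  [:c, :add, :a, :b],
  [:d, :mul, :c, :x]    # :x is an unknown input, so this cannot be folded
].freeze

OPS = { add: :+, mul: :* }.freeze

# Constant propagation over one basic block: track registers with known
# values, substitute them into later operands, and fold when all are known.
def propagate_constants(instructions)
  known = {}
  instructions.map do |dest, op, *operands|
    operands = operands.map { |o| known.fetch(o, o) }
    if op == :copy && operands[0].is_a?(Integer)
      known[dest] = operands[0]
      [dest, :copy, operands[0]]
    elsif OPS.key?(op) && operands.all? { |o| o.is_a?(Integer) }
      known[dest] = operands.reduce(OPS[op])
      [dest, :copy, known[dest]]
    else
      known.delete(dest)   # dest now holds a value unknown at compile time
      [dest, op, *operands]
    end
  end
end

propagate_constants(PROGRAM).each { |insn| p insn }
```

After the pass, :c is a constant copy and :d multiplies that constant by the one operand that stays unknown.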

Slide 18

Slide 18 text

Mixed-Mode • JRuby has both interpreter and JIT • Cost of generating JVM bytecode is high • Our interpreter runs faster than JVM’s • A jitted interpreter is (much) faster than unjitted bytecode
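
The mixed-mode idea (interpret while cold, compile once hot) can be sketched as a call counter that swaps implementations. Everything here is illustrative: MixedModeMethod, THRESHOLD, and the lambda standing in for generated bytecode are assumptions, not JRuby's actual mechanism or threshold.

```ruby
# Hypothetical mixed-mode method: interpret until hot, then swap in a
# "compiled" body. The compiled form is a plain lambda standing in for
# generated bytecode; THRESHOLD is an arbitrary illustrative value.
class MixedModeMethod
  THRESHOLD = 3

  attr_reader :mode

  def initialize(interpreted, compile)
    @interpreted = interpreted   # callable used while the method is cold
    @compile = compile           # callable that returns the fast version
    @compiled = nil
    @calls = 0
    @mode = :interpreted
  end

  def call(*args)
    return @compiled.call(*args) if @compiled

    @calls += 1
    if @calls >= THRESHOLD
      @compiled = @compile.call  # pay the compilation cost once, when hot
      @mode = :compiled
    end
    @interpreted.call(*args)
  end
end

square = MixedModeMethod.new(->(n) { n * n }, -> { ->(n) { n * n } })
5.times { |i| square.call(i) }
puts square.mode
```

Cold code never pays the bytecode-generation cost, which matters most at startup.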

Slide 19

Slide 19 text

Native Execution • Early JIT compiler just translated AST • Bare-minimum semantic analysis • Eliminate artificial frame use • One-off opto for frequent patterns • Too unwieldy to evolve much

Slide 20

Slide 20 text

New IR JIT • Builds off IR runtime • Per-instruction bytecode gen is simple • JVM frame is like infinite register machine • Potential to massively improve perf • Early unboxing numbers…

Slide 21

Slide 21 text

[Chart] Numeric loop performance, in times faster than MRI 2.1 (axis 0 to 5): JRuby 1.7, Rubinius

Slide 22

Slide 22 text

[Chart] Numeric loop performance, in times faster than MRI 2.1 (axis 0 to 60): JRuby 1.7, Rubinius, Truffle, Topaz, 9k+unbox

Slide 23

Slide 23 text

[Chart] mandelbrot(500), in times faster than MRI 2.1 (axis 0 to 40): JRuby 9k + indy, JRuby 9k + unboxing, JRuby 9k + Truffle

Slide 24

Slide 24 text

Whither Truffle? • RubyTruffle merged into JRuby • Same licenses as rest of JRuby • Chris Seaton continues to work on it • Very impressive peak numbers • Startup, steady-state…needs work • Considering initial use for targeted opto

Slide 25

Slide 25 text

JVM Tricks • Lack of class hierarchy analysis in JIT • Manually split methods to beat limits • Everything is an expression, so exception handling has to maintain current stack • Tweaking JIT flags will just make you sad • Unsafe

Slide 26

Slide 26 text

IRubyObject
    public RubyClass getMetaClass();
RubyBasicObject
    private RubyClass metaClass;
    public RubyClass getMetaClass() { return metaClass; }
RubyString, RubyArray, RubyObject
obj.getMetaClass()

Slide 27

Slide 27 text

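// Test for the common base class first so the call is class-typed
// rather than an interface call on IRubyObject.
public static RubyClass metaclass(IRubyObject object) {
    return object instanceof RubyBasicObject ?
        ((RubyBasicObject)object).getMetaClass() :
        object.getMetaClass();
}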

Slide 28

Slide 28 text

Compatibility • Strings and Encodings • IO • Fibers • Difficult choices

Slide 29

Slide 29 text

Strings • All arbitrary-width byte data is String • Binary data and encoded text alike • Many supported encodings • j.l.String, char[] poor options • Size, data integrity, behavioral differences
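
A short illustration of why a char-based string is a poor fit: the same Ruby String class holds encoded text and raw binary, and the byte count is independent of the character count. The constant names are made up for this sketch.

```ruby
# The same bytes viewed as encoded text and as raw binary data.
TEXT = "r\u00E9sum\u00E9"                 # "resume" with two accented e's, UTF-8
puts TEXT.length                           # characters
puts TEXT.bytesize                         # bytes (each accented e takes two)

# Reinterpret the identical bytes as binary: nothing is converted.
BINARY = TEXT.dup.force_encoding(Encoding::BINARY)
puts BINARY.length                         # now every byte counts as one unit
puts BINARY.bytes == TEXT.bytes            # the underlying data is unchanged

# Arbitrary bytes that are not valid UTF-8 still fit in a String.
BLOB = [0xFF, 0xFE, 0x00, 0x41].pack("C*")
puts BLOB.encoding == Encoding::BINARY
puts BLOB.bytesize
```

A Java char[] would have to either widen each byte or decode the data, which is the size and data-integrity concern named above.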

Slide 30

Slide 30 text

The First Big Decision • We realized we needed a byte[] String • Had been StringBuilder-based until then • That meant a lot of porting… • Regex engine (joni) • Encoding subsystem (jcodings) • Low-level IO + transcoding (in JRuby)

Slide 31

Slide 31 text

JOni • Port of Oniguruma regex library • Pluggable grammars + arbitrary encodings • Bytecode engine (shallow call stack) • Interruptible • Re-forked as char[] engine for Nashorn • https://github.com/jruby/joni

Slide 32

Slide 32 text

[Chart] Data: ‘a’-‘z’ in byte[], match /.*tuv(..)yz$/, time in seconds (axis 0s to 6s): j.u.regex, JOni
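
The benchmark pattern is easy to try from Ruby. This sketch only shows the match itself plus a crude timing loop; the iteration count is arbitrary, the constant names are invented, and it does not reproduce the chart's setup or numbers.

```ruby
require "benchmark"

ALPHABET = ("a".."z").to_a.join       # "abcdefghijklmnopqrstuvwxyz"
PATTERN = /.*tuv(..)yz$/

match = PATTERN.match(ALPHABET)
puts match[1]                         # the two characters between "tuv" and "yz"

# Timings depend entirely on the runtime and the machine.
elapsed = Benchmark.realtime { 100_000.times { PATTERN.match(ALPHABET) } }
puts format("%.3fs for 100,000 matches", elapsed)
```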

Slide 33

Slide 33 text

[Chart] Data: ‘a’-‘z’ from IO, match /.*tuv(..)yz$/, time in seconds (axis 0s to 2.8s): j.u.regex, JOni

Slide 34

Slide 34 text

Jcodings • Character tables • Used heavily by JOni and JRuby • Transcoding tables and logic • Replaces Charset logic from JRuby 1.7 • https://github.com/jruby/jcodings

Slide 35

Slide 35 text

NO GRAPH NEEDED

Slide 36

Slide 36 text

JRuby 9000 • Finished porting, connecting transcoders • New port of IO operations • Transcoding works directly against IO buffers; hard to simulate other ways • Lots of fun native (C) calls to emulate…
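
From the Ruby side, transcoding at the IO layer looks like this: the file is opened with an external and an internal encoding, and conversion happens as data is read. A minimal sketch using only the standard library; read_transcoded and LATIN1_BYTES are names invented here.

```ruby
require "tempfile"

# "cafe" with an accented e, encoded as ISO-8859-1 (one byte per character).
LATIN1_BYTES = [0x63, 0x61, 0x66, 0xE9].pack("C*")

def read_transcoded(bytes)
  Tempfile.create("latin1") do |file|
    file.binmode
    file.write(bytes)
    file.flush
    # "external:internal": bytes on disk are ISO-8859-1, strings handed to
    # the program are UTF-8. The conversion happens as the data is read.
    File.open(file.path, "r:ISO-8859-1:UTF-8", &:read)
  end
end

text = read_transcoded(LATIN1_BYTES)
puts text.encoding == Encoding::UTF_8
puts text.bytesize      # one more byte than on disk: the accented e grew
```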

Slide 37

Slide 37 text

Fibers • Coroutines, goroutines, continuations • MRI uses stack-swapping • And limits Fiber stack size as a result • Useless as a concurrency model • Useful for multiplexing operations • Try read, no data, go to next fiber
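
The "try read, no data, go to next fiber" pattern, sketched with simulated sources where nil means "nothing to read yet". The helper names make_reader and run_all are invented for the example.

```ruby
# Each reader fiber handles the data it has and yields :waiting when a
# (simulated) read comes back empty, so the scheduler can try the next one.
def make_reader(name, chunks, log)
  Fiber.new do
    chunks.each do |chunk|
      if chunk.nil?
        Fiber.yield(:waiting)      # no data: hand control back
      else
        log << "#{name}:#{chunk}"
      end
    end
    :done
  end
end

# Round-robin scheduler: resume every fiber in turn, dropping finished ones.
def run_all(sources)
  log = []
  fibers = sources.map { |name, chunks| make_reader(name, chunks, log) }
  fibers.reject! { |fiber| fiber.resume == :done } until fibers.empty?
  log
end

p run_all({ "a" => ["1", nil, "2"], "b" => ["1", nil, "2"] })
```

Only one fiber runs at a time, which is why this multiplexes work without being a concurrency model.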

Slide 38

Slide 38 text

Fibers on JRuby • Yep, they’re just native threads • Transfer perf with j.u.c utils is pretty close • Resource load is very bad • Spin-up time is bad without thread pool • So early or occasional fibers cost a lot • Where are you, coro?!
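
To show what "just native threads" implies, here is a coroutine emulated with one thread and two hand-off queues. This is a sketch under stated assumptions, not JRuby's implementation: ThreadFiber and yield_value are hypothetical names, Ruby's Queue stands in for the j.u.c utilities, and error handling is omitted.

```ruby
# Hypothetical ThreadFiber: a coroutine emulated with a native thread and
# two queues that hand control back and forth.
class ThreadFiber
  def initialize(&body)
    @body = body
    @to_fiber = Queue.new     # values passed in by resume
    @to_caller = Queue.new    # values passed out by yield_value or on finish
    @thread = nil
    @finished = false
  end

  # Block the caller until the fiber thread yields or finishes.
  def resume(value = nil)
    raise "dead fiber called" if @finished

    @thread ||= Thread.new { run_body }   # spin-up cost paid on first resume
    @to_fiber << value
    @to_caller.pop
  end

  # Called from inside the body: hand a value out and park this thread.
  def yield_value(value = nil)
    @to_caller << value
    @to_fiber.pop
  end

  private

  def run_body
    result = @body.call(self, @to_fiber.pop)
    @finished = true
    @to_caller << result
  end
end

gen = ThreadFiber.new do |fiber, _first|
  fiber.yield_value(1)
  fiber.yield_value(2)
  3
end
p [gen.resume, gen.resume, gen.resume]
```

Every such fiber pins a whole thread and its stack, which is the resource load the slide complains about.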

Slide 39

Slide 39 text

Hard Decisions • ObjectSpace walks heap, off by default • Trace functions add overhead, off by default • Full coroutines not possible • C extension API too difficult to emulate • Perhaps only item to really hurt us

Slide 40

Slide 40 text

Native Integration • Process control • More selectable IO • FFI layer • C extension API • Misc

Slide 41

Slide 41 text

Ruby’s Roots • Matz is/was a C programmer • Early Ruby did little more than stitch C calls together • Some of those roots remain • ttys, fcntl, process control, IO, ext API • We knew we needed a solution

Slide 42

Slide 42 text

JNA, and then JNR • Started with jna-posix to map POSIX • stat, symlink, etc needed to do basics • JNR replaced JNA • Wayne Meissner started his empire…

Slide 43

Slide 43 text

The Cancer • Many off-platform runtimes are not as good as HotSpot • Many of their users must turn to C for perf • So, since many people use C exts on MRI, maybe we need to implement it? • Or get a student to do it…

Slide 44

Slide 44 text

MRI C Extensions • Very invasive API • Direct pointer access, object internals, conservative GC, threading constraints • Like bridging one JNI to another • Experimental in JRuby 1.6, gone in 1.7 • Will not revisit unless new API

Slide 45

Slide 45 text

FFI • Ruby API/DSL for binding C libs • Additional tools for generating that code • If you need to go native, it’s the best way • In use in production JRuby apps • ØMQ client, bson lib, sodium crypto, …

Slide 46

Slide 46 text

Ruby FFI example
require 'ffi'

class Timeval < FFI::Struct
  layout :tv_sec => :ulong,
         :tv_usec => :ulong
end

module LibC
  extend FFI::Library
  ffi_lib FFI::Library::LIBC
  attach_function :gettimeofday,
                  [ :pointer, :pointer ],
                  :int
end

t = Timeval.new
LibC.gettimeofday(t.pointer, nil)

Slide 47

Slide 47 text

Layered Runtime • jffi • jnr-ffi • libffi • jnr-posix • jnr-constants • jnr-enxio • jnr-x86asm • jnr-unixsocket • etc etc

Slide 48

Slide 48 text

Native in JRuby • POSIX stuff missing from Java • Ruby FFI DSL for binding C libs • Stdio • selection, remove buffering, control tty • Process launching and control • !!!!!!

Slide 49

Slide 49 text

Process Control • Java’s ProcessBuilder/Process are bad • No channel access (no select!) • Spins up at least one thread per process • Drains child output ahead of you • New process API based on posix_spawn

Slide 50

Slide 50 text

in_c, in_p = IO.pipe
out_p, out_c = IO.pipe

pid = spawn('cat -n',
            :in => in_c,
            :out => out_c,
            :err => 'error.log')

[in_c, out_c].each(&:close)

in_p.puts("hello, world")
in_p.close

puts out_p.read # => " 1 hello, world"

Process.waitpid(pid)
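
Because spawn works with real pipes, the parent can also wait on child output with IO.select instead of blocking in read. A minimal sketch, assuming a child Ruby process as a portable stand-in for an arbitrary command; first_line_from_child is a name invented here.

```ruby
require "rbconfig"

# Spawn a child Ruby that writes one line after a short delay, and wait
# for it with select rather than a blocking read.
def first_line_from_child(timeout)
  reader, writer = IO.pipe
  pid = spawn(RbConfig.ruby, "-e", "sleep 0.2; puts 'hello from child'",
              :out => writer)
  writer.close                       # parent keeps only the read end

  # Returns nil on timeout, otherwise the list of readable IOs.
  ready = IO.select([reader], nil, nil, timeout)
  line = ready ? reader.gets.to_s.chomp : nil

  Process.waitpid(pid)
  reader.close
  line
end

p first_line_from_child(10)
```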

Slide 51

Slide 51 text

Usability • Backtraces • Command-line and launchers • Startup time

Slide 52

Slide 52 text

Backtraces • JVM backtraces make Rubyists’ eyes bleed • Initially, Ruby trace maintained manually • JIT emits mangled class to produce a Ruby trace element • AOT produces single class, mangled method name • Mixed-mode backtraces!

Slide 53

Slide 53 text

at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:86)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:234)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1061)
at groovy.lang.ExpandoMetaClass.invokeMethod(ExpandoMetaClass.java:910)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:892)
at groovy.lang.Closure.call(Closure.java:279)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.callClosureForMapEntry(DefaultGroovyMethods.java:1911)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1184)
at org.codehaus.groovy.runtime.dgm$88.invoke(Unknown Source)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:270)
at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:52)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:124)
at BootStrap.populateBootstrapData(BootStrap.groovy:786)
at BootStrap.this$2$populateBootstrapData(BootStrap.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:86)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:234)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1061)
at groovy.lang.ExpandoMetaClass.invokeMethod(ExpandoMetaClass.java:910)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:892)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1009)
at groovy.lang.ExpandoMetaClass.invokeMethod(ExpandoMetaClass.java:910)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:892)
at

Slide 54

Slide 54 text

at org.jruby.javasupport.JavaMethod.invokeStaticDirect(JavaMethod.java:362)
at org.jruby.java.invokers.StaticMethodInvoker.call(StaticMethodInvoker.java:50)
at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:306)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:136)
at org.jruby.ast.CallNoArgNode.interpret(CallNoArgNode.java:60)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
at org.jruby.ast.RootNode.interpret(RootNode.java:129)
at org.jruby.evaluator.ASTInterpreter.INTERPRET_EVAL(ASTInterpreter.java:95)
at org.jruby.evaluator.ASTInterpreter.evalWithBinding(ASTInterpreter.java:184)
at org.jruby.RubyKernel.evalCommon(RubyKernel.java:1158)
at org.jruby.RubyKernel.eval19(RubyKernel.java:1121)
at org.jruby.RubyKernel$INVOKER$s$0$3$eval19.call(RubyKernel$INVOKER$s$0$3$eval19.gen)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:210)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:206)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:155)
at ruby.__dash_e__.method__1$RUBY$bar(-e:1)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:138)
at ruby.__dash_e__.block_0$RUBY$foo(-e:1)
at ruby$__dash_e__$block_0$RUBY$foo.call(ruby$__dash_e__$block_0$RUBY$foo)
at org.jruby.runtime.CompiledBlock19.yieldSpecificInternal(CompiledBlock19.java:117)
at org.jruby.runtime.CompiledBlock19.yieldSpecific(CompiledBlock19.java:92)
at org.jruby.runtime.Block.yieldSpecific(Block.java:111)
at org.jruby.RubyFixnum.times(RubyFixnum.java:275)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:230)
at ruby.__dash_e__.method__0$RUBY$foo(-e:1)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:138)
at ruby.__dash_e__.__file__(-e:1)

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

• org.jruby.RubyFixnum.times • org.jruby.evaluator.ASTInterpreter.INTERPRET_EVAL • rubyjit.Object$$foo_3AB1F5052668B3CD74A0B4CD4999CF6A65E92973271627940.__file__ • ruby.__dash_e__.method__0$RUBY$foo
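
One way to picture turning mangled JVM frames back into Ruby trace elements: filter a Java stack trace for frames carrying the $RUBY$ marker and reformat them. The regex and the names RUBY_FRAME, ruby_backtrace, and FRAMES are assumptions based only on the sample frames in this deck, not JRuby's actual demangling code.

```ruby
# Hypothetical filter: keep only frames whose method name carries the
# "$RUBY$" marker and render them like Ruby backtrace lines.
RUBY_FRAME = /\$RUBY\$(?<name>\w+)\((?<file>[^:()]+):(?<line>\d+)\)\z/

def ruby_backtrace(java_frames)
  java_frames.map { |frame| RUBY_FRAME.match(frame) }
             .compact
             .map { |m| "#{m[:file]}:#{m[:line]}:in `#{m[:name]}'" }
end

FRAMES = [
  "org.jruby.RubyFixnum.times(RubyFixnum.java:275)",
  "java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)",
  "ruby.__dash_e__.method__0$RUBY$foo(-e:1)",
  "ruby.__dash_e__.method__1$RUBY$bar(-e:1)"
].freeze

puts ruby_backtrace(FRAMES)
```

The JVM-internal frames drop out and only the Ruby-level file, line, and method name remain.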

Slide 57

Slide 57 text

Command Line • Rubyists typically are at CLI • Command line and tty must behave • Epic bash and .bat scripts • 300-500 lines of heinous shell script • Unusable in shebang lines • Repurposed NetBeans native launcher

Slide 58

Slide 58 text

system ~/projects/jruby $ time bin/jruby.bash -v
jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]
real 0m0.126s
user 0m0.092s
sys 0m0.031s

system ~/projects/jruby $ time bin/jruby.bash -v
jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]
real 0m0.124s
user 0m0.089s
sys 0m0.033s

system ~/projects/jruby $ time jruby -v
jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]
real 0m0.106s
user 0m0.080s
sys 0m0.022s

system ~/projects/jruby $ time jruby -v
jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]
real 0m0.110s
user 0m0.085s
sys 0m0.023s

Slide 59

Slide 59 text

Console Support • Rubyists also typically use REPLs • Readline support is a must • jline has been forked all over the place • Looking into JNA-based readline now

Slide 60

Slide 60 text

CLI == Startup Time • BY FAR the #1 complaint • May be the only reason we haven’t won! • We’re trying everything we can

Slide 61

Slide 61 text

[Chart] JRuby Startup, time in seconds (lower is better, axis 0 to 10), for -e 1, gem --help, and rake -T: C Ruby, JRuby
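
A startup measurement of this kind can be approximated from Ruby itself by timing a full process launch. This is a rough sketch, not the method behind the chart: startup_seconds is an invented helper, and it times whichever Ruby is running the script.

```ruby
require "benchmark"
require "rbconfig"

# Time a complete launch of the current Ruby running the given arguments,
# discarding the child's output.
def startup_seconds(*args)
  Benchmark.realtime { system(RbConfig.ruby, *args, out: File::NULL) }
end

# Take the best of a few runs to reduce noise from disk caches and so on.
runs = Array.new(3) { startup_seconds("-e", "1") }
puts format("-e 1: best of 3 = %.3fs", runs.min)
```

Run under both C Ruby and JRuby, the same script gives a like-for-like comparison of the "-e 1" case.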

Slide 62

Slide 62 text

Tweaking Flags • -client mode • -XX:+TieredCompilation -XX:TieredStopAtLevel=1 • -X-C to disable JRuby’s compiler • Heap sizes, code verification, etc etc

Slide 63

Slide 63 text

Nailgun? • Keep a single JVM running in background • Toss commands over to it • It stays hot, so code starts faster • Hard to clean up all state (e.g. threads) • Can’t get access to user’s terminal • http://www.martiansoftware.com/nailgun/

Slide 64

Slide 64 text

Drip • [Diagram] Three boxes, each an isolated JVM running the application for a single command

Slide 65

Slide 65 text

Drip • Start a new JVM after each command • Pre-boot JVM plus optional code • Analyze command line for differences • Age out unused instances • https://github.com/flatland/drip
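
The core Drip idea (keep one pre-booted process parked, use it for exactly one command, then boot its replacement) can be sketched with plain Ruby processes. This is an illustration of the approach only, not Drip's implementation: SpareInterpreter and its methods are hypothetical, and a child Ruby stands in for the pre-booted JVM.

```ruby
require "rbconfig"

# Hypothetical Drip-like launcher: one spare interpreter is always booted
# and idle, blocked reading its stdin, so a command skips the boot cost.
class SpareInterpreter
  def initialize
    @spare = boot
  end

  # Run Ruby source in the already-running spare and return its output.
  def run(source)
    child = @spare
    @spare = boot               # start the next spare right away
    child.write(source)
    child.close_write           # EOF tells the child to start executing
    output = child.read
    child.close
    output
  end

  # Let the unused spare exit cleanly.
  def shutdown
    @spare.close_write
    @spare.read
    @spare.close
  end

  private

  def boot
    IO.popen([RbConfig.ruby, "-e", "eval(STDIN.read)"], "r+")
  end
end

launcher = SpareInterpreter.new
puts launcher.run("puts 6 * 7")
launcher.shutdown
```

Each command still gets a fresh, isolated process, which is what separates this from the Nailgun approach of one shared long-lived JVM.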

Slide 66

Slide 66 text

Drip Init • Give Drip some code to pre-boot • Load more libraries • Warm up some code • Pre-execution initialization • Run as much as possible in background • We also pre-load ./dripmain.rb if exists

Slide 67

Slide 67 text

$ cat dripmain.rb
# Preload some code Rails always needs
require File.expand_path('../config/application', __FILE__)

Slide 68

Slide 68 text

[Chart] JRuby Startup, rake -T, time in seconds (lower is better, axis 0 to 10): C Ruby, JRuby, JRuby (best), JRuby (drip), JRuby (drip init), JRuby (dripmain)

Slide 69

Slide 69 text

CONCLUSION

Slide 70

Slide 70 text

Hard Parts • 64k bytecode limit • Falling over JIT limits • String char[] pain • Startup and warmup • Coroutines • FFI at JVM level • Too many flags • Tiered compiler slow • Interpreter opto • Bytecode is a blunt tool • Indy has taken too long • Charlie may burn out

Slide 71

Slide 71 text

Thank You! • Charles Oliver Nutter • @headius • [email protected] • http://blog.headius.com