JRuby: The Hard Parts

headius
July 28, 2014

A survey of all the hard problems JRuby developers have had to solve, whether the JVM likes it or not. Topics include parsing, interpreting, compiling, optimization, native libraries, posix, startup time, console features, and much more.


Transcript

  1. The Hard Parts

  2. Subverting the JVM: All the tricks, hacks, and kludges we’ve used to make JRuby the best off-JVM language impl around.
  3. Intro • Charles Oliver Nutter • Principal Software Engineer • Red Hat, JBoss Polyglot Group • @headius • headius@headius.com
  4. Welcome! • My favorite event of the year • I’ve only missed one! • I will quickly talk through JRuby challenges • Not a comprehensive list. Buy me a beer. • Rest of you can help solve them
  5. Ruby • Dynamic, object-oriented language • Created in 90s by Yukihiro Matsumoto • “matz” • Matz’s Ruby Interpreter (MRI) • Inspired by Python, Perl, Lisp, Smalltalk • Memes: TMTOWTDI, MINASWAN, CoC,
  6. # Output "I love Ruby"
     say = "I love Ruby"
     puts say

     # Output "I *LOVE* RUBY"
     say['love'] = "*love*"
     puts say.upcase

     # Output "I *love* Ruby"
     # five times
     5.times { puts say }
  7. JRuby • Ruby for the JVM and JVM for the Ruby • Started in 2001, dozens of contribs • Usually the fastest Ruby • At least 20 paid full-time man years in it • Sun Microsystems, Engine Yard, Red Hat
  8. Ruby is Hard to Implement!
  9. Making It Go (Fast) • Parser-generator hacks • Multiple interpreters • Multiple compilers • JVM-specific tricks
  10. Parsing Ruby • Yacc/Bison-based parse.y, almost 12kloc • Very complex, not context-free • No known 100% correct parser that is not YACC-based
  11. None
  12. None
  13. None
  14. JRuby’s Parser • Jay parser generator • Maybe 5 projects in the world use it • Our version of parse.y = 4kloc • Two pieces, one is for offline parsing • Works ok, but…
  15. Parser Problems! • Array initialization > 65k bytecode • Giant switch won’t JIT • Outlining the case bodies: better • Case bodies as runnables in machine: best • org/jruby/parser/RubyParser$445.class • Slow at startup (most important time!)
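The "giant switch" versus "case bodies as runnables" contrast from slide 15 can be sketched in Ruby. This is only an illustration: the three-rule grammar, the rule numbers, and the helper names are made up and are not taken from JRuby's generated parser. On the JVM the point is that one enormous method can hit bytecode size limits and be skipped by the JIT, while many small callables do not.

```ruby
# Two ways to run parser reduction actions for a made-up three-rule grammar.
# Rule numbers and actions are hypothetical, not taken from JRuby's parser.

# Giant-switch style: every action body lives inside one method, which grows
# with the grammar.
def reduce_with_switch(rule, values)
  case rule
  when 0 then values[0] + values[1]   # sum
  when 1 then values[0] * values[1]   # product
  when 2 then -values[0]              # negation
  else raise ArgumentError, "unknown rule #{rule}"
  end
end

# Table style: each action is its own small callable, indexed by rule number,
# so the dispatch method itself stays tiny.
ACTIONS = [
  ->(values) { values[0] + values[1] },
  ->(values) { values[0] * values[1] },
  ->(values) { -values[0] }
].freeze

def reduce_with_table(rule, values)
  action = ACTIONS.fetch(rule) { raise ArgumentError, "unknown rule #{rule}" }
  action.call(values)
end
```

Both helpers give the same answers; only the shape of the dispatch code differs.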
  16. Interpreter • At least four interpreters we’ve tried • Original: visitor-based • Modified: big switch rather than visitor • Experimental: stackless instr-based • Current: direct execution of AST • Execution state on artificial stack
  17. The New Way • JRuby 9000 introduces a new IR • Traditional-style compiler IR • Register-based • CFG, semantic analysis, type and constant propagation, all that jazz • Interpreter has proven it out…JIT next
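As a rough picture of what "register-based" means on slide 17, here is a toy interpreter in Ruby. It is a sketch only: the instruction names (`:copy`, `:add`, `:lt`, `:jump_if`, `:ret`), the tuple layout, and `run_ir` are invented for illustration and do not reflect JRuby's actual IR.

```ruby
# Toy register-based IR: each instruction names its operand and result
# registers explicitly, instead of pushing and popping an operand stack.
# Instruction names and layout are illustrative, not JRuby's real IR.
def run_ir(instructions, register_count)
  regs = Array.new(register_count)
  pc = 0
  loop do
    op, *args = instructions[pc]
    pc += 1
    case op
    when :copy     # copy dst, literal
      regs[args[0]] = args[1]
    when :add      # add dst, src_a, src_b
      regs[args[0]] = regs[args[1]] + regs[args[2]]
    when :lt       # lt dst, src_a, src_b
      regs[args[0]] = regs[args[1]] < regs[args[2]]
    when :jump_if  # jump_if cond, target
      pc = args[1] if regs[args[0]]
    when :ret      # ret src
      return regs[args[0]]
    else
      raise ArgumentError, "unknown instruction: #{op}"
    end
  end
end

# Sum the integers 0...5. r0 = total, r1 = i, r2 = limit, r3 = step, r4 = cond
SUM_TO_FIVE = [
  [:copy, 0, 0],
  [:copy, 1, 0],
  [:copy, 2, 5],
  [:copy, 3, 1],
  [:add, 0, 0, 1],    # total += i (index 4, loop head)
  [:add, 1, 1, 3],    # i += 1
  [:lt, 4, 1, 2],     # cond = i < limit
  [:jump_if, 4, 4],   # back to the loop head while cond holds
  [:ret, 0]
].freeze
```

Because operands are named rather than implicit, passes such as constant propagation can reason about each instruction in isolation, which is presumably part of the appeal of this style.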
  18. Mixed-Mode • JRuby has both interpreter and JIT • Cost of generating JVM bytecode is high • Our interpreter runs faster than JVM’s • A jitted interpreter is (much) faster than unjitted bytecode
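The mixed-mode idea on slide 18 (interpret first, compile only what turns out to be hot) can be sketched as a call counter. Everything here is an assumption made for illustration: the class name, the threshold, and the lambdas standing in for "interpreted" and "compiled" bodies.

```ruby
# Mixed-mode sketch: interpret a method until it has proven hot, then swap in
# a "compiled" body. The class name, threshold, and lambdas are hypothetical.
class MixedModeMethod
  attr_reader :calls

  def initialize(threshold, interpreted, compiler)
    @threshold = threshold
    @interpreted = interpreted  # slow path, available immediately
    @compiler = compiler        # returns a faster callable; costly to run
    @compiled = nil
    @calls = 0
  end

  def compiled?
    !@compiled.nil?
  end

  def call(*args)
    return @compiled.call(*args) if @compiled
    @calls += 1
    # Pay the compilation cost only once the method is known to be hot.
    @compiled = @compiler.call if @calls >= @threshold
    @interpreted.call(*args)
  end
end
```

Methods that run once or twice never pay the bytecode-generation cost, which is the trade-off the slide describes.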
  19. Native Execution • Early JIT compiler just translated AST • Bare-minimum semantic analysis • Eliminate artificial frame use • One-off opto for frequent patterns • Too unwieldy to evolve much
  20. New IR JIT • Builds off IR runtime • Per-instruction bytecode gen is simple • JVM frame is like infinite register machine • Potential to massively improve perf • Early unboxing numbers…
  21. Numeric loop performance [bar chart: times faster than MRI 2.1, axis 0 to 5; series: JRuby 1.7, Rubinius]
  22. Numeric loop performance [bar chart: times faster than MRI 2.1, axis 0 to 60; series: JRuby 1.7, Rubinius, Truffle, Topaz, 9k+unbox]
  23. mandelbrot(500) [bar chart: times faster than MRI 2.1, axis 0 to 40; series: JRuby 9k + indy, JRuby 9k + unboxing, JRuby 9k + Truffle]
  24. Whither Truffle? • RubyTruffle merged into JRuby • Same licenses as rest of JRuby • Chris Seaton continues to work on it • Very impressive peak numbers • Startup, steady-state…needs work • Considering initial use for targeted opto
  25. JVM Tricks • Lack of class hierarchy analysis in JIT • Manually split methods to beat limits • Everything is an expression, so exception handling has to maintain current stack • Tweaking JIT flags will just make you sad • Unsafe
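The "everything is an expression" bullet on slide 25 is easier to see with a small Ruby example. The method names below are made up. The point is that a `begin`/`rescue` can sit in the middle of an argument list, so the first argument has already been evaluated when the rescue runs. A JVM exception handler starts with a cleared operand stack, so a compiler targeting bytecode has to keep such pending values somewhere else.

```ruby
# A begin/rescue block is an expression, so it can appear inside an argument
# list. The first argument (10) is already evaluated when the rescue runs.
def risky(value)
  raise ArgumentError, "negative" if value < 0
  value * 2
end

def add(a, b)
  a + b
end

def sum_with_fallback(value)
  add(10, begin
            risky(value)
          rescue ArgumentError
            0
          end)
end
```

`sum_with_fallback(4)` takes the normal path and `sum_with_fallback(-1)` takes the rescue path, and in both cases the already-evaluated `10` must still be available for the call to `add`.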
  26. [class diagram] IRubyObject: public RubyClass getMetaClass(); • RubyBasicObject: private RubyClass metaClass; public RubyClass getMetaClass() { return metaClass; } • RubyString, RubyArray, RubyObject • obj.getMetaClass()
  27. public static RubyClass metaclass(IRubyObject object) {
          return object instanceof RubyBasicObject
              ? ((RubyBasicObject) object).getMetaClass()
              : object.getMetaClass();
      }
  28. Compatibility • Strings and Encodings • IO • Fibers • Difficult choices
  29. Strings • All arbitrary-width byte data is String • Binary data and encoded text alike • Many supported encodings • j.l.String, char[] poor options • Size, data integrity, behavioral differences
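A short Ruby illustration of slide 29's point that one String class carries both encoded text and raw bytes. The helper name `string_views` is made up; the String and Encoding calls are standard Ruby.

```ruby
# One String class, several views of the same data.
def string_views(text)
  # Relabel the same bytes as binary: no transcoding, only a new encoding tag.
  binary = text.dup.force_encoding(Encoding::BINARY)
  {
    chars: text.length,                  # characters in the text's encoding
    bytes: text.bytesize,                # underlying byte count
    binary_length: binary.length,        # as binary, every byte is a "char"
    same_bytes: binary.bytes == text.bytes,
    latin1_bytes: text.encode(Encoding::ISO_8859_1).bytes  # real transcoding
  }
end

CAFE = string_views("caf\u00e9")
```

For the UTF-8 string `"caf\u00e9"` the character and byte counts differ (4 and 5), which is the kind of distinction a UTF-16 `char[]` representation cannot preserve for arbitrary byte data.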
  30. The First Big Decision • We realized we needed a byte[] String • Had been StringBuilder-based until then • That meant a lot of porting… • Regex engine (joni) • Encoding subsystem (jcodings) • Low-level IO + transcoding (in JRuby)
  31. JOni • Port of Oniguruma regex library • Pluggable grammars + arbitrary encodings • Bytecode engine (shallow call stack) • Interruptible • Re-forked as char[] engine for Nashorn • https://github.com/jruby/joni
  32. Data: ‘a’-‘z’ in byte[] • Match /.*tuv(..)yz$/ [bar chart: time, axis 0s to 6s; series: j.u.regex, JOni]
  33. Data: ‘a’-‘z’ from IO • Match /.*tuv(..)yz$/ [bar chart: time, axis 0s to 2.8s; series: j.u.regex, JOni]
  34. Jcodings • Character tables • Used heavily by JOni and JRuby • Transcoding tables and logic • Replaces Charset logic from JRuby 1.7 • https://github.com/jruby/jcodings
  35. NO GRAPH NEEDED

  36. JRuby 9000 • Finished porting, connecting transcoders • New port of IO operations • Transcoding works directly against IO buffers; hard to simulate other ways • Lots of fun native (C) calls to emulate…
  37. Fibers • Coroutines, goroutines, continuations • MRI uses stack-swapping • And limits Fiber stack size as a result • Useless as a concurrency model • Useful for multiplexing operations • Try read, no data, go to next fiber
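The "try read, no data, go to next fiber" pattern from slide 37 can be sketched with Ruby's built-in `Fiber`. The sources here are plain arrays in which `nil` stands for "no data available right now"; that convention and the `multiplex` helper are assumptions made for the example, not anything from JRuby.

```ruby
# Each fiber drains one source. When its source has no data (nil) it yields,
# so the round-robin scheduler can move on to the next fiber.
def multiplex(sources)
  output = []
  fibers = sources.map do |name, chunks|
    queue = chunks.dup
    Fiber.new do
      until queue.empty?
        chunk = queue.shift
        if chunk.nil?
          Fiber.yield              # "no data": let another fiber run
        else
          output << "#{name}:#{chunk}"
        end
      end
      :done                        # value returned by the final resume
    end
  end

  # Resume every fiber in turn and drop the ones that have finished.
  fibers.reject! { |fiber| fiber.resume == :done } until fibers.empty?
  output
end
```

Calling `multiplex("a" => [1, nil, 2], "b" => [3, nil, 4])` interleaves the two sources on a single thread, which is the multiplexing use the slide calls out.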
  38. Fibers on JRuby • Yep, they’re just native threads • Transfer perf with j.u.c utils is pretty close • Resource load is very bad • Spin-up time is bad without thread pool • So early or occasional fibers cost a lot • Where are you, coro?!
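To make "they're just native threads" concrete, here is a minimal coroutine emulated with one thread and two hand-off queues. This is a sketch of the general technique, not JRuby's implementation: `ThreadFiber`, `yield_value`, and the queue layout are all hypothetical, and error handling is left out (an exception inside the block would leave `resume` waiting).

```ruby
require 'thread'

# Coroutine emulated with a native thread plus two hand-off queues.
# ThreadFiber and its methods are hypothetical names for this sketch.
class ThreadFiber
  def initialize(&block)
    @to_fiber = Queue.new    # values passed in by resume
    @to_caller = Queue.new   # values handed back to the caller
    @block = block
    @thread = nil
  end

  def resume(value = nil)
    # The thread spin-up cost is paid here, on the first resume.
    @thread ||= Thread.new do
      first = @to_fiber.pop
      result = @block.call(self, first)
      @to_caller << [:done, result]
    end
    @to_fiber << value
    _status, result = @to_caller.pop   # block until the fiber hands back
    result
  end

  # Called from inside the block: hand a value back and park this thread
  # until the next resume.
  def yield_value(value)
    @to_caller << [:yielded, value]
    @to_fiber.pop
  end
end
```

Only one side runs at a time, so the behaviour matches a coroutine, but every fiber holds a full native thread and stack. That is the resource and spin-up cost the slide complains about.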
  39. Hard Decisions • ObjectSpace walks heap, off by default • Trace functions add overhead, off by default • Full coroutines not possible • C extension API too difficult to emulate • Perhaps only item to really hurt us
  40. Native Integration • Process control • More selectable IO • FFI layer • C extension API • Misc
  41. Ruby’s Roots • Matz is/was a C programmer • Early Ruby did little more than stitch C calls together • Some of those roots remain • ttys, fcntl, process control, IO, ext API • We knew we needed a solution
  42. JNA, and then JNR • Started with jna-posix to map POSIX • stat, symlink, etc needed to do basics • JNR replaced JNA • Wayne Meissner started his empire…
  43. The Cancer • Many off-platform runtimes are not as good as HotSpot • Many of their users must turn to C for perf • So, since many people use C exts on MRI, maybe we need to implement it? • Or get a student to do it…
  44. MRI C Extensions • Very invasive API • Direct pointer access, object internals, conservative GC, threading constraints • Like bridging one JNI to another • Experimental in JRuby 1.6, gone in 1.7 • Will not revisit unless new API
  45. FFI • Ruby API/DSL for binding C libs • Additional tools for generating that code • If you need to go native, it’s the best way • In use in production JRuby apps • ØMQ client, bson lib, sodium crypto, …
  46. Ruby FFI example
      require 'ffi'

      class Timeval < FFI::Struct
        layout :tv_sec => :ulong,
               :tv_usec => :ulong
      end

      module LibC
        extend FFI::Library
        ffi_lib FFI::Library::LIBC
        attach_function :gettimeofday,
                        [ :pointer, :pointer ],
                        :int
      end

      t = Timeval.new
      LibC.gettimeofday(t.pointer, nil)
  47. Layered Runtime [diagram] • jffi • jnr-ffi • libffi • jnr-posix • jnr-constants • jnr-enxio • jnr-x86asm • jnr-unixsocket • etc etc
  48. Native in JRuby • POSIX stuff missing from Java • Ruby FFI DSL for binding C libs • Stdio • selection, remove buffering, control tty • Process launching and control
  49. Process Control • Java’s ProcessBuilder/Process are bad • No channel access (no select!) • Spins up at least one thread per process • Drains child output ahead of you • New process API based on posix_spawn
  50. in_c, in_p = IO.pipe
      out_p, out_c = IO.pipe

      pid = spawn('cat -n', :in => in_c, :out => out_c, :err => 'error.log')

      [in_c, out_c].each(&:close)

      in_p.puts("hello, world")
      in_p.close

      puts out_p.read # => " 1 hello, world"

      Process.waitpid(pid)
  51. Usability • Backtraces • Command-line and launchers • Startup time

  52. Backtraces • JVM backtraces make Rubyists’ eyes bleed • Initially, Ruby trace maintained manually • JIT emits mangled class to produce a Ruby trace element • AOT produces single class, mangled method name • Mixed-mode backtraces!
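A rough sketch of how mangled JVM frames can be turned back into Ruby-looking trace lines, as slide 52 describes. The frame shape (`ruby.<file>.<kind>$RUBY$<name>(<file>:<line>)`) is inferred from the compiled-method frames quoted later in this deck. The regular expression and the `ruby_frames` helper are illustrative assumptions, not JRuby's actual demangling code.

```ruby
# Match compiled Ruby frames such as
#   ruby.__dash_e__.method__0$RUBY$foo(-e:1)
# and capture the Ruby method name, file, and line.
RUBY_FRAME = /\Aruby\..*\$RUBY\$(?<name>[^(]+)\((?<file>[^:]+):(?<line>\d+)\)\z/

# Keep only the frames that correspond to Ruby code and rewrite them in the
# familiar "file:line:in `method'" form; pure-Java frames are dropped.
def ruby_frames(jvm_frames)
  jvm_frames.map { |frame| RUBY_FRAME.match(frame) }
            .compact
            .map { |m| "#{m[:file]}:#{m[:line]}:in `#{m[:name]}'" }
end
```

Given a mix of `org.jruby...` runtime frames and `ruby...$RUBY$...` frames, only the latter survive, which is roughly what a Rubyist expects to see in a backtrace.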
  53. at java.lang.reflect.Method.invoke(Method.java:597)
      at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:86)
      at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:234)
      at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1061)
      at groovy.lang.ExpandoMetaClass.invokeMethod(ExpandoMetaClass.java:910)
      at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:892)
      at groovy.lang.Closure.call(Closure.java:279)
      at org.codehaus.groovy.runtime.DefaultGroovyMethods.callClosureForMapEntry(DefaultGroovyMethods.java:1911)
      at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:1184)
      at org.codehaus.groovy.runtime.dgm$88.invoke(Unknown Source)
      at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:270)
      at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:52)
      at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:124)
      at BootStrap.populateBootstrapData(BootStrap.groovy:786)
      at BootStrap.this$2$populateBootstrapData(BootStrap.groovy)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:86)
      at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:234)
      at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1061)
      at groovy.lang.ExpandoMetaClass.invokeMethod(ExpandoMetaClass.java:910)
      at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:892)
      at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1009)
      at groovy.lang.ExpandoMetaClass.invokeMethod(ExpandoMetaClass.java:910)
      at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:892)
      at
  54. at org.jruby.javasupport.JavaMethod.invokeStaticDirect(JavaMethod.java:362)
      at org.jruby.java.invokers.StaticMethodInvoker.call(StaticMethodInvoker.java:50)
      at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:306)
      at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:136)
      at org.jruby.ast.CallNoArgNode.interpret(CallNoArgNode.java:60)
      at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
      at org.jruby.ast.RootNode.interpret(RootNode.java:129)
      at org.jruby.evaluator.ASTInterpreter.INTERPRET_EVAL(ASTInterpreter.java:95)
      at org.jruby.evaluator.ASTInterpreter.evalWithBinding(ASTInterpreter.java:184)
      at org.jruby.RubyKernel.evalCommon(RubyKernel.java:1158)
      at org.jruby.RubyKernel.eval19(RubyKernel.java:1121)
      at org.jruby.RubyKernel$INVOKER$s$0$3$eval19.call(RubyKernel$INVOKER$s$0$3$eval19.gen)
      at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:210)
      at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:206)
      at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
      at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:155)
      at ruby.__dash_e__.method__1$RUBY$bar(-e:1)
      at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
      at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:138)
      at ruby.__dash_e__.block_0$RUBY$foo(-e:1)
      at ruby$__dash_e__$block_0$RUBY$foo.call(ruby$__dash_e__$block_0$RUBY$foo)
      at org.jruby.runtime.CompiledBlock19.yieldSpecificInternal(CompiledBlock19.java:117)
      at org.jruby.runtime.CompiledBlock19.yieldSpecific(CompiledBlock19.java:92)
      at org.jruby.runtime.Block.yieldSpecific(Block.java:111)
      at org.jruby.RubyFixnum.times(RubyFixnum.java:275)
      at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
      at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:230)
      at ruby.__dash_e__.method__0$RUBY$foo(-e:1)
      at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:599)
      at org.jruby.runtime.invokedynamic.InvocationLinker.invocationFallback(InvocationLinker.java:138)
      at ruby.__dash_e__.__file__(-e:1)
  55. None
  56. • org.jruby.RubyFixnum.times • org.jruby.evaluator.ASTInterpreter.INTERPRET_EVAL • rubyjit.Object$$foo_3AB1F5052668B3CD74A0B4CD4999CF6A65E92973271627940.__file__ • ruby.__dash_e__.method__0$RUBY$foo

  57. Command Line • Rubyists typically are at CLI • Command line and tty must behave • Epic bash and .bat scripts • 300-500 lines of heinous shell script • Unusable in shebang lines • Repurposed NetBeans native launcher
  58. system ~/projects/jruby $ time bin/jruby.bash -v
      jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]

      real 0m0.126s
      user 0m0.092s
      sys 0m0.031s

      system ~/projects/jruby $ time bin/jruby.bash -v
      jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]

      real 0m0.124s
      user 0m0.089s
      sys 0m0.033s

      system ~/projects/jruby $ time jruby -v
      jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]

      real 0m0.106s
      user 0m0.080s
      sys 0m0.022s

      system ~/projects/jruby $ time jruby -v
      jruby 9000.dev-SNAPSHOT (2.1.2) 2014-07-27 9cca1ec Java HotSpot(TM) 64-Bit Server VM 24.45-b08 on 1.7.0_45-b18 [darwin-x86_64]

      real 0m0.110s
      user 0m0.085s
      sys 0m0.023s
  59. Console Support • Rubyists also typically use REPLs • Readline support is a must • jline has been forked all over the place • Looking into JNA-based readline now
  60. CLI == Startup Time • BY FAR the #1 complaint • May be the only reason we haven’t won! • We’re trying everything we can
  61. JRuby Startup [bar chart: time in seconds (lower is better), axis 0 to 10; commands: -e 1, gem --help, rake -T; series: C Ruby, JRuby]
  62. Tweaking Flags • -client mode • -XX:+TieredCompilation -XX:TieredStopAtLevel=1 • -X-C to disable JRuby’s compiler • Heap sizes, code verification, etc etc
  63. Nailgun? • Keep a single JVM running in background • Toss commands over to it • It stays hot, so code starts faster • Hard to clean up all state (e.g. threads) • Can’t get access to user’s terminal • http://www.martiansoftware.com/nailgun/
  64. Drip [diagram: three boxes, each labeled Isolated JVM, Application, Command #1]
  65. Drip • Start a new JVM after each command • Pre-boot JVM plus optional code • Analyze command line for differences • Age out unused instances • https://github.com/flatland/drip
  66. Drip Init • Give Drip some code to pre-boot • Load more libraries • Warm up some code • Pre-execution initialization • Run as much as possible in background • We also pre-load ./dripmain.rb if exists
  67. $ cat dripmain.rb
      # Preload some code Rails always needs
      require File.expand_path('../config/application', __FILE__)
  68. JRuby Startup: rake -T [bar chart: time in seconds (lower is better), axis 0 to 10; series: C Ruby, JRuby, JRuby (best), JRuby (drip), JRuby (drip init), JRuby (dripmain)]
  69. CONCLUSION

  70. Hard Parts • 64k bytecode limit • Falling over JIT limits • String char[] pain • Startup and warmup • Coroutines • FFI at JVM level • Too many flags • Tiered compiler slow • Interpreter opto • Bytecode is a blunt tool • Indy has taken too long • Charlie may burn out
  71. Thank You! • Charles Oliver Nutter • @headius • headius@headius.com • http://blog.headius.com