
JRuby 9000 - Taipei Ruby User's Group

headius
December 18, 2015

Talk on JRuby 9000 delivered at 5xRuby in Taipei on December 1, 2015.

Transcript

  1. Shoulders of Giants
    [Diagram labels: JVM; J. Rose (repeated); Hiro, Marcin, Nahi, Subbu, Douglas, Christian, Dmitry, Tom, Charlie; JRuby]
  2. All the stuff!
    [Diagram labels: JVM; J. Rose (repeated)]
    • Garbage Collection
    • Native JIT
    • Profiled Optimizations
    • Native Threading
    • Tooling
    • Cross Platform
  3. Can leverage Java Ecosystem
    • 47k libraries in Maven
    • Hadoop, EHCache, Selenium, Sitemesh, Lucene, Neo4j, JMonkeyEngine
  4. red/black tree, pure Ruby versus native
    [Chart: runtime per iteration, axis 0 to 3; series: ruby-2.0.0 + Ruby, ruby-2.0.0 + C ext, jruby + Ruby; bar labels: 0.29s, 0.51s, 2.48s]
  5. GC

  6. GC Matters
    • Applications grow over time
    • Ruby is very object-heavy
    • Multiprocess multiplies the problem
    • You will eventually have issues
  7. require 'benchmark'
    class Simple
      attr_accessor :next
    end
    top = Simple.new
    puts Benchmark.measure {
      outer = 10
      total = 100000
      per = 100
      outer.times do
        total.times do
          per.times { Simple.new }
          s = Simple.new
          top.next = s
          top = s
        end
      end
    }
  8. Real Parallelism
    • Ruby thread = JVM thread = native thread
    • One process can use all cores
    • One server can handle all requests
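
A minimal sketch of what the bullets above mean in practice: plain Ruby threads splitting CPU-bound work. The names `count_primes` and `parallel_count` are illustrative and do not come from the talk. On JRuby each thread is a native thread, so the slices can run on separate cores; on C Ruby the same code gives the same answer, but the interpreter lock keeps the Ruby code from running in parallel.

```ruby
# Count primes with a deliberately naive, CPU-bound loop.
def count_primes(numbers)
  numbers.count do |n|
    n > 1 && (2..Math.sqrt(n)).none? { |d| (n % d).zero? }
  end
end

# Split 1..limit into one slice per thread and add up the per-thread counts.
def parallel_count(limit, thread_count)
  slice = (limit / thread_count.to_f).ceil
  threads = (1..limit).each_slice(slice).map do |numbers|
    Thread.new { count_primes(numbers) }
  end
  # Thread#value joins the thread and returns the block's result.
  threads.map(&:value).inject(0, :+)
end

puts parallel_count(100_000, 4)
```

Timing `parallel_count` with one to four threads is one way to reproduce the shape of a per-thread-count chart on your own machine.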
  9. Per-iteration time versus thread count
    [Chart: threaded_reverse benchmark; axis 0.2s to 0.8s; categories: one thread, two threads, three threads, four threads]
  10. Profiling
    • Java profilers
    • VisualVM, YourKit, NetBeans, JXInsight
    • jruby [--profile | --profile.graph]
    • JVM command-line profilers
  11. VisualVM
    • CPU, memory, thread monitoring
    • CPU and memory profiling
    • VisualGC
    • Heap analysis
  12. Purugin
    • Nearly 100% Ruby wrapper
    • Thin shim makes Java feel very Ruby-like
    • It’s Minecraft!
  13. Egg Madness
    class EggMadnessPlugin
      include Purugin::Plugin
      description 'EggMadness', 0.1
      def on_enable
        event(:player_egg_throw) do |e|
          e.hatching = true
          e.num_hatches = 50
          e.hatching_type = :chicken
        end
      end
    end
  14. JRuby Roadmap
    [Diagram: two release branches. jruby-1_7 (Ruby 1.8, 1.9): 1.7.3, 1.7.4, 1.7.5, 1.7.6, 1.7.7, ..., 1.7.22, 1.7.23. master (Ruby 2.2): ..., 9.0.4. Annotations: "End of week", "Last Friday".]
  15. JRuby 9000
    • Ruby 2.2
    • New runtime (IR)
    • Major IO and Encodings overhaul
  16. Block Jitting
    • JRuby 1.7 only jitted methods
    • Not free-standing procs/lambdas
    • Not define_method blocks
    • Easier to do now with 9000's IR
    • Blocks JIT now in 9.0.4.0
  17. Jitting is Winning
    Performance of define_method in loaded file
    [Chart: axis 0k to 3000k iters/s; series: MRI, JRuby 9.0.1.0, JRuby 9.0.4.0; categories: normal method, define_method method]
    ruby -e 'load "bench_define_method.rb"'
  18. define_method
    Convenient for metaprogramming, but blocks have more overhead than methods.
    define_method(:add) do |a, b|
      a + b
    end
    names.each do |name|
      define_method(name) { send :"do_#{name}" }
    end
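
To make the overhead claim concrete, here is a small sketch that defines the same addition twice, once with `def` and once with `define_method`, and times both with the standard `Benchmark` module. The class name `Adder`, the method name `add_dm`, and the iteration count are assumptions chosen for illustration; this is not the `bench_define_method.rb` script from the slides, and the numbers depend entirely on the runtime and version you use.

```ruby
require 'benchmark'

class Adder
  # Ordinary method body.
  def add(a, b)
    a + b
  end

  # Same behavior, but the body is a block handed to define_method.
  define_method(:add_dm) do |a, b|
    a + b
  end
end

adder = Adder.new
iterations = 200_000

Benchmark.bm(14) do |x|
  x.report('def')           { iterations.times { adder.add(1, 2) } }
  x.report('define_method') { iterations.times { adder.add_dm(1, 2) } }
end
```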
  19. :-(
    [Chart: axis 0k to 4000k iters/s; series: MRI, JRuby 9.0.1.0; categories: def, define_method, define_method w/ capture]
  20. Optimizing define_method
    • Noncapturing
      • Treat as method in compiler
      • Ignore surrounding scope
    • Capturing (future work)
      • Lift read-only variables as constant
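
The distinction between noncapturing and capturing blocks can be sketched in plain Ruby. The class `Scaler` and its method names are hypothetical. The first block touches only its own parameter, so nothing from the surrounding scope is needed. The second reads a local that is never reassigned, which is the read-only case named above. The third writes to a captured local, so the surrounding scope has to stay alive.

```ruby
class Scaler
  # Noncapturing: uses only its own parameter.
  define_method(:double) { |n| n * 2 }

  # Capturing, read-only: `factor` is read but never reassigned.
  factor = 3
  define_method(:triple) { |n| n * factor }

  # Capturing, read-write: every call mutates the shared local `calls`.
  calls = 0
  define_method(:tick) { calls += 1 }
end

s = Scaler.new
puts s.double(4)
puts s.triple(4)
puts s.tick
puts s.tick
```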
  21. Getting Better!
    [Chart: axis 0k to 4000k iters/s; series: MRI, JRuby 9.0.1.0, JRuby 9.0.4.0; categories: def, define_method, define_method w/ capture]
  22. Reduced-cost Exceptions
    • Backtrace cost is VERY high on JVM
    • Heavily optimized, lots of work to build
    • Exceptions frequently ignored
    • ...or used as flow control (shame!)
    • If ignored, backtrace is not needed!
  23. Postfix Antipattern
    foo rescue nil
    • Exception raised
    • StandardError rescued
    • Exception ignored
    Result is simple expression, so exception is never visible.
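
A short sketch of the postfix form and its long-hand equivalent, using `Integer()` as the call that may raise. The helper names are illustrative. In both versions the exception object is never bound to a variable, which is why its backtrace is never looked at.

```ruby
# Postfix rescue: any StandardError raised by Integer() is swallowed
# and the expression evaluates to nil instead.
def to_integer_or_nil(text)
  Integer(text) rescue nil
end

# Equivalent long form. A postfix rescue only catches StandardError
# and its subclasses, just like a bare `rescue` clause.
def to_integer_or_nil_long(text)
  Integer(text)
rescue StandardError
  nil
end

p to_integer_or_nil("42")
p to_integer_or_nil("forty-two")
```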
  24. csv.rb Converters
    Converters = {
      integer: lambda { |f|
        Integer(f.encode(ConverterEncoding)) rescue f
      },
      float: lambda { |f|
        Float(f.encode(ConverterEncoding)) rescue f
      },
      ...
    All trivial rescues, no traces needed.
  25. [Diagram: compiler pipeline. Stages: Lexical Analysis, Parsing, Semantic Analysis, Optimization, Bytecode Generation, Dalvik Generation, ..., Interpret. Representations: AST, IR Instructions, CFG, DFG, ... Labels: JRuby 1.7.x, 9000+.]
  26. Semantic Analysis: IR Instructions
    def foo(a, b)
      c = 1
      d = a + c
    end
    0 check_arity(2, 0, -1)
    1 a = recv_pre_reqd_arg(0)
    2 b = recv_pre_reqd_arg(1)
    3 %block = recv_closure
    4 thread_poll
    5 line_num(1)
    6 c = 1
    7 line_num(2)
    8 %v_0 = call(:+, a, [c])
    9 d = copy(%v_0)
    10 return(%v_0)
    Register-based 3 address format
  27. Optimization: -Xir.passes=LocalOptimizationPass,DeadCodeElimination
    def foo(a, b)
      c = 1
      d = a + c
    end
    0 check_arity(2, 0, -1)
    1 a = recv_pre_reqd_arg(0)
    2 b = recv_pre_reqd_arg(1)
    3 %block = recv_closure
    4 thread_poll
    5 line_num(1)
    6 c = 1
    7 line_num(2)
    8 %v_0 = call(:+, a, [c])
    9 d = copy(%v_0)
    10 return(%v_0)
  28. Optimization: -Xir.passes=LocalOptimizationPass,DeadCodeElimination
    def foo(a, b)
      c = 1
      d = a + c
    end
    Instructions 2 and 3 (unused b and %block) removed:
    0 check_arity(2, 0, -1)
    1 a = recv_pre_reqd_arg(0)
    4 thread_poll
    5 line_num(1)
    6 c = 1
    7 line_num(2)
    8 %v_0 = call(:+, a, [c])
    9 d = copy(%v_0)
    10 return(%v_0)
    [Animation frame: the constant 1 moves from instruction 6 into the call arguments of instruction 8]
    Constant propagated and instruction 6 removed:
    0 check_arity(2, 0, -1)
    1 a = recv_pre_reqd_arg(0)
    4 thread_poll
    5 line_num(1)
    7 line_num(2)
    8 %v_0 = call(:+, a, [1])
    9 d = copy(%v_0)
    10 return(%v_0)
  29. Optimization: -Xir.passes=LocalOptimizationPass,DeadCodeElimination
    Before:
    0 check_arity(2, 0, -1)
    1 a = recv_pre_reqd_arg(0)
    4 thread_poll
    5 line_num(1)
    7 line_num(2)
    8 %v_0 = call(:+, a, [1])
    9 d = copy(%v_0)
    10 return(%v_0)
    After (instruction 5, the now-empty line 1 marker, removed):
    0 check_arity(2, 0, -1)
    1 a = recv_pre_reqd_arg(0)
    4 thread_poll
    7 line_num(2)
    8 %v_0 = call(:+, a, [1])
    9 d = copy(%v_0)
    10 return(%v_0)
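
The two passes named in the flag can be illustrated with a toy model. This is a sketch under heavy assumptions, not JRuby's implementation: `Instr`, `propagate_constants`, and `eliminate_dead_code` are made-up names, the code is a single straight-line block, variables are Ruby symbols, literals are integers, and the method name of a call is omitted. Unlike the listing on the slides, this toy also drops the unused `d = copy(%v_0)`.

```ruby
# One IR instruction: dest = op(args). dest is nil when no value is produced.
Instr = Struct.new(:dest, :op, :args)

# Operations that must stay even if nobody reads their result.
SIDE_EFFECTS = [:call, :return].freeze

# Forward pass: replace reads of variables known to hold a literal.
def propagate_constants(instrs)
  constants = {}
  instrs.map do |ins|
    args = ins.args.map { |a| constants.fetch(a, a) }
    if ins.op == :copy && !args.first.is_a?(Symbol)
      constants[ins.dest] = args.first   # dest now holds a known literal
    else
      constants.delete(ins.dest)         # dest is no longer a known literal
    end
    Instr.new(ins.dest, ins.op, args)
  end
end

# Backward pass: keep an instruction only if it has a side effect
# or defines a variable that a kept instruction reads later.
def eliminate_dead_code(instrs)
  live = []
  kept = instrs.reverse.select do |ins|
    needed = SIDE_EFFECTS.include?(ins.op) || live.include?(ins.dest)
    if needed
      live.delete(ins.dest)
      live.concat(ins.args.grep(Symbol))
    end
    needed
  end
  kept.reverse
end

program = [
  Instr.new(:a,  :recv_arg, [0]),
  Instr.new(:b,  :recv_arg, [1]),
  Instr.new(:c,  :copy,     [1]),
  Instr.new(:v0, :call,     [:a, :c]),
  Instr.new(:d,  :copy,     [:v0]),
  Instr.new(nil, :return,   [:v0])
]

optimized = eliminate_dead_code(propagate_constants(program))
optimized.each { |ins| puts "#{ins.dest.inspect} = #{ins.op}(#{ins.args.inspect})" }
```

Running the passes in this order matters: propagation first rewrites the call to use the literal, which is what makes `c` unused and therefore removable.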
  30. Inlining (Optimization)
    • 500 pound gorilla of optimizations
    • shove method/closure back to callsite
    • eliminate stack frame
    • eliminate parameter passing/return
    • eliminate additional allocation
  31. Today’s Inliner
    Before:
    def decrement_one(i)
      i - 1
    end
    i = 1_000_000
    while i > 0
      i = decrement_one(i)
    end
    After inlining:
    def decrement_one(i)
      i - 1
    end
    i = 1_000_000
    while i > 0
      if guard_same? self
        i = i - 1
      else
        i = decrement_one(i)
      end
    end
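
The `guard_same?` check on the slide is pseudocode. One way to picture a guard in runnable Ruby is a serial number that changes whenever the method is redefined. Everything here is a hypothetical illustration (the `Counter` class, the `serial` counter, `INLINED_AT_SERIAL`, and `count_down`); it shows the guard-then-fallback shape, not how JRuby actually validates an inlined call site.

```ruby
class Counter
  @serial = 0

  class << self
    attr_reader :serial

    # Ruby calls this hook each time an instance method is (re)defined.
    def method_added(name)
      @serial += 1 if name == :decrement_one
    end
  end

  def decrement_one(i)
    i - 1
  end
end

# Serial observed when the loop below was "compiled" with the inlined body.
INLINED_AT_SERIAL = Counter.serial

def count_down(counter, start)
  inlined_iterations = 0
  i = start
  while i > 0
    if Counter.serial == INLINED_AT_SERIAL  # guard: is the method unchanged?
      i = i - 1                             # inlined body of decrement_one
      inlined_iterations += 1
    else
      i = counter.decrement_one(i)          # fall back to a real call
    end
  end
  [i, inlined_iterations]
end

p count_down(Counter.new, 5)
```

If `decrement_one` is later redefined, the serial changes, the guard fails, and every iteration takes the fallback path with the new behavior.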
  32. Numeric Specialization
    • Everything's an object
    • JVM has only references and primitives
    • Not compatible in bytecode
    • Need to optimize numerics as primitive
  33. def looper(n)
      i = 0
      while i < n
        do_something(i)
        i += 1
      end
    end
    [Annotations on the code: Cached object • Call with i • New Fixnum i + 1 • Probably a Fixnum?]
  34. def looper(n)
      i = 0
      while i < n
        do_something(i)
        i += 1
      end
    end
    Specialize n, i to long:
    def looper(long n)
      long i = 0
      while i < n
        do_something(i)
        i += 1
      end
    end
    Deopt to object version if n or i + 1 is not Fixnum:
    def looper(n)
      i = 0
      while i < n
        do_something(i)
        i += 1
      end
    end
  35. JVM Futures
    • We're good friends with OpenJDK folks
    • Working to improve JVM as well
    • FFI being added at JVM level
    • AOT compilation for startup perf
  36. FFI in JVM
    • Project Panama (JEP-191)
    • Native support for FFI
    • Code generators for binding
    • JIT support for calling
    • API support for userland
  37. JIT Magic
    callq <getpid address> ; - libSystem.B.dylib
                           ;*invokeinterface getpid
                           ; - GetPidJNRExample::benchGetPid@13 (line 26)
                           ; {optimized virtual_call}
    Direct call from JITed Ruby code
  38. Startup Time
    • By far our greatest challenge
    • Everything starts cold: parser, interpreter, compiler, core classes, boot logic
    • Increasing amount of Ruby in JRuby
    • Aggravates the problem
  39. JRuby Startup
    [Chart: time in seconds (lower is better), axis 0s to 14s; series: C Ruby, JRuby; workloads: -e 1, gem --list, rake -T in Rails app]
  40. --dev
    • Disables JRuby JIT
    • Sets JVM to reduced optimization mode
    • 50% reduction in startup time
    • Much lower peak perf
  41. JRuby --dev
    [Chart: time in seconds (lower is better), axis 0s to 14s; series: C Ruby, JRuby, JRuby --dev; workloads: -e 1, gem --list, rake -T in Rails app]
  42. AOT
    • Precompile JVM bytecode to native
    • Focus on hot code
    • Save original structure for optimization
    • Get JRuby running native right away
    • AOT compile Ruby to native in future
  43. Getting There
    [Chart: time in seconds (lower is better), axis 0s to 14s; series: C Ruby, JRuby, JRuby --dev, Non-opto AOT, Opto AOT; workload: rake -T in Rails app]
  44. AOT Future
    • AOT might be available in Java 9
    • Many tweaks we can make to help it
    • Ideal: all code run at boot runs native
    • Should get closer to MRI