Slide 1

Slide 1 text

JRuby 9000
Charles O. Nutter (JRuby; Red Hat)

Slide 2

Slide 2 text

JRuby is Ruby… (on the JVM...shhhh!)

Slide 3

Slide 3 text

Why the JVM is great for Ruby!

Slide 4

Slide 4 text

Shoulders of Giants
[Diagram: JRuby standing on the JVM. The JVM side is labeled "J. Rose" repeatedly; JRuby contributors shown: Hiro, Marcin, Nahi, Subbu, Douglas, Christian, Dmitry, Tom, Charlie]

Slide 5

Slide 5 text

All the stuff! (from the JVM) • Garbage Collection • Native JIT • Profiled Optimizations • Native Threading • Tooling • Cross Platform

Slide 6

Slide 6 text

Can Leverage the Java Ecosystem • 47k libraries in Maven • Hadoop • EHCache • Selenium • Sitemesh • Lucene • Neo4j • JMonkeyEngine

Slide 7

Slide 7 text

Polyglot • Clojure • Scala • Groovy • Jython • Rhino/Nashorn/DynJS (JavaScript) • Micro Focus Visual COBOL (JVM) • Java

Slide 8

Slide 8 text

Red/black tree, pure Ruby versus native
[Chart: runtime per iteration for ruby-2.0.0 + Ruby, ruby-2.0.0 + C ext, and jruby + Ruby; values shown: 0.29s, 0.51s, 2.48s]

Slide 9

Slide 9 text

GC

Slide 10

Slide 10 text

GC Matters • Applications grow over time • Ruby is very object-heavy • Multiprocess multiplies the problem • You will eventually have issues

Slide 11

Slide 11 text

gc_demo.rb • Heavy GC, mix of old and young objects • Steadily growing heap use

Slide 12

Slide 12 text

require 'benchmark'

class Simple
  attr_accessor :next
end

top = Simple.new

puts Benchmark.measure {
  outer = 10
  total = 100000
  per = 100
  outer.times do
    total.times do
      per.times { Simple.new }  # short-lived garbage
      s = Simple.new            # new node, linked from the previous one
      top.next = s
      top = s
    end
  end
}
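
A scaled-down sketch for reproducing the GC-count comparison on either runtime: it runs the same allocation pattern and reports how many collections happened via GC.count, which both CRuby and JRuby provide. Assumptions: the iteration counts are far smaller than the slide's so it finishes quickly, this version keeps a reference to the head node so every linked node stays reachable, and gc_demo, chain_length and their arguments are illustrative names.

```ruby
# Sketch: the gc_demo.rb allocation pattern, scaled down and instrumented.
class Simple
  attr_accessor :next
end

# Walks the linked chain and counts its nodes.
def chain_length(node)
  n = 0
  while node
    n += 1
    node = node.next
  end
  n
end

def gc_demo(outer: 2, total: 10_000, per: 10)
  head = top = Simple.new   # keep the head so the whole chain stays reachable
  gc_before = GC.count      # collections so far in this process
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  outer.times do
    total.times do
      per.times { Simple.new }  # short-lived garbage (young objects)
      s = Simple.new            # long-lived node appended to the chain
      top.next = s
      top = s
    end
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  { gc_runs: GC.count - gc_before, seconds: elapsed, retained: chain_length(head) }
end

p gc_demo
```

Running the same script under `ruby` and `jruby` and comparing `gc_runs` is the kind of comparison the next slides chart.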

Slide 13

Slide 13 text

[Chart: GC count, Ruby 2.1.1 vs JRuby (linear axis, 0 to 3000)]

Slide 14

Slide 14 text

[Chart: GC count, Ruby 2.1.1 vs JRuby (log axis, 1 to 10000)]

Slide 15

Slide 15 text

[Chart: GC time, Ruby 2.2.2 vs JRuby (axis 0s to 1.8s)]

Slide 16

Slide 16 text

Threads

Slide 17

Slide 17 text

Real Parallelism • Ruby thread = JVM thread = native thread • One process can use all cores • One server can handle all requests
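
A minimal sketch of splitting CPU-bound work across Ruby threads. The code is plain Ruby; the difference is in how it runs: per the slide, each Thread is a native thread on JRuby, so the chunks can execute on separate cores, while CRuby's global VM lock lets only one thread run Ruby code at a time. The method name and chunking scheme are illustrative.

```ruby
# Split a CPU-bound job across thread_count Ruby threads and combine the results.
def parallel_sum_of_squares(numbers, thread_count: 4)
  slice_size = (numbers.size / thread_count.to_f).ceil
  slice_size = 1 if slice_size < 1
  threads = numbers.each_slice(slice_size).map do |chunk|
    Thread.new(chunk) { |c| c.sum { |x| x * x } }
  end
  # Thread#value waits for the thread to finish and returns its block's result.
  threads.sum(&:value)
end

puts parallel_sum_of_squares((1..1_000_000).to_a)
```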

Slide 18

Slide 18 text

[Chart: Ruby 2.2 unthreaded, Ruby 2.2 threaded, JRuby unthreaded, JRuby threaded]

Slide 19

Slide 19 text

[Chart (threaded_reverse): per-iteration time versus thread count, one to four threads (axis 0.2s to 0.8s)]
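
The threaded_reverse benchmark's source is not shown in the slides, so the following is a hypothetical stand-in that measures the same kind of thing: a fixed amount of string-reversing work split across one to four threads, timed per thread count. The workload, method names and iteration counts are assumptions. On a runtime with parallel threads, elapsed time would be expected to drop as threads are added, up to the number of cores.

```ruby
# Hypothetical workload: reverse a string many times.
def reverse_work(iterations, str)
  iterations.times { str.reverse }
end

# Splits total_iterations evenly across thread_count threads (any remainder is
# dropped) and returns the elapsed wall-clock time in seconds.
def time_with_threads(thread_count, total_iterations: 400_000, str: "threaded_reverse" * 8)
  per_thread = total_iterations / thread_count
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = Array.new(thread_count) { Thread.new { reverse_work(per_thread, str) } }
  threads.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

(1..4).each do |n|
  printf("%d thread(s): %.3fs\n", n, time_with_threads(n))
end
```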

Slide 20

Slide 20 text

Tools

Slide 21

Slide 21 text

Profiling • Java profilers • VisualVM, YourKit, NetBeans, JXInsight • jruby [--profile | --profile.graph] • JVM command-line profilers

Slide 22

Slide 22 text

VisualVM • CPU, memory, thread monitoring • CPU and memory profiling • VisualGC • Heap analysis

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Scripting Java

Slide 27

Slide 27 text

Purugin • Nearly 100% Ruby wrapper • Thin shim makes Java feel very Ruby-like • It’s Minecraft!

Slide 28

Slide 28 text

Egg Madness

Slide 29

Slide 29 text

Egg Madness
class EggMadnessPlugin
  include Purugin::Plugin
  description 'EggMadness', 0.1

  def on_enable
    # Every thrown egg hatches 50 chickens
    event(:player_egg_throw) do |e|
      e.hatching = true
      e.num_hatches = 50
      e.hatching_type = :chicken
    end
  end
end
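
To show the "thin shim" idea without a Minecraft server, here is a hypothetical, self-contained mini version of the event-registration pattern: a mixin stores blocks per event name and dispatches event objects to them. MiniPlugin, fire and EggThrowEvent are invented for illustration and are not Purugin's or Bukkit's API. A Struct stands in for the Java event object; on JRuby, Java bean setters such as setNumHatches are already callable as num_hatches=, which is what lets the real plugin read this way.

```ruby
# Hypothetical mixin: register blocks for named events and dispatch to them.
module MiniPlugin
  def handlers
    @handlers ||= Hash.new { |h, k| h[k] = [] }
  end

  # Register a block, e.g. event(:player_egg_throw) { |e| ... }
  def event(name, &block)
    handlers[name] << block
  end

  # Deliver an event object to every block registered under that name.
  def fire(name, evt)
    handlers[name].each { |handler| handler.call(evt) }
    evt
  end
end

# Stand-in for the Java event object; Struct provides Ruby-style setters.
EggThrowEvent = Struct.new(:hatching, :num_hatches, :hatching_type)

class EggMadness
  include MiniPlugin

  def on_enable
    event(:player_egg_throw) do |e|
      e.hatching = true
      e.num_hatches = 50
      e.hatching_type = :chicken
    end
  end
end

plugin = EggMadness.new
plugin.on_enable
p plugin.fire(:player_egg_throw, EggThrowEvent.new(false, 1, :egg))
```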

Slide 30

Slide 30 text

JRuby Roadmap
[Diagram: release branches. jruby-1_7 (Ruby 1.8, 1.9): 1.7.3, 1.7.4, 1.7.5, 1.7.6, 1.7.7, ..., 1.7.22, 1.7.23. master (Ruby 2.2): 9.0.4. Annotations: "End of week", "Last Friday"]

Slide 31

Slide 31 text

JRuby 9000 • Ruby 2.2 • New runtime (IR) • Major IO and Encodings overhaul

Slide 32

Slide 32 text

“It’s over 9000!!!!”

Slide 33

Slide 33 text

Now What?

Slide 34

Slide 34 text

PERFORMANCE WORK: the topic of this talk, now and into the not-so-distant future!

Slide 35

Slide 35 text

Recent Wins • JITable blocks • define_method performance • Reduced-cost transient exceptions

Slide 36

Slide 36 text

Block Jitting • JRuby 1.7 only jitted methods • Not free-standing procs/lambdas • Not define_method blocks • Easier to do now with 9000's IR • Blocks now JIT as of 9.0.4.0
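
For reference, the three kinds of code the slide distinguishes, side by side. Per the slide, JRuby 1.7 JIT-compiled only the first (a regular method), while blocks like the other two now JIT as well. The names are illustrative.

```ruby
# 1. A regular method defined with def.
def add_def(a, b)
  a + b
end

# 2. A free-standing lambda: a block turned into an object.
ADD_LAMBDA = ->(a, b) { a + b }

# 3. A method whose body is a block, created with define_method.
class Adder
  define_method(:add) { |a, b| a + b }
end

p add_def(1, 2)         # => 3
p ADD_LAMBDA.call(1, 2) # => 3
p Adder.new.add(1, 2)   # => 3
```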

Slide 37

Slide 37 text

Jitting is Winning
[Chart: performance of define_method in a loaded file, iterations/s (0k to 3000k), normal method vs define_method method, for MRI, JRuby 9.0.1.0, and JRuby 9.0.4.0; run as ruby -e 'load "bench_define_method.rb"']
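
The contents of bench_define_method.rb are not shown, so this is a hypothetical micro-benchmark in the same spirit: it times calls to an ordinary method and to an equivalent define_method method and reports iterations per second. The class, method names, iteration count and single warm-up round are assumptions; the warm-up gives a JIT a chance to compile the hot code before the measured round.

```ruby
class Target
  def normal_method(x)
    x + 1
  end

  define_method(:via_define_method) { |x| x + 1 }
end

# Calls the block `iterations` times and returns iterations per second.
def iters_per_second(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  i = 0
  while i < iterations
    yield i
    i += 1
  end
  iterations / (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start)
end

target = Target.new
2.times do |round| # round 0 is warm-up and is not reported
  normal  = iters_per_second(1_000_000) { |i| target.normal_method(i) }
  defined = iters_per_second(1_000_000) { |i| target.via_define_method(i) }
  next if round.zero?
  printf("normal method:        %.0f iters/s\n", normal)
  printf("define_method method: %.0f iters/s\n", defined)
end
```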

Slide 38

Slide 38 text

define_method
Convenient for metaprogramming, but blocks have more overhead than methods.
define_method(:add) do |a, b|
  a + b
end

names.each do |name|
  define_method(name) { send :"do_#{name}" }
end
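
A self-contained, runnable version of the two define_method patterns on this slide. The Service class, its NAMES list and the do_start/do_stop targets are hypothetical. Note that the second pattern's block reads name from the enclosing each block, which makes it a capturing block.

```ruby
class Service
  # Pattern 1: a method body written as a block.
  define_method(:add) do |a, b|
    a + b
  end

  NAMES = %w[start stop]

  # Pattern 2: generate start/stop methods that forward to do_start/do_stop.
  NAMES.each do |name|
    define_method(name) { send :"do_#{name}" } # captures `name`
  end

  private

  def do_start
    "starting"
  end

  def do_stop
    "stopping"
  end
end

s = Service.new
p s.add(2, 3) # => 5
p s.start     # => "starting"
p s.stop      # => "stopping"
```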

Slide 39

Slide 39 text

:-(
[Chart: iterations/s (0k to 4000k) for def, define_method, and define_method w/ capture; MRI vs JRuby 9.0.1.0]

Slide 40

Slide 40 text

Optimizing define_method • Noncapturing • Treat as method in compiler • Ignore surrounding scope • Capturing (future work) • Lift read-only variables as constant
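
A source-level illustration of the noncapturing versus capturing distinction, and of why "lift read-only variables as constant" needs the variable to really be read-only. This only shows the Ruby semantics an optimizer would have to respect; it is not JRuby's implementation, and the class and method names are hypothetical.

```ruby
class Scaler
  # Noncapturing: uses only its own parameter, so it behaves like a plain method.
  define_method(:double) { |x| x * 2 }

  # Capturing: reads `factor` from the surrounding class body.
  factor = 3
  define_method(:triple) { |x| x * factor }

  # Hand-lifted equivalent of :triple, valid because `factor` is never
  # written again after the block is created.
  FACTOR = 3
  def triple_lifted(x)
    x * FACTOR
  end
end

class Mutable
  step = 1
  define_method(:bump) { |x| x + step }
  step = 10 # written after capture: the method sees the new value
end

p Scaler.new.triple(4)        # => 12
p Scaler.new.triple_lifted(4) # => 12
p Mutable.new.bump(1)         # => 11, so `step` could not be lifted as the constant 1
```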

Slide 41

Slide 41 text

Getting Better!
[Chart: iterations/s (0k to 4000k) for def, define_method, and define_method w/ capture; MRI vs JRuby 9.0.1.0 vs JRuby 9.0.4.0]

Slide 42

Slide 42 text

Reduced-cost Exceptions • Backtrace cost is VERY high on JVM • Heavily optimized, lots of work to build • Exceptions frequently ignored • ...or used as flow control (shame!) • If ignored, backtrace is not needed!

Slide 43

Slide 43 text

Postfix Antipattern
foo rescue nil
• Exception raised • StandardError rescued • Exception ignored • Result is a simple expression, so the exception is never visible
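
A small demonstration of what the postfix form does: the exception is created and then thrown away, and only StandardError and its subclasses are caught. The helper names and the NotStandard class are hypothetical.

```ruby
def parse_or_nil(str)
  Integer(str) rescue nil # Integer("forty") raises ArgumentError, a StandardError
end

p parse_or_nil("42")    # => 42
p parse_or_nil("forty") # => nil: an exception was raised, then discarded

class NotStandard < Exception; end

def raise_not_standard
  raise NotStandard, "not a StandardError"
end

begin
  raise_not_standard rescue nil # the modifier does not catch this one
rescue NotStandard => e
  puts "escaped the rescue modifier: #{e.message}"
end
```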

Slide 44

Slide 44 text

csv.rb Converters
Converters = {
  integer: lambda { |f|
    Integer(f.encode(ConverterEncoding)) rescue f
  },
  float: lambda { |f|
    Float(f.encode(ConverterEncoding)) rescue f
  },
  ...
All trivial rescues, no traces needed.
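
Standalone versions of the two converters shown, next to an exception-free variant for Rubies where Integer() and Float() accept exception: false (Ruby 2.6 and later), which sidesteps the backtrace cost entirely. Assumptions: the encoding constant is set to UTF-8 here, and the hash and constant names are illustrative rather than csv.rb's own.

```ruby
# Assumption: UTF-8 stands in for csv.rb's ConverterEncoding.
CONVERTER_ENCODING = Encoding::UTF_8

# As on the slide: a failed parse raises, and the rescue modifier
# discards the exception and returns the original field.
RESCUE_CONVERTERS = {
  integer: ->(f) { Integer(f.encode(CONVERTER_ENCODING)) rescue f },
  float:   ->(f) { Float(f.encode(CONVERTER_ENCODING)) rescue f }
}

# Exception-free: a failed parse returns nil, so no exception is ever created.
NO_RAISE_CONVERTERS = {
  integer: ->(f) { Integer(f.encode(CONVERTER_ENCODING), exception: false) || f },
  float:   ->(f) { Float(f.encode(CONVERTER_ENCODING), exception: false) || f }
}

%w[12 1.5 abc].each do |field|
  p [field, RESCUE_CONVERTERS[:integer].call(field), NO_RAISE_CONVERTERS[:integer].call(field)]
end
```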

Slide 45

Slide 45 text

Simple rescue Improvement
[Chart: iterations/second for a simple rescue (linear axis); values 524,475 and 10,700]

Slide 46

Slide 46 text

Nearly Two Orders of Magnitude!
[Chart: the same iterations/second values, 524,475 and 10,700, on a log axis]
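
Checking the title against the two values on the chart: the ratio is about 49×, which is roughly 1.7 orders of magnitude, hence "nearly two".

```latex
\frac{524{,}475}{10{,}700} \approx 49.0, \qquad \log_{10}(49.0) \approx 1.69
```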

Slide 47

Slide 47 text

New Runtime? • AST to semantic representation • Traditional compiler design • Wanted architectural longevity


Slide 48

Slide 48 text

[Diagram: compiler pipeline. Stages: Lexical Analysis, Parsing, Semantic Analysis, Optimization, then Bytecode Generation, Interpret, Dalvik Generation, .... JRuby 1.7.x works from the AST; JRuby 9000+ adds IR instructions, a CFG, a DFG, ...]
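
To see what an AST for the foo method on the next slides looks like, Ruby's standard Ripper library can dump a parse tree as nested arrays. This is Ripper's s-expression format, used here only as an illustration; it is not JRuby's internal AST node structure or its IR.

```ruby
require 'ripper'
require 'pp'

SOURCE = <<~RUBY
  def foo(a, b)
    c = 1
    d = a + c
  end
RUBY

# Ripper.sexp returns a nested array describing the parse tree,
# or nil if the source has a syntax error.
TREE = Ripper.sexp(SOURCE)
pp TREE
```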

Slide 49

Slide 49 text

Semantic Analysis: IR Instructions (register-based, 3-address format)
def foo(a, b)
  c = 1
  d = a + c
end
 0 check_arity(2, 0, -1)
 1 a = recv_pre_reqd_arg(0)
 2 b = recv_pre_reqd_arg(1)
 3 %block = recv_closure
 4 thread_poll
 5 line_num(1)
 6 c = 1
 7 line_num(2)
 8 %v_0 = call(:+, a, [c])
 9 d = copy(%v_0)
10 return(%v_0)

Slide 50

Slide 50 text

Optimization: -Xir.passes=LocalOptimizationPass, DeadCodeElimination
def foo(a, b)
  c = 1
  d = a + c
end
 0 check_arity(2, 0, -1)
 1 a = recv_pre_reqd_arg(0)
 2 b = recv_pre_reqd_arg(1)
 3 %block = recv_closure
 4 thread_poll
 5 line_num(1)
 6 c = 1
 7 line_num(2)
 8 %v_0 = call(:+, a, [c])
 9 d = copy(%v_0)
10 return(%v_0)

Slide 51

Slide 51 text

Optimization: -Xir.passes=LocalOptimizationPass, DeadCodeElimination
def foo(a, b)
  c = 1
  d = a + c
end
Unused b and %block removed (instructions 2 and 3):
 0 check_arity(2, 0, -1)
 1 a = recv_pre_reqd_arg(0)
 4 thread_poll
 5 line_num(1)
 6 c = 1
 7 line_num(2)
 8 %v_0 = call(:+, a, [c])
 9 d = copy(%v_0)
10 return(%v_0)
The constant 1 is propagated from c = 1 into the call, and c = 1 is removed:
 0 check_arity(2, 0, -1)
 1 a = recv_pre_reqd_arg(0)
 4 thread_poll
 5 line_num(1)
 7 line_num(2)
 8 %v_0 = call(:+, a, [1])
 9 d = copy(%v_0)
10 return(%v_0)

Slide 52

Slide 52 text

Optimization: -Xir.passes=LocalOptimizationPass, DeadCodeElimination
Before:
 0 check_arity(2, 0, -1)
 1 a = recv_pre_reqd_arg(0)
 4 thread_poll
 5 line_num(1)
 7 line_num(2)
 8 %v_0 = call(:+, a, [1])
 9 d = copy(%v_0)
10 return(%v_0)
After (line_num(1) dropped, since no instructions remain for line 1):
 0 check_arity(2, 0, -1)
 1 a = recv_pre_reqd_arg(0)
 4 thread_poll
 7 line_num(2)
 8 %v_0 = call(:+, a, [1])
 9 d = copy(%v_0)
10 return(%v_0)
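
A toy sketch of the two rewrites shown for foo: constant propagation of c = 1 into the call, then removal of the now-unread assignment. The hash-based instruction format and all method names are invented for illustration and are not JRuby's IR classes; the sketch assumes straight-line code with no branches. Like the IR output on the slides, it keeps d = copy(%v_0), because this toy pass only removes unread constant assignments.

```ruby
# Toy straight-line IR (hypothetical format, not JRuby's).
IR = [
  { op: :assign, dest: :c,   value: 1 },
  { op: :call,   dest: :v_0, name: :+, recv: :a, args: [:c] },
  { op: :copy,   dest: :d,   src: :v_0 },
  { op: :return, src: :v_0 }
].freeze

# Replace call receivers/arguments that are known Integer constants.
def propagate_constants(instrs)
  consts = {}
  instrs.map do |ins|
    out = ins
    if ins[:op] == :call
      out = ins.merge(recv: consts.fetch(ins[:recv], ins[:recv]),
                      args: ins[:args].map { |arg| consts.fetch(arg, arg) })
    end
    if ins[:op] == :assign && ins[:value].is_a?(Integer)
      consts[ins[:dest]] = ins[:value]
    elsif ins[:dest]
      consts.delete(ins[:dest]) # redefined with a non-constant value
    end
    out
  end
end

# Variables read by an instruction.
def uses(ins)
  case ins[:op]
  when :call then [ins[:recv], *ins[:args]]
  when :copy, :return then [ins[:src]]
  else []
  end
end

# Drop constant assignments whose destination is never read.
def eliminate_dead_assigns(instrs)
  used = instrs.flat_map { |ins| uses(ins) }
  instrs.reject { |ins| ins[:op] == :assign && !used.include?(ins[:dest]) }
end

OPTIMIZED = eliminate_dead_assigns(propagate_constants(IR))
OPTIMIZED.each { |ins| p ins }
```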

Slide 53

Slide 53 text

Inlining • 500-pound gorilla of optimizations • shove method/closure back to call site • eliminate stack frame • eliminate parameter passing/return • eliminate additional allocation Optimization

Slide 54

Slide 54 text

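Today’s Inliner
def decrement_one(i)
  i - 1
end

i = 1_000_000
while i > 0
  i = decrement_one(i)
end

After inlining (pseudocode: guard_same? stands for a check that the call would still reach the inlined method):
def decrement_one(i)
  i - 1
end

i = 1_000_000
while i > 0
  if guard_same? self
    i = i - 1              # inlined body of decrement_one
  else
    i = decrement_one(i)   # guard failed: normal call
  end
end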
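
A runnable Ruby rendering of the guarded shape above. The guard here compares the method the receiver would actually call against the one whose body was pasted in; written in Ruby this lookup is far more expensive than the call it replaces, whereas a JIT would use a much cheaper check. Countdown, BigSteps and INLINED_TARGET are hypothetical names; the sketch only shows why the fallback path keeps inlining correct when a method is overridden.

```ruby
class Countdown
  def decrement_one(i)
    i - 1
  end

  # The method whose body we "inline" below, captured once.
  INLINED_TARGET = instance_method(:decrement_one)

  # Original: one dynamic call per iteration.
  def run_plain(n)
    i = n
    i = decrement_one(i) while i > 0
    i
  end

  # Inlined: body pasted at the call site, protected by a guard.
  def run_inlined(n)
    i = n
    while i > 0
      if method(:decrement_one).unbind == INLINED_TARGET
        i -= 1                 # inlined body of decrement_one
      else
        i = decrement_one(i)   # guard failed: ordinary dynamic call
      end
    end
    i
  end
end

# Overriding decrement_one makes the guard fail, so the real method runs.
class BigSteps < Countdown
  def decrement_one(i)
    i - 2
  end
end

p Countdown.new.run_inlined(10) # => 0
p BigSteps.new.run_inlined(5)   # => -1 (5, 3, 1, -1)
```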

Slide 55

Slide 55 text

Numeric Specialization • Everything's an object • JVM has only references and primitives • Not compatible in bytecode • Need to optimize numerics as primitives

Slide 56

Slide 56 text

def looper(n)
  i = 0               # cached object
  while i < n         # n: probably a Fixnum?
    do_something(i)   # call with i
    i += 1            # new Fixnum for i + 1
  end
end

Slide 57

Slide 57 text

def looper(n)
  i = 0
  while i < n
    do_something(i)
    i += 1
  end
end

Specialize n, i to long (pseudocode):
def looper(long n)
  long i = 0
  while i < n
    do_something(i)
    i += 1
  end
end

Deopt to the object version (the original looper above) if n or i + 1 is not a Fixnum.
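
Plain Ruby cannot use primitive longs, so this sketch only mirrors the guard-and-fallback shape of specialization: an entry check decides whether n is an Integer that fits in a signed 64-bit long, and otherwise the call "deopts" to the generic object version. The method names, the [path, sum] return value and the sum stand-in for do_something are hypothetical. For this particular loop shape, once n fits in a long, i < n implies i + 1 <= n, so i itself cannot overflow; a compiler that cannot prove that would keep the per-iteration check.

```ruby
LONG_MIN = -(2**63)
LONG_MAX = 2**63 - 1

# Generic "object" version: works for any numeric n (Integer, Float, ...).
def looper_generic(n)
  sum = 0
  i = 0
  while i < n
    sum += i # stand-in for do_something(i)
    i += 1
  end
  [:generic, sum]
end

# "Specialized" entry point, taken only when n is an Integer in long range.
def looper_specialized(n)
  unless n.is_a?(Integer) && n.between?(LONG_MIN, LONG_MAX)
    return looper_generic(n) # "deopt": fall back to the object version
  end
  sum = 0
  i = 0
  while i < n
    sum += i # sum is not covered by the overflow argument for i
    i += 1
  end
  [:specialized, sum]
end

p looper_specialized(5)   # => [:specialized, 10]
p looper_specialized(2.5) # => [:generic, 3]
```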

Slide 58

Slide 58 text

JVM Futures • We're good friends with OpenJDK folks • Working to improve JVM as well • FFI being added at JVM level • AOT compilation for startup perf

Slide 59

Slide 59 text

FFI in JVM • Project Panama (JEP-191) • Native support for FFI • Code generators for binding • JIT support for calling • API support for userland

Slide 60

Slide 60 text

[Diagram: Java side: User Code → JNI call; C/native side: JNI impl → Target Library]

Slide 61

Slide 61 text

[Diagram: Java side: User Code → JNR stub → JNI call; C/native side: JNI impl → libffi → Target Library]
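
For comparison, CRuby's standard Fiddle library is also built on libffi, so calling a C function through it follows the same user code → libffi → target library path as the JNR diagram. This sketch calls getpid from libc, matching the function in the JIT output a few slides later. Assumptions: a Unix-like system where getpid is visible through the default symbol handle and pid_t fits a C int; this is CRuby-specific and is not JRuby's JNR or Panama.

```ruby
require 'fiddle'

# Look up getpid among the symbols already loaded into this process.
GETPID = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT['getpid'], # raw function address
  [],                                # no arguments
  Fiddle::TYPE_INT                   # C int return value
)

p GETPID.call # same number as Process.pid
p Process.pid
```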

Slide 62

Slide 62 text

[Diagram: User Code (Java) → Panama → Target Library (C/native); the JIT knows about both sides]

Slide 63

Slide 63 text

JIT Magic
callq  ; - libSystem.B.dylib
       ;*invokeinterface getpid
       ; - GetPidJNRExample::benchGetPid@13 (line 26)
       ;   {optimized virtual_call}
Direct call from JITed Ruby code

Slide 64

Slide 64 text

Startup Time • By far our greatest challenge • Everything starts cold: parser, interpreter, compiler, core classes, boot logic • Increasing amount of Ruby in JRuby • Aggravates the problem

Slide 65

Slide 65 text

JRuby Startup
[Chart: startup time in seconds (lower is better), C Ruby vs JRuby, for -e 1, gem --list, and rake -T in a Rails app]

Slide 66

Slide 66 text

--dev • Disables JRuby JIT • Sets JVM to reduced optimization mode • 50% reduction in startup time • Much lower peak perf

Slide 67

Slide 67 text

JRuby --dev
[Chart: startup time in seconds (lower is better), C Ruby vs JRuby vs JRuby --dev, for -e 1, gem --list, and rake -T in a Rails app]

Slide 68

Slide 68 text

AOT • Precompile JVM bytecode to native • Focus on hot code • Save original structure for optimization • Get JRuby running native right away • AOT compile Ruby to native in future

Slide 69

Slide 69 text

Getting There
[Chart: time in seconds (lower is better) for rake -T in a Rails app: C Ruby, JRuby, JRuby --dev, non-opto AOT, opto AOT]

Slide 70

Slide 70 text

AOT Future • AOT might be available in Java 9 • Many tweaks we can make to help it • Ideal: all code run at boot runs native • Should get closer to MRI

Slide 71

Slide 71 text

No content

Slide 72

Slide 72 text

No content

Slide 73

Slide 73 text

Thank You • @headius • @tom_enebo • http://jruby.org