Slide 1

Slide 1 text

JRuby 2018: Real World Perf Charles Oliver Nutter (@headius) Thomas Enebo (@tom_enebo)

Slide 2

Slide 2 text

• JRuby co-leads • Red Hat Inc. Charles Thomas Ruby Java Beer

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

What is JRuby • It's just Ruby! • Ruby 2.5 compatible, if something's broken tell us • Supports pure-Ruby gems, many extensions • We want to be a Ruby first! • It's a JVM language • Full access to the power of the JVM platform!

Slide 6

Slide 6 text

JVM Tools and GC

Slide 7

Slide 7 text

Parallel and Concurrent

Slide 8

Slide 8 text

Fun Stuff
  event(:player_egg_throw) do |e|
    e.hatching = true
    e.num_hatches = 120
    e.player.mesg "hatched"
  end
Purugin

Slide 9

Slide 9 text

Roadmap • 9.2.0.0 May 24???? O_o • EOL 2.3.x support soon? • How to handle 2.6?
[Branch diagram. jruby-9_1 (Ruby 2.3.x): 9.1.17.0, 9.1.18.0, EOL? master (Ruby 2.5.x): 9.2.0.0, 9.2.1.0, 2.6?]

Slide 10

Slide 10 text

New Feature? You Can Help! • New features are great opportunities to contribute! • Learn more about how Ruby and JRuby work! • Help us keep up with Ruby development! • Profit! • We are always standing by on IRC, Gitter, Twitter to help you

Slide 11

Slide 11 text

Library Compatibility • We run pure-Ruby libraries well • Rails, Rake, RubyGems, ... • If a pure-Ruby library doesn't work the same, let us know • What about native extensions?

Slide 12

Slide 12 text

Oj

Slide 13

Slide 13 text

Oj+JRuby • Oj == Optimized JSON • Mild path of pain for potential JRuby users • Common transitive dependency with custom API • Needed for Discourse • C extension only until now… https://github.com/ohler55/oj

Slide 14

Slide 14 text

Oj • Oj is large… • 19810 lines of C • 7 parsers/dumpers (object, strict, compat, null, custom, rails, wab) • parser stream + string for each • mimic API to be compatible with ‘json’ gem

Slide 15

Slide 15 text

JRuby Oj Port Status • 9200 lines of Java • Missing part of Wab dumper & parser • Missing mimic implementation • 448 runs, 765 assertions, 43 failures, 12 errors, 0 skips • 20 from wab; 15 from time/date bugs

Slide 16

Slide 16 text

Load Performance
[Bar chart: millions of loads per second (higher is better), axis 0 to 1.2. Categories: small, medium, large. Series: JRuby(oj), JRuby(json), MRI(oj). Data labels in extracted order: 0.06, 0.11, 0.72, 0.05, 0.11, 0.36, 0.13, 0.29, 1.1]
Data from https://techblog.thescore.com/2014/05/23/benchmarking-json-generation-in-ruby/

Slide 17

Slide 17 text

Dump Performance
[Bar chart: millions of dumps per second (higher is better), axis 0 to 3. Categories: small, medium, large. Series: JRuby(oj), JRuby(json), MRI(oj). Data labels in extracted order: 0.33, 0.73, 2.1, 0.22, 0.44, 1.1, 0.44, 0.86, 2.3]

Slide 18

Slide 18 text

Oj Tasks Left • Green Build • Update wab+mimic • Submit an epic PR • Start performance tuning

Slide 19

Slide 19 text

JRuby on Rails

Slide 20

Slide 20 text

A Long, Hard Journey • JRuby first ran Rails in 2006 • Almost as long as Rails has existed! • Thousands of JRoR instances around the world • JRuby 9000, Ruby 2.4, 2.5 work slowed down Rails support • Rails 5.0 not supported for at least a year • ActiveRecord suffered the most

Slide 21

Slide 21 text

Rails 5.2.0
actioncable: something broken bootstrapping
actionpack: 3148 runs, 15832 assertions, 1 failures, 0 errors
actionmailer: 204 runs, 457 assertions, 0 failures, 0 errors
actionview: 1990 runs, 4395 assertions, 4 failures, 4 errors
activejob: 173 runs, 401 assertions, 0 failures, 0 errors
activemodel: 803 runs, 2231 assertions, 0 failures, 0 errors
activerecord: 5226 runs, 14665 assertions, 8 failures, 6 errors
activesupport: 4135 runs, 762864 assertions, 17 failures, 2 errors
railties: uses fork()

Slide 22

Slide 22 text

Failure:
TimeWithZoneTest#test_minus_with_time_precision [activesupport/test/core_ext/time_with_zone_test.rb:340]:
Expected: 86399.999999998
Actual: 86399.99999999799

Slide 23

Slide 23 text

Rails Status • Tests running well • Over 99% passing • Rails apps should just work • SQLite3, MySQL, and Postgresql supported by us • MSSQL returning soon? • Oracle, DB2: see third-party adapters for now

Slide 24

Slide 24 text

Performance

Slide 25

Slide 25 text

JRuby Architecture
[Diagram. JRuby internals: Ruby (.rb) is parsed into Ruby instructions (IR); the IR is run by JRuby's interpreter or JIT-compiled to Java instructions (Java bytecode). Java Virtual Machine: the Java bytecode interpreter executes that bytecode, C1 compiles it to native code, and C2 compiles it to better native code.]

Slide 26

Slide 26 text

Microbenching • Very fun to show off and to watch improve • Practically useless • Like judging a person by how much they can bench press • JRuby has won microbenchmarks for years • Easier to isolate specific measurements • Great for exploring new runtimes and tech

Slide 27

Slide 27 text

InvokeDynamic • JVM support for dynamic invocation • Let the JVM see through all the dynamic bits of Ruby • Added in Java 7, with much input and testing from JRuby • Steadily improving performance, reducing overhead • -Xcompile.invokedynamic • May be default soon!

Slide 28

Slide 28 text

bench_mandelbrot • Generate a text Mandelbrot fractal • See? Useful! • Test of numeric performance • Heavy reliance on JVM to optimize • Graal is especially good to us here

Slide 29

Slide 29 text

bench_mandelbrot.rb
  def mandelbrot(size)
    sum = 0
    byte_acc = 0
    bit_num = 0
    y = 0
    while y < size
      ci = (2.0*y/size)-1.0
      x = 0
      while x < size
        zrzr = zr = 0.0
        zizi = zi = 0.0
        cr = (2.0*x/size)-1.5
        escape = 0b1
        z = 0
        while z < 50
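
The slide cuts the benchmark off inside its innermost loop. As a rough illustration of the kind of numeric work being measured, here is a minimal escape-time sketch. It is not the code from bench_mandelbrot.rb: the helper names `in_mandelbrot?` and `render` are hypothetical, and only the coordinate mapping and the 50-iteration limit are borrowed from the slide.

```ruby
# Hypothetical helper, not taken from bench_mandelbrot.rb.
# Returns true when the point (cr, ci) stays bounded for max_iter iterations.
def in_mandelbrot?(cr, ci, max_iter = 50)
  zr = 0.0
  zi = 0.0
  z = 0
  while z < max_iter
    zrzr = zr * zr
    zizi = zi * zi
    return false if zrzr + zizi > 4.0 # |z|^2 > 4 means the orbit escapes
    zi = 2.0 * zr * zi + ci
    zr = zrzr - zizi + cr
    z += 1
  end
  true
end

# Renders a size x size text fractal using the same coordinate mapping
# as the slide: ci in [-1.0, 1.0), cr in [-1.5, 0.5).
def render(size)
  rows = []
  y = 0
  while y < size
    ci = (2.0 * y / size) - 1.0
    row = String.new
    x = 0
    while x < size
      cr = (2.0 * x / size) - 1.5
      row << (in_mandelbrot?(cr, ci) ? "*" : " ")
      x += 1
    end
    rows << row
    y += 1
  end
  rows
end

puts render(24)
```

Everything here is floating-point arithmetic inside `while` loops, which is why the result depends so heavily on how well the JVM's JIT optimizes the generated code.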

Slide 30

Slide 30 text

bench_mandelbrot
[Bar chart: total execution time (lower is better), axis 0s to 4s. Series: CRuby 2.5, CRuby 2.6 JIT, JRuby, JRuby Indy. Data labels in extracted order: 1.33s, 2.95s, 3.5s, 3.57s]

Slide 31

Slide 31 text

Graal • New JVM native JIT written in Java • Faster evolution • More advanced optimization • Plugs into JDK9+ via command line flags • Shipped with JDK10...try it today!

Slide 32

Slide 32 text

bench_mandelbrot
[Bar chart: total execution time (lower is better), axis 0s to 3s. Series: JRuby, JRuby Indy, JRuby Indy Graal. Data labels in extracted order: 0.139s, 1.33s, 2.95s]

Slide 33

Slide 33 text

Optimizing Objects
• Ruby instance vars are dynamic
• Space allocated on assignment
• Any unfrozen object can grow
• Looks like a Hash
• Inefficient for mostly-same keys
• Array reduces cost, still high
• Make them JVM fields!
  class Person
    # closest we get to a declaration
    attr_accessor :fname, :lname, :bdate
    def initialize(fname, lname, bdate)
      # encountered after first object
      # has already been constructed
      @fname, @lname, @bdate = fname, lname, bdate
    end
    def initialize_id
      @id ||= SecureRandom.uuid
    end
  end
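
The point that instance variables appear only on assignment can be observed from plain Ruby. The sketch below reuses the slide's `Person` class; the sample values are made up, and it shows only the language-level behavior, not JRuby's internal field layout.

```ruby
require 'securerandom'

class Person
  attr_accessor :fname, :lname, :bdate

  def initialize(fname, lname, bdate)
    @fname, @lname, @bdate = fname, lname, bdate
  end

  def initialize_id
    @id ||= SecureRandom.uuid
  end
end

person = Person.new("Ada", "Lovelace", "1815-12-10") # sample data, not from the talk
before = person.instance_variables # only the variables assigned in initialize
person.initialize_id               # assigns @id for the first time
after = person.instance_variables  # the same object now carries one more variable

puts "before: #{before.inspect}"
puts "after:  #{after.inspect}"
```

Because `@id` shows up only after `initialize_id` runs, a runtime that wants to store variables in fixed fields has to decide on a layout before it has seen every assignment.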

Slide 34

Slide 34 text

Optimizing Arrays • Arrays are growable until frozen, but... • Arrays are small + immutable or large + mutable • Large, mutable arrays will often continue to mutate • Manually optimized 1- and 2-element arrays using fields • Future: hook into Object Shaping for Array

Slide 35

Slide 35 text

10M * One-variable Object
[Stacked bar chart: memory use, axis 0MB to 800MB. Bars: No Shaping, Shaping. Segments: Ruby Object, Object[]. Data labels in extracted order: 320, 480, 400]

Slide 36

Slide 36 text

Rails `select` Bench
        percent           live            alloc'ed
rank   self   accum     bytes   objs     bytes   objs  class name
  23  0.82%  73.58%   1744576  18168   5894464  61396  org.jruby.gen.RubyObject17
  32  0.44%  78.33%    937784  23432   2071464  51774  org.jruby.gen.RubyObject2
  42  0.30%  81.96%    633312  19775   1525824  47666  org.jruby.gen.RubyObject0
  43  0.30%  82.26%    632168  11280   2783968  49705  org.jruby.gen.RubyObject6
  46  0.27%  83.10%    587072  18330   2133984  66671  org.jruby.gen.RubyObject1
  58  0.22%  86.08%    465056   3630   1672864  13066  org.jruby.gen.RubyObject25
  60  0.21%  86.51%    439304  10970   1493024  37313  org.jruby.gen.RubyObject3
  61  0.20%  86.71%    434608   9044   2311744  48151  org.jruby.gen.RubyObject5
  68  0.16%  87.93%    349936   7280   1305136  27180  org.jruby.gen.RubyObject4
  79  0.11%  89.34%    233824   3646    838432  13093  org.jruby.gen.RubyObject8
 238  0.01%  96.11%     28088    314     30816    345  org.jruby.gen.RubyObject14

Slide 37

Slide 37 text

10M * One-element Array
[Stacked bar chart: memory use, axis 0MB to 1000MB. Bars: No Shaping, Shaping. Segments: Ruby Object, IRubyObject[]. Data labels in extracted order: 400, 650, 570]

Slide 38

Slide 38 text

Nearly Half are 1 or 2-element Arrays
        percent           live             alloc'ed
rank   self   accum      bytes    objs      bytes    objs  class name
   5  4.90%  33.79%   10481824  218361   38183968  795489  org.jruby.RubyArray
  11  3.11%  56.32%    6661072  138762   22817680  475358  org.jruby.specialized.RubyArrayOneObject
  17  1.46%  67.96%    3124112   55779   15838128  282815  org.jruby.specialized.RubyArrayTwoObject

Slide 39

Slide 39 text

JRuby on Rails Performance

Slide 40

Slide 40 text

ActiveRecord Performance • Rails apps live and die by ActiveRecord • Largest CPU consumer by far • Heavy object churn, GC overhead • Create, read, and update measurements • If delete is your bottleneck, we need to talk • CRuby 2.5.1 vs JRuby 9.2 on JDK10

Slide 41

Slide 41 text

ActiveRecord create
[Bar chart: operations per second, axis 0 to 160. Series: JRuby, JRuby Indy, JRuby Graal, CRuby. Data labels in extracted order: 157.233, 144.092, 140.449, 135.135]

Slide 42

Slide 42 text

ActiveRecord find(id)
[Bar chart: operations per second, axis 0 to 5000. Series: JRuby, JRuby Indy, JRuby Graal, CRuby. Data labels in extracted order: 3,940; 4,672; 4,999; 3,937]

Slide 43

Slide 43 text

ActiveRecord select
[Bar chart: operations per second, axis 0 to 4200. Series: JRuby, JRuby Indy, JRuby Graal, CRuby. Data labels in extracted order: 3,125; 3,703; 4,132; 2,403]

Slide 44

Slide 44 text

ActiveRecord find_all
[Bar chart: operations per second, axis 0 to 2100. Series: JRuby, JRuby Indy, JRuby Graal, CRuby. Data labels in extracted order: 1,597; 2,016; 1,908; 1,677]

Slide 45

Slide 45 text

ActiveRecord update
[Bar chart: operations per second, axis 0 to 7000. Series: JRuby, JRuby Indy, JRuby Graal, CRuby. Data labels in extracted order: 2,604; 6,250; 6,944; 4,000]

Slide 46

Slide 46 text

Scaling Rails • Classic problem on MRI • No concurrent threads, so we need processes • Processes inevitably duplicate runtime state • Much effort and lots of money wasted • JRuby is a great answer! • Multi-threaded single process runs your whole site
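
A minimal sketch of the "multi-threaded single process" model, using only Ruby's standard `Thread` and `Queue`. The `ROUTES` table and the request paths are hypothetical stand-ins for application state and traffic; this is not how Rails or any particular server is implemented.

```ruby
require 'thread'

# Hypothetical read-only application state, loaded once and shared by every thread.
ROUTES = { "/" => "home#index", "/posts" => "posts#index" }.freeze

requests = Queue.new
%w[/ /posts / /posts /missing].each { |path| requests << path }

responses = Queue.new
workers = 4.times.map do
  Thread.new do
    loop do
      path = begin
        requests.pop(true) # non-blocking pop; raises ThreadError when the queue is empty
      rescue ThreadError
        break
      end
      responses << [path, ROUTES.fetch(path, "404")]
    end
  end
end
workers.each(&:join)

handled = []
handled << responses.pop until responses.empty?
puts "handled #{handled.size} requests in one process with #{workers.size} threads"
```

All four workers read the same `ROUTES` object, so the state exists once per process. On a runtime without concurrent threads, getting the same parallelism would take four processes, each with its own copy.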

Slide 47

Slide 47 text

Measuring Rails Performance • Rails 5.1.6, Postgresql 10, scaffolded view • 4k requests to warm up, then measure every 10k • EC2 c4.xlarge: 4 vCPUs, 7.5GB • Bench, database, and app on same instance

Slide 48

Slide 48 text

Requests per second, full stack scaffolded read on Postgresql
[Bar chart: axis 0 to 1300. Series: JRuby, CRuby. Data labels in extracted order: 910.02; 1,253.86]

Slide 49

Slide 49 text

Requests over time
[Line chart: requests per second (axis 0 to 1300) sampled every 10k requests from 10k to 100k. Series: CRuby 2.5, CRuby 2.6 JIT, JRuby 9.2.4]

Slide 50

Slide 50 text

JRuby on Rails Memory • Single instance is much bigger, 400-500MB versus 50MB • Ten CRuby processes = 500MB • Ten JRuby threads = 400-500MB • May need to tell JVM a memory cap • For 100-way or 1000-way...you do the math
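
The arithmetic the slide leaves to the audience, worked through. The 50MB and 400-500MB figures come from the slide; treating the JRuby process as staying near that baseline while threads are added is an assumption here, since the talk gives numbers only for the ten-way case.

```ruby
CRUBY_MB_PER_PROCESS = 50        # per-process figure from the slide
JRUBY_MB_ONE_PROCESS = 400..500  # single-instance range from the slide

# One CRuby process per concurrent worker, each duplicating runtime state.
def cruby_total_mb(workers)
  workers * CRUBY_MB_PER_PROCESS
end

[10, 100, 1000].each do |workers|
  puts format("%4d-way: CRuby ~%d MB across %d processes; JRuby one process at %d-%d MB plus per-thread overhead (assumed small)",
              workers, cruby_total_mb(workers), workers,
              JRUBY_MB_ONE_PROCESS.first, JRUBY_MB_ONE_PROCESS.last)
end
```

At ten workers the two are roughly even, matching the slide; the gap opens as the worker count grows because only the process-based model multiplies the base footprint.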

Slide 51

Slide 51 text

JRuby is the fastest way to run Rails applications.

Slide 52

Slide 52 text

Method Inlining

Slide 53

Slide 53 text

Method Inlining
Before inlining:
  def add(a, b)
    a + b
  end
  def calculate_cost(c)
    total1 = c.add 1000, 1
    total2 = c.add 2000, 2
    total1 + total2
  end
After inlining:
  def calculate_cost(c)
    total1 = 1000 + 1
    total2 = 2000 + 2
    total1 + total2
  end
Disclaimer: Ruby much easier to read than IR
* c.add must always be a call to the same method on the same type == monomorphic call

Slide 54

Slide 54 text

Method Inlining • Eliminates cost of call (more obvious) • stack deepening • setting up call params • indirection to new method body • Leads to more optimizations (less obvious)

Slide 55

Slide 55 text

Method Inlining
  def calculate_cost(c)
    total1 = 1000 + 1
    total2 = 2000 + 2
    total1 + total2
  end
  def calculate_cost(c)
    1000 + 1 + 2000 + 2
  end
  def pad_cost(c)
    calculate_cost(c) * 2
  end
  def calculate_cost(c)
    3003
  end
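
The inlining and folding steps can be checked by hand. In this sketch each stage is written as its own method so the results can be compared; the class names `Calculator` and `Discounted` and the suffixed method names are hypothetical, introduced only for the comparison.

```ruby
class Calculator
  def add(a, b)
    a + b
  end
end

# Hypothetical subclass with a different add, to show why the call
# site must be monomorphic before the folded version is valid.
class Discounted < Calculator
  def add(a, b)
    a + b - 1
  end
end

def calculate_cost(c) # original: two dynamic calls
  total1 = c.add 1000, 1
  total2 = c.add 2000, 2
  total1 + total2
end

def calculate_cost_inlined(c) # body of Calculator#add pasted into the caller
  total1 = 1000 + 1
  total2 = 2000 + 2
  total1 + total2
end

def calculate_cost_folded(c) # constants folded once the calls are gone
  3003
end

calc = Calculator.new
puts calculate_cost(calc)           # same value as the two rewritten versions
puts calculate_cost_inlined(calc)
puts calculate_cost_folded(calc)
puts calculate_cost(Discounted.new) # differs: a new receiver type breaks the assumption
```

The rewritten versions are correct only while `c` is always a plain `Calculator`. As soon as a `Discounted` shows up, the optimized code would return the wrong answer, so a real runtime has to guard the inlined code and fall back when the type changes.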

Slide 56

Slide 56 text

JAVA IS GREAT AT INLINING METHODS* *Unless we pass a block to the method

Slide 57

Slide 57 text

Inlining Problem
  1000.times do
    something
  end
  1000.times do
    something_else
  end
  def times
    i = 0
    while i < self do
      yield i
      i += 1
    end
  end
  def times
    i = 0
    while i < self do
      something
      i += 1
    end
  end
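
A small sketch of why blocks get in the way. `ruby_times` is a hypothetical pure-Ruby stand-in for Integer#times, and the two block bodies are made up; the point is only that one method body serves every caller, so its `yield` has no single target to inline.

```ruby
# Hypothetical pure-Ruby stand-in for Integer#times.
def ruby_times(n)
  i = 0
  while i < n
    yield i # one yield site, shared by every block ever passed in
    i += 1
  end
end

evens = 0
sum = 0
ruby_times(1000) { |i| evens += 1 if i.even? } # first block
ruby_times(1000) { |i| sum += i }              # second, unrelated block

# What a per-call-site copy looks like for the second caller:
# the block body replaces the yield.
def ruby_times_sum(n)
  sum = 0
  i = 0
  while i < n
    sum += i
    i += 1
  end
  sum
end

puts evens
puts sum
puts ruby_times_sum(1000)
```

The specialized copy is correct for exactly one call site, which is why the method and its literal block have to be treated as a unit.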

Slide 58

Slide 58 text

JRuby Method Inlining • Methods with Literal Blocks • Get special call sites • If they always call the same type {n} times • Inline!

Slide 59

Slide 59 text

JRuby Inlining • Methods + Literal Blocks treated as single unit • Duplicate method. • Inline Block into dupe method. • Inline back to call • Both must be IR (e.g. Ruby defined)

Slide 60

Slide 60 text

  class Foo
    def ___inline___me(i)
      k = i
      while k > 0
        k = yield(k)
      end
      i - 1
    end
  end
  def foo(counter)
    i = 5_000
    while i > 0
      i = counter.___inline___me(i) { |j| j - 2 }
    end
  end
Contrived!
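
One way to time this example is with the standard library's Benchmark module. The harness below is an assumption, not the setup used for the talk's numbers; `time_foo` is a hypothetical helper, and the run count is arbitrary.

```ruby
require 'benchmark'

class Foo
  def ___inline___me(i)
    k = i
    while k > 0
      k = yield(k)
    end
    i - 1
  end
end

def foo(counter)
  i = 5_000
  while i > 0
    i = counter.___inline___me(i) { |j| j - 2 }
  end
end

# Hypothetical helper: runs foo several times and keeps the best wall-clock
# time, so later runs benefit from whatever JIT warm-up has happened.
def time_foo(runs)
  counter = Foo.new
  runs.times.map { Benchmark.realtime { foo(counter) } }.min
end

puts format("best of 3: %.3fs", time_foo(3))
```

On JRuby, the inliner described on the status slide would be switched on with -Xir.inliner; without it this measures the ordinary JIT path.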

Slide 61

Slide 61 text

[Bar chart: time per foo(counter) (smaller is better), axis 0 to 0.3. Bars: JIT, JIT+inline. 4.8x faster!]
Contrived!

Slide 62

Slide 62 text

Grrr…Core Methods in Java • Java implemented methods are quick • …but we cannot inline a Java implemented method! • Integer#times, Enumerable#{ALL THE THINGS} • If only we had a way…

Slide 63

Slide 63 text

Ruby Replacement! • At inline decision time • Do we have Ruby implementation of Java core method? • Yes! Inline with that. • Profit!

Slide 64

Slide 64 text

  def foo
    s = 0
    10_000_000.times do
      s += 1
    end
    s
  end
  def times
    i = 0
    while i < self do
      yield i
      i += 1
    end
  end
  def foo
    s = 0
    i = 0
    while i < 10_000_000 do
      s += 1
      i += 1
    end
    s
  end
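
The rewrite on this slide can be checked for equivalence in plain Ruby. The method names below are hypothetical, and the iteration count is taken as a parameter so the comparison runs quickly.

```ruby
# The original shape: a block passed to Integer#times.
def count_with_times(n)
  s = 0
  n.times do
    s += 1
  end
  s
end

# The shape after the Ruby version of times and the block are both inlined.
def count_inlined(n)
  s = 0
  i = 0
  while i < n
    s += 1
    i += 1
  end
  s
end

[0, 1, 10_000].each do |n|
  puts "#{n}: #{count_with_times(n) == count_inlined(n)}"
end
```

The second form has no call and no block left, only a loop over local variables, which is the kind of code the JVM's JIT handles best.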

Slide 65

Slide 65 text

[Bar chart: time per foo() (smaller is better), axis 0 to 0.36. Bars: JIT, JIT+inline. 1.5x faster!]
Not Contrived!

Slide 66

Slide 66 text

Ruby Replacement Potential • Why just have one Ruby replacement?
Arbitrary n-element times:
  def times
    i = 0
    while i < self do
      yield i
      i += 1
    end
  end
5.times:
  def times
    yield 0
    yield 1
    yield 2
    yield 3
    yield 4
  end
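
A sketch of choosing between several replacements. The selection method `pick_times` is purely hypothetical and stands in for a decision the runtime would make at inline time; nothing here reflects JRuby's actual mechanism.

```ruby
# General replacement: works for any n.
def times_generic(n)
  i = 0
  while i < n
    yield i
    i += 1
  end
end

# Specialized replacement for a receiver known to be 5: no loop, no counter.
def times_five
  yield 0
  yield 1
  yield 2
  yield 3
  yield 4
end

# Hypothetical selection step: pick the unrolled version when it applies.
def pick_times(n, &block)
  if n == 5
    times_five(&block)
  else
    times_generic(n, &block)
  end
end

seen = []
pick_times(5) { |i| seen << i }
puts seen.inspect
```

Both replacements must yield the same values in the same order as Integer#times for the swap to be invisible to the program.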

Slide 67

Slide 67 text

JRuby Inlining Status • Only runs with -Xir.inliner currently • Many limitations • Still has bugs

Slide 68

Slide 68 text

Summary

Slide 69

Slide 69 text

No content

Slide 70

Slide 70 text

JDK 9+ Warnings • JDK 9 introduced stricter encapsulation • We poke through that encapsulation to support Ruby features • You'll see warnings...they're harmless, but we'll deal with them (9.3)

Slide 71

Slide 71 text

No content

Slide 72

Slide 72 text

Thank You! • Charles Oliver Nutter • @headius • Tom Enebo • @tom_enebo • http://jruby.org