
JRuby 2018: Real World Performance

headius
November 15, 2018


Discussion of JRuby optimization and Rails performance delivered by Charles Oliver Nutter and Thomas Enebo at RubyConf 2018 in Los Angeles.


Transcript

  1. JRuby 2018: Real World Perf • Charles Oliver Nutter (@headius) • Thomas Enebo (@tom_enebo)
  2. • JRuby co-leads • Red Hat Inc. • [Diagram labels: Charles, Thomas, Ruby, Java, Beer]
  3. None
  4. None
  5. What is JRuby • It's just Ruby! • Ruby 2.5 compatible; if something's broken, tell us • Supports pure-Ruby gems and many extensions • We want to be a Ruby implementation first! • It's a JVM language • Full access to the power of the JVM platform!
  6. JVM Tools and GC

  7. Parallel and Concurrent

  8. Fun Stuff (Purugin)

        event(:player_egg_throw) do |e|
          e.hatching = true
          e.num_hatches = 120
          e.player.mesg "hatched"
        end
  9. Roadmap • 9.2.0.0 released May 24 (O_o) • EOL 2.3.x support soon? • How to handle 2.6? • [Release diagram: branches master and jruby-9_1; versions 9.1.17.0 ... 9.1.18.0, 9.2.0.0, 9.2.1.0; Ruby 2.3.x, 2.5.x, 2.6?; EOL?]
  10. New Feature? You Can Help! • New features are great opportunities to contribute! • Learn more about how Ruby and JRuby work! • Help us keep up with Ruby development! • Profit! • We are always standing by on IRC, Gitter, and Twitter to help you
  11. Library Compatibility • We run pure-Ruby libraries well • Rails, Rake, RubyGems, ... • If a pure-Ruby library doesn't work the same, let us know • What about native extensions?
  12. Oj

  13. Oj+JRuby • Oj == Optimized JSON • A mild pain point for potential JRuby users • Common transitive dependency with a custom API • Needed for Discourse • C extension only, until now… https://github.com/ohler55/oj
  14. Oj • Oj is large… • 19810 lines of C • 7 parsers/dumpers (object, strict, compat, null, custom, rails, wab) • A stream parser and a string parser for each • mimic API to be compatible with the ‘json’ gem
  15. JRuby Oj Port Status • 9200 lines of Java • Missing part of the wab dumper & parser • Missing mimic implementation • 448 runs, 765 assertions, 43 failures, 12 errors, 0 skips • 20 from wab; 15 from time/date bugs
  16. Load Performance • [Chart: millions of loads per second (higher is better) for small, medium, and large; series JRuby(oj), JRuby(json), MRI(oj); values shown: 0.06, 0.11, 0.72, 0.05, 0.11, 0.36, 0.13, 0.29, 1.1] • Data from https://techblog.thescore.com/2014/05/23/benchmarking-json-generation-in-ruby/
  17. Dump Performance • [Chart: millions of dumps per second (higher is better) for small, medium, and large; series JRuby(oj), JRuby(json), MRI(oj); values shown: 0.33, 0.73, 2.1, 0.22, 0.44, 1.1, 0.44, 0.86, 2.3]
  18. Oj Tasks Left • Green build • Update wab+mimic • Submit an epic PR • Start performance tuning
  19. JRuby on Rails

  20. A Long, Hard Journey • JRuby first ran Rails in 2006 • Almost as long as Rails has existed! • Thousands of JRoR instances around the world • JRuby 9000, Ruby 2.4, and Ruby 2.5 work slowed down Rails support • Rails 5.0 not supported for at least a year • ActiveRecord suffered the most
  21. Rails 5.2.0
      • actioncable: something broken bootstrapping
      • actionpack: 3148 runs, 15832 assertions, 1 failures, 0 errors
      • actionmailer: 204 runs, 457 assertions, 0 failures, 0 errors
      • actionview: 1990 runs, 4395 assertions, 4 failures, 4 errors
      • activejob: 173 runs, 401 assertions, 0 failures, 0 errors
      • activemodel: 803 runs, 2231 assertions, 0 failures, 0 errors
      • activerecord: 5226 runs, 14665 assertions, 8 failures, 6 errors
      • activesupport: 4135 runs, 762864 assertions, 17 failures, 2 errors
      • railties: uses fork() (not supported on JRuby)
  22. Failure:

        TimeWithZoneTest#test_minus_with_time_precision [activesupport/test/core_ext/time_with_zone_test.rb:340]:
        Expected: 86399.999999998
          Actual: 86399.99999999799

  23. Rails Status • Tests running well • Over 99% passing • Rails apps should just work • SQLite3, MySQL, and PostgreSQL supported by us • MSSQL returning soon? • Oracle, DB2: see third-party adapters for now
  24. Performance

  25. JRuby Architecture • [Diagram. JRuby Internals: Ruby (.rb) is parsed into Ruby Instructions (IR); the IR is either executed by the interpreter or JIT-compiled into Java Instructions (Java bytecode). Java Virtual Machine: the bytecode is run by the Java bytecode interpreter, compiled by C1 into native code, and compiled by C2 into better native code, which then executes.]
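The interpret-then-compile flow in the architecture diagram can be illustrated with a toy model. This is a minimal sketch, not JRuby code: `TieredMethod`, the array-based mini AST, and `JIT_THRESHOLD` are hypothetical names, and "compiling" here only means turning the tree into nested lambdas once, so later calls skip the per-node dispatch. JRuby's real JIT emits JVM bytecode instead.

```ruby
# Toy model of tiered execution (hypothetical; not JRuby internals).
# A method starts out interpreted and is swapped for a "compiled"
# version once it has been called more than JIT_THRESHOLD times.
class TieredMethod
  JIT_THRESHOLD = 3 # assumption chosen for the demo

  attr_reader :mode

  # ast: nested arrays such as [:add, [:mul, [:arg], [:arg]], [:lit, 1]]
  def initialize(ast)
    @ast = ast
    @calls = 0
    @mode = :interpreted
    @compiled = nil
  end

  def call(arg)
    @calls += 1
    compile! if @mode == :interpreted && @calls > JIT_THRESHOLD
    @mode == :compiled ? @compiled.call(arg) : interpret(@ast, arg)
  end

  private

  # Slow tier: walk the tree on every single call.
  def interpret(node, arg)
    case node[0]
    when :lit then node[1]
    when :arg then arg
    when :add then interpret(node[1], arg) + interpret(node[2], arg)
    when :mul then interpret(node[1], arg) * interpret(node[2], arg)
    else raise ArgumentError, "unknown node #{node[0].inspect}"
    end
  end

  # Fast tier: translate the tree once into closures.
  def compile!
    @compiled = build(@ast)
    @mode = :compiled
  end

  def build(node)
    case node[0]
    when :lit
      value = node[1]
      ->(_arg) { value }
    when :arg
      ->(arg) { arg }
    when :add
      left = build(node[1])
      right = build(node[2])
      ->(arg) { left.call(arg) + right.call(arg) }
    when :mul
      left = build(node[1])
      right = build(node[2])
      ->(arg) { left.call(arg) * right.call(arg) }
    else
      raise ArgumentError, "unknown node #{node[0].inspect}"
    end
  end
end
```

Calling the same `TieredMethod` repeatedly flips `mode` from `:interpreted` to `:compiled` after the threshold while the results stay identical, which is the property every tier change has to preserve.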
  26. Microbenching • Very fun to show off and to watch improve • Practically useless • Like judging a person by how much they can bench press • JRuby has won microbenchmarks for years • Easier to isolate specific measurements • Great for exploring new runtimes and tech
  27. InvokeDynamic • JVM support for dynamic invocation • Lets the JVM see through all the dynamic bits of Ruby • Added in Java 7, with much input and testing from JRuby • Steadily improving performance, reducing overhead • Enable with -Xcompile.invokedynamic • May be the default soon!
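The caching idea that invokedynamic call sites let the JVM optimize can be shown in plain Ruby. This is a hypothetical sketch of a monomorphic inline cache (`CallSite` is an illustrative name, not a JRuby or JVM API): the site remembers the last receiver class and its method, and only repeats the lookup when a different class shows up. A real implementation must also invalidate the cache when methods are redefined, which this sketch does not do.

```ruby
# Hypothetical monomorphic inline cache for one call site.
class CallSite
  attr_reader :misses

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    unless klass.equal?(@cached_class)
      # Slow path: look the method up and remember it for this class.
      # (Ignores singleton classes and method redefinition.)
      @misses += 1
      @cached_class = klass
      @cached_method = klass.instance_method(@name)
    end
    # Fast path: reuse the cached method body.
    @cached_method.bind(receiver).call(*args)
  end
end
```

Repeated calls with Integer receivers hit the cache after the first lookup; switching to a Float receiver forces one more lookup.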
  28. bench_mandelbrot • Generates a text Mandelbrot fractal • See? Useful! • Test of numeric performance • Heavy reliance on the JVM to optimize • Graal is especially good to us here
  29. bench_mandelbrot.rb

        def mandelbrot(size)
          sum = 0
          byte_acc = 0
          bit_num = 0
          y = 0
          while y < size
            ci = (2.0*y/size)-1.0
            x = 0
            while x < size
              zrzr = zr = 0.0
              zizi = zi = 0.0
              cr = (2.0*x/size)-1.5
              escape = 0b1
              z = 0
              while z < 50
              ...
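The slide's listing is cut off at the inner loop. As a hedged stand-in (a simplified sketch, not the rest of bench_mandelbrot.rb, which also packs per-point results into bytes), the escape-time test at the core of this kind of benchmark looks like this, using the same coordinate mapping as the slide. `in_mandelbrot?` and `mandelbrot_count` are illustrative names.

```ruby
# Simplified escape-time test (a sketch, not the full benchmark):
# c = cr + ci*i counts as inside the set if |z| stays <= 2
# for max_iter iterations of z = z*z + c.
def in_mandelbrot?(cr, ci, max_iter = 50)
  zr = 0.0
  zi = 0.0
  i = 0
  while i < max_iter
    # Compute the new real and imaginary parts from the old ones.
    zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
    return false if zr * zr + zi * zi > 4.0 # escaped: |z| > 2
    i += 1
  end
  true
end

# Count grid points inside the set, mapping x to [-1.5, 0.5)
# and y to [-1.0, 1.0) the way the slide does.
def mandelbrot_count(size, max_iter = 50)
  count = 0
  y = 0
  while y < size
    ci = (2.0 * y / size) - 1.0
    x = 0
    while x < size
      cr = (2.0 * x / size) - 1.5
      count += 1 if in_mandelbrot?(cr, ci, max_iter)
      x += 1
    end
    y += 1
  end
  count
end
```

Like the slide, the sketch uses plain `while` loops and float arithmetic only, which is the kind of code a JVM JIT (and Graal in particular) can optimize aggressively.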
  30. bench_mandelbrot total execution time (lower is better)

        CRuby 2.5        3.57s
        CRuby 2.6 JIT    3.5s
        JRuby            2.95s
        JRuby Indy       1.33s
  31. Graal • New JVM native JIT written in Java • Faster evolution • More advanced optimization • Plugs into JDK 9+ via command-line flags • Shipped with JDK 10...try it today!
  32. bench_mandelbrot total execution time (lower is better)

        JRuby              2.95s
        JRuby Indy         1.33s
        JRuby Indy Graal   0.139s
  33. Optimizing Objects • Ruby instance vars are dynamic • Space allocated on assignment • Any unfrozen object can grow • Looks like a Hash • Inefficient for mostly-same keys • Array reduces cost, still high • Make them JVM fields!

        require "securerandom"

        class Person
          # closest we get to a declaration
          attr_accessor :fname, :lname, :bdate

          def initialize(fname, lname, bdate)
            # encountered after first object
            # has already been constructed
            @fname, @lname, @bdate = fname, lname, bdate
          end

          def initialize_id
            @id ||= SecureRandom.uuid
          end
        end
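A toy version of the shaping idea: objects whose instance variables have the same names share one shape that maps each name to a slot, so each object carries only a flat, fixed-size value array instead of its own name-to-value table. This is a hypothetical sketch (`Shape` and `ShapedObject` are illustrative names); JRuby instead generates JVM classes with real fields, such as the `org.jruby.gen.RubyObject*` classes that show up in allocation profiles.

```ruby
# Toy model of "object shaping" (hypothetical; not JRuby's generated classes).
class Shape
  attr_reader :names

  def initialize(names)
    @names = names
    @slots = names.each_with_index.to_h # e.g. {:@fname=>0, :@lname=>1}
  end

  # Slot index for an ivar name; raises KeyError if the name is not in this shape.
  def slot(name)
    @slots.fetch(name)
  end
end

class ShapedObject
  SHAPES = {} # one shared Shape per distinct list of ivar names

  def self.shape_for(names)
    key = names.dup.freeze
    SHAPES[key] ||= Shape.new(key)
  end

  attr_reader :shape

  def initialize(names)
    @shape = self.class.shape_for(names)
    @values = Array.new(names.size) # fixed size, like generated fields
  end

  def [](name)
    @values[@shape.slot(name)]
  end

  def []=(name, value)
    @values[@shape.slot(name)] = value
  end
end
```

Two objects built with the same ivar names share a single `Shape`, so the per-object cost is just the value array; an object with a different set of names gets its own shape.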
  34. Optimizing Arrays • Arrays are growable until frozen, but... • Arrays tend to be either small + immutable or large + mutable • Large, mutable arrays will often continue to mutate • Manually optimized 1- and 2-element arrays using fields • Future: hook into Object Shaping for Array
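A rough Ruby model of the 1- and 2-element specialization: small arrays keep their elements in instance variables, and only an array that grows past two elements pays for a general backing store. This is a hypothetical sketch (`SmallArray` is an illustrative name, not the API of JRuby's Java classes), and it supports only non-negative indexing.

```ruby
# Hypothetical field-backed small array: up to two elements live in
# instance variables; growing past two "spills" into an ordinary Array.
class SmallArray
  def initialize(*elems)
    raise ArgumentError, "at most 2 initial elements" if elems.size > 2
    @size = elems.size
    @e0, @e1 = elems
    @spill = nil # general Array, created only on growth past 2
  end

  def spilled?
    !@spill.nil?
  end

  def size
    spilled? ? @spill.size : @size
  end

  # Non-negative indices only, to keep the sketch short.
  def [](i)
    return @spill[i] if spilled?
    return nil if i >= @size
    i.zero? ? @e0 : @e1
  end

  def <<(value)
    if spilled?
      @spill << value
    elsif @size == 0
      @e0 = value
      @size = 1
    elsif @size == 1
      @e1 = value
      @size = 2
    else
      @spill = [@e0, @e1, value] # promote to a general, growable Array
      @e0 = @e1 = nil
    end
    self
  end

  def to_a
    spilled? ? @spill.dup : [@e0, @e1].first(@size)
  end
end
```

The design bet matches the slide: most arrays stay tiny and never need the general representation, and the rare ones that keep growing switch over once.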
  35. 10M * One-variable Object • [Chart: memory (MB) for 10M one-variable objects, No Shaping vs. Shaping, broken down into Ruby Object and Object[]; values shown: 320, 480, 400]
  36. Rails `select` Bench

        rank    self    accum  live bytes  live objs  alloc'ed bytes  alloc'ed objs  class name
          23   0.82%   73.58%     1744576      18168         5894464          61396  org.jruby.gen.RubyObject17
          32   0.44%   78.33%      937784      23432         2071464          51774  org.jruby.gen.RubyObject2
          42   0.30%   81.96%      633312      19775         1525824          47666  org.jruby.gen.RubyObject0
          43   0.30%   82.26%      632168      11280         2783968          49705  org.jruby.gen.RubyObject6
          46   0.27%   83.10%      587072      18330         2133984          66671  org.jruby.gen.RubyObject1
          58   0.22%   86.08%      465056       3630         1672864          13066  org.jruby.gen.RubyObject25
          60   0.21%   86.51%      439304      10970         1493024          37313  org.jruby.gen.RubyObject3
          61   0.20%   86.71%      434608       9044         2311744          48151  org.jruby.gen.RubyObject5
          68   0.16%   87.93%      349936       7280         1305136          27180  org.jruby.gen.RubyObject4
          79   0.11%   89.34%      233824       3646          838432          13093  org.jruby.gen.RubyObject8
         238   0.01%   96.11%       28088        314           30816            345  org.jruby.gen.RubyObject14
  37. 10M * One-element Array • [Chart: memory (MB) for 10M one-element arrays, No Shaping vs. Shaping, broken down into Ruby Object and IRubyObject[]; values shown: 400, 650, 570]
  38. Nearly Half are 1 or 2-element Arrays

        rank    self    accum  live bytes  live objs  alloc'ed bytes  alloc'ed objs  class name
           5   4.90%   33.79%    10481824     218361        38183968         795489  org.jruby.RubyArray
          11   3.11%   56.32%     6661072     138762        22817680         475358  org.jruby.specialized.RubyArrayOneObject
          17   1.46%   67.96%     3124112      55779        15838128         282815  org.jruby.specialized.RubyArrayTwoObject
  39. JRuby on Rails Performance

  40. ActiveRecord Performance • Rails apps live and die by ActiveRecord • Largest CPU consumer by far • Heavy object churn, GC overhead • Create, read, and update measurements • If delete is your bottleneck, we need to talk • CRuby 2.5.1 vs JRuby 9.2 on JDK 10
  41. ActiveRecord create operations per second • [Chart: series JRuby, JRuby Indy, JRuby Graal, CRuby; values shown: 157.233, 144.092, 140.449, 135.135]
  42. ActiveRecord find(id) operations per second • [Chart: series JRuby, JRuby Indy, JRuby Graal, CRuby; values shown: 3,940, 4,672, 4,999, 3,937]
  43. ActiveRecord select operations per second • [Chart: series JRuby, JRuby Indy, JRuby Graal, CRuby; values shown: 3,125, 3,703, 4,132, 2,403]
  44. ActiveRecord find_all operations per second • [Chart: series JRuby, JRuby Indy, JRuby Graal, CRuby; values shown: 1,597, 2,016, 1,908, 1,677]
  45. ActiveRecord update operations per second • [Chart: series JRuby, JRuby Indy, JRuby Graal, CRuby; values shown: 2,604, 6,250, 6,944, 4,000]
  46. Scaling Rails • Classic problem on MRI • No parallel threads (global VM lock), so we need processes • Processes inevitably duplicate runtime state • Much effort and lots of money wasted • JRuby is a great answer! • A multi-threaded single process runs your whole site
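The single-process model can be sketched as a small pool of worker threads pulling jobs from a shared queue. This is a hypothetical illustration (`run_pool` is not a Rails or JRuby API; production apps would use a threaded server such as Puma). On JRuby these threads can run Ruby code in parallel on separate cores; on CRuby the global VM lock lets only one of them run Ruby code at a time.

```ruby
# Hypothetical thread pool: one process, several workers sharing a Queue.
def run_pool(jobs, workers: 4)
  queue = Queue.new
  results = Queue.new
  jobs.each { |job| queue << job }
  workers.times { queue << :stop } # one stop marker per worker

  threads = Array.new(workers) do
    Thread.new do
      # Each worker takes jobs until it sees a stop marker.
      while (job = queue.pop) != :stop
        results << job.call
      end
    end
  end
  threads.each(&:join)

  # All workers have finished, so every result is already queued.
  Array.new(results.size) { results.pop }
end
```

Results come back in completion order, so callers that care about ordering should sort or tag them.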
  47. Measuring Rails Performance • Rails 5.1.6, PostgreSQL 10, scaffolded view • 4k requests to warm up, then measure every 10k • EC2 c4.xlarge: 4 vCPUs, 7.5GB • Bench, database, and app on the same instance
  48. Requests per second, full-stack scaffolded read on PostgreSQL

        JRuby   1,253.86
        CRuby     910.02
  49. Requests per second • [Chart: requests per second over time, from 10k to 100k requests; series CRuby 2.5, CRuby 2.6 JIT, JRuby 9.2.4]
  50. JRuby on Rails Memory • A single instance is much bigger: 400-500MB versus 50MB • Ten CRuby processes = 500MB • Ten JRuby threads = 400-500MB • May need to tell the JVM a memory cap • For 100-way or 1000-way...you do the math
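The slide leaves the arithmetic to the reader. Using only its figures (about 50MB per CRuby process), the CRuby side scales linearly with the number of processes. The slide gives 400-500MB for one JRuby process serving ten threads but does not say how that grows at 100 or 1000 threads, so this sketch computes only the CRuby totals; `cruby_total_mb` is an illustrative name.

```ruby
# Approximate per-process footprint of CRuby, taken from the slide.
CRUBY_MB_PER_PROCESS = 50

# N-way concurrency on CRuby means N processes, each with its own copy of the app.
def cruby_total_mb(concurrency, per_process_mb = CRUBY_MB_PER_PROCESS)
  concurrency * per_process_mb
end

[10, 100, 1000].each do |n|
  puts "#{n}-way on CRuby: about #{cruby_total_mb(n)}MB"
end
```

That works out to roughly 500MB, 5,000MB, and 50,000MB. For the JVM memory cap the slide mentions, the standard `-Xmx` heap flag can be passed through the `jruby` launcher with a `-J` prefix, for example `jruby -J-Xmx1g -S rails server`.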
  51. JRuby is the fastest way to run Rails applications.

  52. Method Inlining

  53. Method Inlining

        def add(a, b)
          a + b
        end

        def calculate_cost(c)
          total1 = c.add 1000, 1
          total2 = c.add 2000, 2
          total1 + total2
        end

        # after inlining:
        def calculate_cost(c)
          total1 = 1000 + 1
          total2 = 2000 + 2
          total1 + total2
        end

      Disclaimer: Ruby is much easier to read than IR
      * c.add must always be a call to the same method on the same type == monomorphic call
  54. Method Inlining • Eliminates cost of call (more obvious) • stack deepening • setting up call params • indirection to new method body • Leads to more optimizations (less obvious)
  55. Method Inlining

        def calculate_cost(c)
          total1 = 1000 + 1
          total2 = 2000 + 2
          total1 + total2
        end

        def calculate_cost(c)
          1000 + 1 + 2000 + 2
        end

        def pad_cost(c)
          calculate_cost(c) * 2
        end

        def calculate_cost(c)
          3003
        end
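The inlining and folding steps above can be checked by hand, since each step must return the same value. Here each step gets its own hypothetical method name (and `Calc` is an illustrative class providing `add`) so the versions can coexist; a JIT performs these rewrites internally on its IR and never produces Ruby source like this.

```ruby
# Illustrative receiver for the monomorphic c.add call.
class Calc
  def add(a, b)
    a + b
  end
end

# Before inlining: two calls to Calc#add.
def calculate_cost(c)
  total1 = c.add 1000, 1
  total2 = c.add 2000, 2
  total1 + total2
end

# After inlining the add calls.
def calculate_cost_inlined(_c)
  total1 = 1000 + 1
  total2 = 2000 + 2
  total1 + total2
end

# After constant folding: the argument is no longer used at all.
def calculate_cost_folded(_c)
  3003
end

# A caller; once calculate_cost is inlined and folded, this is just 3003 * 2.
def pad_cost(c)
  calculate_cost(c) * 2
end
```

Every version returns 3003, and `pad_cost` returns 6006, which is exactly the equivalence the compiler relies on when it replaces the calls.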
  56. JAVA IS GREAT AT INLINING METHODS* • *Unless we pass a block to the method
  57. Inlining Problem

        1000.times do
          something
        end

        1000.times do
          something_else
        end

        def times
          i = 0
          while i < self do
            yield i
            i += 1
          end
        end

        def times
          i = 0
          while i < self do
            something
            i += 1
          end
        end
  58. JRuby Method Inlining • Methods with Literal Blocks • Get special call sites • If they always call the same type {n} times • Inline!
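The "same type {n} times" rule can be pictured as a per-call-site profile. This is a hypothetical sketch (`SiteProfile` and its threshold are illustrative, not JRuby's internal profiler): it records the receiver class at each hit and reports an inlining candidate only while the site has stayed monomorphic for at least `threshold` calls.

```ruby
# Hypothetical profile for one call site.
class SiteProfile
  def initialize(threshold)
    @threshold = threshold
    @klass = nil
    @count = 0
    @polymorphic = false
  end

  # Called on every hit of the site with the current receiver.
  def record(receiver)
    klass = receiver.class
    if @klass.nil?
      @klass = klass
    elsif !@klass.equal?(klass)
      @polymorphic = true # a second receiver type was seen
    end
    @count += 1
  end

  # Inline only sites that are monomorphic and hot enough.
  def inline_candidate?
    !@polymorphic && @count >= @threshold
  end
end
```

Once a second receiver class appears, the site stops qualifying, since the inlined body would no longer be valid for every call.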
  59. JRuby Inlining • Methods + Literal Blocks treated as a single unit • Duplicate method • Inline block into dupe method • Inline back to call • Both must be IR (e.g. Ruby defined)
  60.
        class Foo
          def ___inline___me(i)
            k = i
            while k > 0
              k = yield(k)
            end
            i - 1
          end
        end

        def foo(counter)
          i = 5_000
          while i > 0
            i = counter.___inline___me(i) { |j| j - 2 }
          end
        end

      Contrived!
  61. Time per foo(counter) (smaller is better) • [Chart: JIT vs. JIT+inline] • 4.8x faster! • Contrived!
  62. Grrr…Core Methods in Java • Java-implemented methods are quick • …but we cannot inline a Java-implemented method! • Integer#times, Enumerable#{ALL THE THINGS} • If only we had a way…
  63. Ruby Replacement! • At inline decision time • Do we have a Ruby implementation of the Java core method? • Yes! Inline with that. • Profit!
  64.
        def foo
          s = 0
          10_000_000.times do
            s += 1
          end
          s
        end

        def times
          i = 0
          while i < self do
            yield i
            i += 1
          end
        end

        def foo
          s = 0
          i = 0
          while i < 10_000_000 do
            s += 1
            i += 1
          end
          s
        end
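The Ruby replacement for `times` can be reproduced in plain Ruby to confirm that the hand-inlined loop computes the same result as the block version. `ruby_times` is a hypothetical stand-in for a Ruby implementation of `Integer#times` (a plain method, so the core method is left alone), and the count is a parameter so the check runs quickly.

```ruby
# Ruby implementation of times, as on the slide, but taking n explicitly.
def ruby_times(n)
  i = 0
  while i < n
    yield i
    i += 1
  end
  n # Integer#times returns its receiver
end

# Original shape: a loop driven by a block.
def foo_with_block(n)
  s = 0
  ruby_times(n) { s += 1 }
  s
end

# What inlining ruby_times and then its block should reduce to.
def foo_inlined(n)
  s = 0
  i = 0
  while i < n
    s += 1
    i += 1
  end
  s
end
```

Because `ruby_times` is ordinary Ruby, an inliner can see both its loop and the block body, which is not possible when `times` is implemented in Java.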
  65. Time per foo() (smaller is better) • [Chart: JIT vs. JIT+inline] • 1.5x faster! • Not Contrived!
  66. Ruby Replacement Potential • Why just have one Ruby replacement?

        def times
          i = 0
          while i < self do
            yield i
            i += 1
          end
        end

      Arbitrary n-element times:

        def times
          yield 0
          yield 1
          yield 2
          yield 3
          yield 4
        end

        5.times
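Specialized replacements like the 5-element `times` could in principle be generated rather than written by hand. A hypothetical sketch of that idea (`unrolled_times_source` and `Unrolled5` are illustrative names, not anything JRuby ships): build the source of a fully unrolled `times` for a literal count and evaluate it into a module.

```ruby
# Build Ruby source for a fully unrolled loop over 0...n.
def unrolled_times_source(n)
  body = (0...n).map { |i| "  yield #{i}" }.join("\n")
  "def unrolled_times\n#{body}\nend"
end

# Evaluate the generated method into a module so it can be mixed in.
module Unrolled5
  module_eval(unrolled_times_source(5))
end
```

For example, `Object.new.extend(Unrolled5).unrolled_times { |i| p i }` yields 0 through 4 with no loop counter or comparison, which is the shape the slide's 5-element version has.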
  67. JRuby Inlining Status • Only runs with -Xir.inliner currently • Many limitations • Still has bugs
  68. Summary

  69. None
  70. JDK 9+ Warnings • JDK 9 introduced stricter encapsulation • We poke through that encapsulation to support Ruby features • You'll see warnings...they're harmless, but we'll deal with them (9.3)
  71. None
  72. Thank You! • Charles Oliver Nutter • headius@headius.com • @headius • Tom Enebo • tom.enebo@gmail.com • @tom_enebo • http://jruby.org