JRuby 2018: Real World Performance

headius
November 15, 2018

Discussion of JRuby optimization and Rails performance delivered by Charles Oliver Nutter and Thomas Enebo at RubyConf 2018 in Los Angeles.

Transcript

  1. JRuby 2018: Real World Perf
    Charles Oliver Nutter (@headius)
    Thomas Enebo (@tom_enebo)

  2. • JRuby co-leads
    • Red Hat Inc.
    [Diagram labels: Charles, Thomas, Ruby, Java, Beer]

  3. What is JRuby
    • It's just Ruby!
    • Ruby 2.5 compatible, if something's broken tell us
    • Supports pure-Ruby gems, many extensions
    • We want to be a Ruby first!
    • It's a JVM language
    • Full access to the power of the JVM platform!

  4. JVM Tools and GC

  5. Parallel and Concurrent

  6. Fun Stuff
    event(:player_egg_throw) do |e|
      e.hatching = true
      e.num_hatches = 120
      e.player.mesg "hatched"
    end
    Purugin

  7. Roadmap
    • 9.2.0.0 May 24???? O_o
    • EOL 2.3.x support soon?
    • How to handle 2.6?
    [Branch diagram. Branches: master, jruby-9_1. Releases: 9.1.17.0, ..., 9.2.0.0, 9.2.1.0, 9.1.1.18.0 EOL? Ruby versions: 2.3.x, 2.5.x, 2.6?]

  8. New Feature? You Can Help!
    • New features are great opportunities to contribute!
    • Learn more about how Ruby and JRuby work!
    • Help us keep up with Ruby development!
    • Profit!
    • We are always standing by on IRC, Gitter, Twitter to help you

  9. Library Compatibility
    • We run pure-ruby libraries well
    • Rails, Rake, RubyGems, ...
    • If a pure-Ruby library doesn't work the same, let us know
    • What about native extensions?

  10. Oj+JRuby
    • OJ == Optimized JSON
    • Mild path of pain for potential JRuby users
    • Common transitive dependency with custom API
    • Needed for Discourse
    • C extension only until now…
    https://github.com/ohler55/oj

  11. Oj
    • Oj is large…
    • 19810 lines of C
    • 7 parsers/dumpers (object, strict, compat, null, custom, rails, wab)
    • parser stream + string for each
    • mimic API to be compatible with ‘json’ gem

  12. JRuby Oj Port Status
    • 9200 lines of Java
    • Missing part of Wab dumper & parser
    • Missing mimic implementation
    • 448 runs, 765 assertions, 43 failures, 12 errors, 0 skips
    • 20 from wab; 15 from time/date bugs

  13. Load Performance
    [Bar chart: millions of loads per second (higher is better) for small, medium, and large documents. Series: JRuby(oj), JRuby(json), MRI(oj). Bar values (not matched to series): 0.06, 0.11, 0.72, 0.05, 0.11, 0.36, 0.13, 0.29, 1.1.]
    Data from https://techblog.thescore.com/2014/05/23/benchmarking-json-generation-in-ruby/

  14. Dump Performance
    [Bar chart: millions of dumps per second (higher is better) for small, medium, and large documents. Series: JRuby(oj), JRuby(json), MRI(oj). Bar values (not matched to series): 0.33, 0.73, 2.1, 0.22, 0.44, 1.1, 0.44, 0.86, 2.3.]
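
The load and dump charts report operations per second. A minimal sketch of how such a figure can be measured, assuming the standard-library `json` as a stand-in so it runs without Oj installed. The `ops_per_second` helper and the sample document are hypothetical and are not the benchmark from the cited blog post.

```ruby
require "json"
require "benchmark"

# Hypothetical helper: run the block `iterations` times and
# return how many calls completed per second.
def ops_per_second(iterations)
  elapsed = Benchmark.realtime { iterations.times { yield } }
  iterations / elapsed
end

# Small sample document (an assumption; the talk's payloads are not shown).
doc  = { "id" => 1, "name" => "JRuby", "tags" => %w[ruby jvm], "score" => 9.2 }
text = JSON.generate(doc)

loads = ops_per_second(50_000) { JSON.parse(text) }
dumps = ops_per_second(50_000) { JSON.generate(doc) }

puts format("loads/s: %.0f", loads)
puts format("dumps/s: %.0f", dumps)
```

With the oj gem installed, `Oj.load(text)` and `Oj.dump(doc)` can be substituted for the `JSON` calls to compare the two libraries on the same runtime.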

  15. Oj Tasks Left
    • Green Build
    • Update wab+mimic
    • Submit an epic PR
    • Start performance tuning

  16. JRuby on Rails

  17. A Long, Hard Journey
    • JRuby first ran Rails in 2006
    • Almost as long as Rails has existed!
    • Thousands of JRoR instances around the world
    • JRuby 9000, Ruby 2.4, 2.5 work slowed down Rails support
    • Rails 5.0 not supported for at least a year
    • ActiveRecord suffered the most

  18. Rails 5.2.0
    actioncable: something broken bootstrapping
    actionpack: 3148 runs, 15832 assertions, 1 failures, 0 errors
    actionmailer: 204 runs, 457 assertions, 0 failures, 0 errors
    actionview: 1990 runs, 4395 assertions, 4 failures, 4 errors
    activejob: 173 runs, 401 assertions, 0 failures, 0 errors
    activemodel: 803 runs, 2231 assertions, 0 failures, 0 errors
    activerecord: 5226 runs, 14665 assertions, 8 failures, 6 errors
    activesupport: 4135 runs, 762864 assertions, 17 failures, 2 errors
    railties: uses fork()

  19. Failure:
    TimeWithZoneTest#test_minus_with_time_precision [activesupport/test/core_ext/time_with_zone_test.rb:340]:
    Expected: 86399.999999998
    Actual: 86399.99999999799

  20. Rails Status
    • Tests running well
    • Over 99% passing
    • Rails apps should just work
    • SQLite3, MySQL, and Postgresql supported by us
    • MSSQL returning soon?
    • Oracle, DB2: see third-party adapters for now

  21. JRuby Architecture
    [Diagram. JRuby Internals: Ruby (.rb) → parse → Ruby Instructions (IR) → interpret (interpreter) → JIT → Java Instructions (java bytecode). Java Virtual Machine: java bytecode → interpret (bytecode interpreter) → C1 Compile → native code → C2 Compile → better native code → execute.]

  22. Microbenching
    • Very fun to show off and watch improve
    • Practically useless
    • Like judging a person by how much they can bench press
    • JRuby has won microbenchmarks for years
    • Easier to isolate specific measurements
    • Great for exploring new runtimes and tech

  23. InvokeDynamic
    • JVM support for dynamic invocation
    • Let the JVM see through all the dynamic bits of Ruby
    • Added in Java 7, with much input and testing from JRuby
    • Steadily improving performance, reducing overhead
    • -Xcompile.invokedynamic
    • May be default soon!

  24. bench_mandelbrot
    • Generate a text Mandelbrot fractal
    • See? Useful!
    • Test of numeric performance
    • Heavy reliance on JVM to optimize
    • Graal is especially good to us here

  25. bench_mandelbrot.rb
    def mandelbrot(size)
      sum = 0
      byte_acc = 0
      bit_num = 0
      y = 0
      while y < size
        ci = (2.0*y/size)-1.0
        x = 0
        while x < size
          zrzr = zr = 0.0
          zizi = zi = 0.0
          cr = (2.0*x/size)-1.5
          escape = 0b1
          z = 0
          while z < 50

  26. bench_mandelbrot total execution time (lower is better)
    [Bar chart. Categories: CRuby 2.5, CRuby 2.6 JIT, JRuby, JRuby Indy. Bar values (not matched to categories): 1.33s, 2.95s, 3.5s, 3.57s.]

  27. Graal
    • New JVM native JIT written in Java
    • Faster evolution
    • More advanced optimization
    • Plugs into JDK9+ via command line flags
    • Shipped with JDK10...try it today!

  28. bench_mandelbrot total execution time (lower is better)
    [Bar chart. Categories: JRuby, JRuby Indy, JRuby Indy Graal. Bar values (not matched to categories): 0.139s, 1.33s, 2.95s.]

  29. Optimizing Objects
    • Ruby instance vars are dynamic
    • Space allocated on assignment
    • Any unfrozen object can grow
    • Looks like a Hash
    • Inefficient for mostly-same keys
    • Array reduces cost, still high
    • Make them JVM fields!
    require "securerandom"
    class Person
      # closest we get to a declaration
      attr_accessor :fname, :lname, :bdate
      def initialize(fname, lname, bdate)
        # encountered after first object
        # has already been constructed
        @fname, @lname, @bdate =
          fname, lname, bdate
      end
      def initialize_id
        @id ||= SecureRandom.uuid
      end
    end
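
To make the "space allocated on assignment" point concrete, here is a small sketch built on the slide's `Person` class. It shows an object gaining an instance variable after construction and a frozen object refusing to grow. The sample names and dates are made up for illustration.

```ruby
require "securerandom"

class Person
  attr_accessor :fname, :lname, :bdate

  def initialize(fname, lname, bdate)
    @fname, @lname, @bdate = fname, lname, bdate
  end

  def initialize_id
    @id ||= SecureRandom.uuid
  end
end

person = Person.new("Ada", "Lovelace", "1815-12-10")
before = person.instance_variables   # variables set by the constructor
person.initialize_id                 # adds @id after construction
after = person.instance_variables

p before
p after

# A frozen object can no longer gain instance variables.
frozen = Person.new("Grace", "Hopper", "1906-12-09").freeze
begin
  frozen.initialize_id
rescue FrozenError => e   # FrozenError exists in Ruby 2.5 and later
  puts "cannot grow: #{e.class}"
end
```

Because `@id` only appears when `initialize_id` runs, a runtime that wants to store instance variables in fixed fields has to discover the set of variables by observation rather than from a declaration.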

  30. Optimizing Arrays
    • Arrays are growable until frozen, but...
    • Arrays are small + immutable or large + mutable
    • Large, mutable arrays will often continue to mutate
    • Manually optimized 1- and 2-element arrays using fields
    • Future: hook into Object Shaping for Array

  31. 10M * One-variable Object
    [Bar chart, memory in MB. Categories: No Shaping, Shaping. Series: Ruby Object, Object[]. Bar values (not matched to series): 320, 480, 400.]

  32. Rails `select` Bench
             percent              live              alloc'ed
    rank   self   accum     bytes    objs     bytes    objs  class name
      23  0.82%  73.58%   1744576   18168   5894464   61396  org.jruby.gen.RubyObject17
      32  0.44%  78.33%    937784   23432   2071464   51774  org.jruby.gen.RubyObject2
      42  0.30%  81.96%    633312   19775   1525824   47666  org.jruby.gen.RubyObject0
      43  0.30%  82.26%    632168   11280   2783968   49705  org.jruby.gen.RubyObject6
      46  0.27%  83.10%    587072   18330   2133984   66671  org.jruby.gen.RubyObject1
      58  0.22%  86.08%    465056    3630   1672864   13066  org.jruby.gen.RubyObject25
      60  0.21%  86.51%    439304   10970   1493024   37313  org.jruby.gen.RubyObject3
      61  0.20%  86.71%    434608    9044   2311744   48151  org.jruby.gen.RubyObject5
      68  0.16%  87.93%    349936    7280   1305136   27180  org.jruby.gen.RubyObject4
      79  0.11%  89.34%    233824    3646    838432   13093  org.jruby.gen.RubyObject8
     238  0.01%  96.11%     28088     314     30816     345  org.jruby.gen.RubyObject14

  33. 10M * One-element Array
    [Bar chart, memory in MB. Categories: No Shaping, Shaping. Series: Ruby Object, IRubyObject[]. Bar values (not matched to series): 400, 650, 570.]

  34. Nearly Half are 1 or 2-element Arrays
             percent              live              alloc'ed
    rank   self   accum     bytes    objs     bytes    objs  class name
       5  4.90%  33.79%  10481824  218361  38183968  795489  org.jruby.RubyArray
      11  3.11%  56.32%   6661072  138762  22817680  475358  org.jruby.specialized.RubyArrayOneObject
      17  1.46%  67.96%   3124112   55779  15838128  282815  org.jruby.specialized.RubyArrayTwoObject

  35. JRuby on Rails Performance

  36. ActiveRecord Performance
    • Rails apps live and die by ActiveRecord
    • Largest CPU consumer by far
    • Heavy object churn, GC overhead
    • Create, read, and update measurements
    • If delete is your bottleneck, we need to talk
    • CRuby 2.5.1 vs JRuby 9.2 on JDK10

  37. ActiveRecord
    create operations per second
    [Bar chart. Categories: JRuby, JRuby Indy, JRuby Graal, CRuby. Bar values (not matched to categories): 157.233, 144.092, 140.449, 135.135.]

  38. ActiveRecord
    find(id) operations per second
    [Bar chart. Categories: JRuby, JRuby Indy, JRuby Graal, CRuby. Bar values (not matched to categories): 3,940, 4,672, 4,999, 3,937.]

  39. ActiveRecord
    select operations per second
    [Bar chart. Categories: JRuby, JRuby Indy, JRuby Graal, CRuby. Bar values (not matched to categories): 3,125, 3,703, 4,132, 2,403.]

  40. ActiveRecord
    find_all operations per second
    [Bar chart. Categories: JRuby, JRuby Indy, JRuby Graal, CRuby. Bar values (not matched to categories): 1,597, 2,016, 1,908, 1,677.]

  41. ActiveRecord
    update operations per second
    [Bar chart. Categories: JRuby, JRuby Indy, JRuby Graal, CRuby. Bar values (not matched to categories): 2,604, 6,250, 6,944, 4,000.]

  42. Scaling Rails
    • Classic problem on MRI
    • No concurrent threads, so we need processes
    • Processes inevitably duplicate runtime state
    • Much effort and lots of money wasted
    • JRuby is a great answer!
    • Multi-threaded single process runs your whole site

  43. Measuring Rails Performance
    • Rails 5.1.6, Postgresql 10, scaffolded view
    • 4k requests to warm up, then measure every 10k
    • EC2 c4.xlarge: 4 vCPUs, 7.5GB
    • Bench, database, and app on same instance
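
The warm-up-then-measure procedure above can be sketched as a small loop. This is not the harness used for the talk: `request` is a hypothetical stand-in for one HTTP request, and only three windows are measured to keep the run short.

```ruby
require "benchmark"

# Hypothetical stand-in for issuing one request to the app.
def request
  (1..200).map { |n| n * n }.sum
end

WARMUP  = 4_000    # requests issued before measuring, as on the slide
WINDOW  = 10_000   # requests per measurement window, as on the slide
WINDOWS = 3        # assumption: shortened for the sketch

# Warm-up gives a JIT-compiling runtime time to optimize hot code.
WARMUP.times { request }

rates = Array.new(WINDOWS) do
  elapsed = Benchmark.realtime { WINDOW.times { request } }
  WINDOW / elapsed
end

rates.each_with_index do |rate, i|
  puts format("window %d: %.0f requests/s", i + 1, rate)
end
```

Reporting each window separately, rather than one average, makes it visible when throughput is still climbing as the runtime warms up.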

  44. Requests per second, full stack scaffolded read on Postgresql
    [Bar chart. Categories: JRuby, CRuby. Bar values (not matched to categories): 910.02, 1,253.86.]

  45. Requests per second
    [Line chart: requests per second (0 to 1300) over time, measured from 10k to 100k requests. Series: CRuby 2.5, CRuby 2.6 JIT, JRuby 9.2.4.]

  46. JRuby on Rails Memory
    • Single instance is much bigger, 400-500MB versus 50MB
    • Ten CRuby processes = 500MB
    • Ten JRuby threads = 400-500MB
    • May need to give the JVM a memory cap
    • For 100-way or 1000-way...you do the math
    ADD GRAPH
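
"You do the math" can be spelled out using the per-process figures from the slide. The sketch assumes the JRuby process stays inside the 400-500MB range as threads are added, which the slide states only for ten threads. The `cruby_total_mb` helper is hypothetical.

```ruby
CRUBY_MB_PER_PROCESS = 50        # per-process figure from the slide
JRUBY_MB_ONE_PROCESS = 400..500  # single-process range from the slide

# Total memory for `ways` CRuby worker processes.
def cruby_total_mb(ways)
  ways * CRUBY_MB_PER_PROCESS
end

[10, 100, 1000].each do |ways|
  # Assumption: the JRuby figure does not grow with the thread count;
  # real applications will likely need more heap at high concurrency.
  puts "#{ways}-way: CRuby ~#{cruby_total_mb(ways)}MB in #{ways} processes, " \
       "JRuby ~#{JRUBY_MB_ONE_PROCESS.min}-#{JRUBY_MB_ONE_PROCESS.max}MB in one process"
end
```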

  47. JRuby is the fastest way
    to run Rails applications.

  48. Method Inlining

  49. Method Inlining
    def add(a, b)
      a + b
    end
    def calculate_cost(c)
      total1 = c.add 1000, 1
      total2 = c.add 2000, 2
      total1 + total2
    end
    def calculate_cost(c)
      total1 = 1000 + 1
      total2 = 2000 + 2
      total1 + total2
    end
    Disclaimer: Ruby much easier to read than IR
    * c.add must always be a call to the same method on the same type == monomorphic call

  50. Method Inlining
    • Eliminates cost of call (more obvious)
    • stack deepening
    • setting up call params
    • indirection to new method body
    • Leads to more optimizations (less obvious)

  51. Method Inlining
    def calculate_cost(c)
      total1 = 1000 + 1
      total2 = 2000 + 2
      total1 + total2
    end
    def calculate_cost(c)
      1000 + 1 + 2000 + 2
    end
    def pad_cost(c)
      calculate_cost(c) * 2
    end
    def calculate_cost(c)
      3003
    end
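
The inlining steps on these slides can be checked side by side. In this sketch the `Calculator` class and the `_inlined` and `_folded` method names are hypothetical, introduced so the three stages can coexist in one file. The transformations are written out by hand, whereas a JIT performs them on its own intermediate form.

```ruby
class Calculator
  def add(a, b)
    a + b
  end
end

# Original: two monomorphic calls to Calculator#add.
def calculate_cost(c)
  total1 = c.add 1000, 1
  total2 = c.add 2000, 2
  total1 + total2
end

# After inlining the body of add at both call sites.
def calculate_cost_inlined(_c)
  total1 = 1000 + 1
  total2 = 2000 + 2
  total1 + total2
end

# After constant folding, which only becomes possible once add is inlined.
def calculate_cost_folded(_c)
  3003
end

c = Calculator.new
p [calculate_cost(c), calculate_cost_inlined(c), calculate_cost_folded(c)]
# prints [3003, 3003, 3003]
```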

  52. JAVA IS GREAT AT INLINING METHODS*
    *Unless we pass a block to the method

  53. Inlining Problem
    1000.times do
      something
    end
    1000.times do
      something_else
    end
    def times
      i = 0
      while i < self do
        yield i
        i += 1
      end
    end
    def times
      i = 0
      while i < self do
        something
        i += 1
      end
    end

  54. JRuby Method Inlining
    • Methods with Literal Blocks
    • Get special call sites
    • If they always call the same type {n} times
    • Inline!

  55. JRuby Inlining
    • Methods + Literal Blocks treated as single unit
    • Duplicate method.
    • Inline Block into dupe method.
    • Inline back to call
    • Both must be IR (e.g. Ruby defined)

  56. class Foo
      def ___inline___me(i)
        k = i
        while k > 0
          k = yield(k)
        end
        i - 1
      end
    end
    def foo(counter)
      i = 5_000
      while i > 0
        i = counter.___inline___me(i) { |j| j - 2 }
      end
    end
    Contrived!

  57. Time per foo(counter) (smaller is better)
    [Bar chart. Bars: JIT, JIT+inline. Callout: 4.8x faster!]
    Contrived!

  58. Grrr…Core Methods in Java
    • Java implemented methods are quick
    • …but we cannot inline a Java implemented method!
    • Integer#times, Enumerable#{ALL THE THINGS}
    • If only we had a way…

  59. Ruby Replacement!
    • At inline decision time
    • Do we have Ruby implementation of Java core method?
    • Yes! Inline with that.
    • Profit!

  60. def foo
      s = 0
      10_000_000.times do
        s += 1
      end
      s
    end
    def times
      i = 0
      while i < self do
        yield i
        i += 1
      end
    end
    def foo
      s = 0
      i = 0
      while i < 10_000_000 do
        s += 1
        i += 1
      end
      s
    end
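
A runnable sketch of the replacement idea: a pure-Ruby `times` written as a loop with `yield`, next to the version with the loop and block inlined by hand. The method names `ruby_times`, `foo_with_block`, and `foo_inlined` are hypothetical, chosen so the sketch does not override the real `Integer#times`. Unlike the slide's version, `ruby_times` returns `self` to match the built-in.

```ruby
class Integer
  # Hypothetical pure-Ruby stand-in for Integer#times.
  def ruby_times
    i = 0
    while i < self
      yield i
      i += 1
    end
    self
  end
end

# Calls the Ruby-defined loop with a literal block.
def foo_with_block(n)
  s = 0
  n.ruby_times { s += 1 }
  s
end

# The same work with the method body and the block inlined by hand.
def foo_inlined(n)
  s = 0
  i = 0
  while i < n
    s += 1
    i += 1
  end
  s
end

p foo_with_block(100_000) == foo_inlined(100_000)
# prints true
```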

  61. Time per foo() (smaller is better)
    [Bar chart. Bars: JIT, JIT+inline. Callout: 1.5x faster!]
    Not Contrived!

  62. Ruby Replacement Potential
    • Why just have one Ruby replacement?
    def times
      i = 0
      while i < self do
        yield i
        i += 1
      end
    end
    Arbitrary n-element times
    def times
      yield 0
      yield 1
      yield 2
      yield 3
      yield 4
    end
    5.times
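
The unrolled variant can be checked against the built-in loop. In this sketch `times_unrolled_5` is a hypothetical name for the specialised replacement. It shows only that the two forms yield the same sequence, not how JRuby would select between replacements.

```ruby
# Hypothetical specialised replacement for 5.times, unrolled by hand.
def times_unrolled_5
  yield 0
  yield 1
  yield 2
  yield 3
  yield 4
end

looped = []
5.times { |i| looped << i }

unrolled = []
times_unrolled_5 { |i| unrolled << i }

p looped == unrolled
# prints true
```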

  63. JRuby Inlining Status
    • Only runs with -Xir.inliner currently
    • Many limitations
    • Still has bugs

  64. JDK 9+ Warnings
    • JDK 9 introduced stricter encapsulation
    • We poke through that encapsulation to support Ruby features
    • You'll see warnings...they're harmless, but we'll deal with them (9.3)

  65. Thank You!
    • Charles Oliver Nutter
    • @headius
    • Tom Enebo
    • @tom_enebo
    • http://jruby.org
