High Performance Ruby - E4E Conference 2013

headius
June 28, 2013

A presentation on how JRuby is making Ruby faster and other tricks Rubyists can do to speed up code.

Transcript

  1. Me • Charles Oliver Nutter • @headius • Java developer since 1996 • JRuby developer since 2006 • Red Hat / JBoss polyglot group (Monday, July 1, 13)
  2. What Should We Optimize? • Overall execution time? • Memory use? • Developer time? • Developer happiness? :-)
  3. Strategies • Use a better runtime • Use more cores • Write better code
  4. Many Options • Ruby 2.0 • Significant execution improvements • JRuby • Leveraging JVM more and more • Rubinius • Optimizing VM built for Ruby
  5. Go Java Go! • [Chart: JRuby 1.0.3 running bm_red_black_tree.rb on Java 1.4, Java 5, Java 6 and Java 7] • 300% for free
  6. Go JRuby Go! • [Chart: bm_red_black_tree.rb across JRuby 1.0.3, 1.1.6, 1.4.0, 1.5.6, 1.6.8, 1.7.0 and OpenJDK 8] • 8.2x improvement
  7. rbtree Extension • Pure Ruby version works everywhere • C or Java extension FOR SPEED • Oh really? ;-)
  8-17. [Bar chart, built up one bar per slide: "red/black tree, pure Ruby versus native", runtime per iteration] • ruby-1.9.3 + Ruby: 3.96 • ruby-2.0.0 + Ruby: 2.48 • maglev + Ruby: 1.39 • macruby-0.12 + Ruby: 1.19 • rbx-2.0.0rc1 + Ruby: 0.51 • ruby-1.9.3 + C ext: 0.51 • ruby-2.0.0 + C ext: 0.51 • jruby + Ruby: 0.29 • jruby + Java ext: 0.1
  18. Dynamic Optimization • Target method/value discovered at runtime • Lookup is expensive • We can cache it • Cache has to be validated • Indirection hurts pipeline • Inline methods/values at access point
  19-24. Method Caching • [Diagram, built up over slides 19-24: a call site for obj.foo() in the VM; the target object is associated with FooClass, whose method table holds def foo and def bar. VM operations added step by step: method lookup, branch, method cache, ending with def foo cached at the call site]
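
As a rough illustration of the cache in this diagram, here is a minimal sketch of a monomorphic inline cache written in plain Ruby. It assumes a hypothetical `Serial` module that counts method definitions as a stand-in for a method-table version, and a hypothetical `CallSite` class. Real VMs keep this cache inside the interpreter or JIT; the sketch also ignores singleton methods and changes made in superclasses or included modules.

```ruby
# Hypothetical "method table version" for a class: bumped whenever an
# instance method is defined in the extending class. Changes made in
# superclasses or included modules are NOT tracked by this sketch.
module Serial
  def method_serial
    @method_serial ||= 0
  end

  def method_added(name)
    @method_serial = method_serial + 1
    super
  end
end

# Hypothetical monomorphic inline cache for one call site such as obj.foo().
class CallSite
  attr_reader :lookups

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_serial = nil
    @cached_method = nil
    @lookups = 0
  end

  def call(obj, *args)
    klass = obj.class # simplification: singleton classes are ignored
    serial = klass.respond_to?(:method_serial) ? klass.method_serial : nil
    # Validate the cache: same type, same method table version?
    unless klass.equal?(@cached_class) && serial == @cached_serial
      @lookups += 1 # slow path: full method lookup
      @cached_method = klass.instance_method(@name)
      @cached_class = klass
      @cached_serial = serial
    end
    @cached_method.bind(obj).call(*args) # fast path: reuse cached method
  end
end

class FooClass
  extend Serial
  def foo; 1; end
  def bar; 2; end
end

site = CallSite.new(:foo)
3.times { site.call(FooClass.new) }
puts site.lookups              # prints 1: one lookup, then cache hits

class FooClass
  def foo; 10; end             # redefinition changes the method table
end
puts site.call(FooClass.new)   # prints 10: cache invalidated, looked up again
puts site.lookups              # prints 2
```

The two checks in `call` correspond to "cache has to be validated": the receiver's type must match and the method table must be unchanged, or the site falls back to a full lookup.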
  25. Constant Lookup • [Diagram: an access site in the VM reads MY_CONST; VM operations: locate the value in the constant table, bind permanently]
  26. Inlining

        def foo; 1; end
        def invoker; foo; end

        i = 0
        while i < 10000
          invoker
          i += 1
        end
  27. Inline foo into invoker

        def invoker; 1; end

        i = 0
        while i < 10000
          invoker
          i += 1
        end
  28. Inline invoker into loop

        i = 0
        while i < 10000
          1
          i += 1
        end
  29. Value is transient

        i = 0
        while i < 10000
          i += 1
        end
  30. It's a multi-core world • Scaling today is horizontal, not vertical • Running N processes does not cut it • N users * X MB per process = $$$ • CoW is only a partial band-aid • Non-parallel impls are falling behind • JRuby and Rubinius are your only real options
  31-33. True Parallelism • [Diagram, built up over slides 31-33, mapping Ruby threads to native threads and CPU cores in use: Ruby 1.8.7, green threading, single thread; Ruby 2.0.0, global lock, single thread; JRuby, real threading]
  34. Multicore in MRI • [Diagram: ten 200MB MRI instances] • Ten instances * 200MB = 2GB
  35.

        require 'benchmark'
        ary = (1..1000000).to_a
        loop {
          puts Benchmark.measure {
            10.times { ary.each {|i|} }
          }
        }
  36.

        require 'benchmark'
        ary = (1..1000000).to_a
        loop {
          puts Benchmark.measure {
            (1..10).map { Thread.new { ary.each {|i|} } }.map(&:join)
          }
        }
  37. [Figure panels: Ruby 1.9 single thread • Ruby 1.9 multiple threads • JRuby single thread • JRuby multiple threads]
  38. threaded_reverse: per-iteration time versus thread count • [Chart: per-iteration time with one, two, three and four threads]
  39. Doing It Right • Lock-free persistent data structures • hamster et al. • Thread-safety utilities • Mutex, Queue, thread_safe + atomic gems • Threaded servers • puma, trinidad, torquebox, JVM servers
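
As a small illustration of the stdlib utilities named above, the sketch below hands work out through a `Queue` and guards the single shared total with a `Mutex`. The method name `parallel_sum_of_squares` and the `:done` stop marker are hypothetical. On JRuby the worker threads can run on separate cores; on MRI the global lock still lets only one thread run Ruby code at a time.

```ruby
# Hypothetical example: sum n * n over `numbers` using `workers` threads.
# Work is distributed through a thread-safe Queue; the one shared total is
# updated under a Mutex, once per thread, to keep contention low.
def parallel_sum_of_squares(numbers, workers: 4)
  queue = Queue.new
  numbers.each { |n| queue << n }
  workers.times { queue << :done } # one stop marker per worker

  total = 0
  lock = Mutex.new

  threads = Array.new(workers) do
    Thread.new do
      local = 0 # thread-private accumulator, no locking needed
      while (n = queue.pop) != :done
        local += n * n
      end
      lock.synchronize { total += local } # synchronized shared update
    end
  end

  threads.each(&:join)
  total
end

puts parallel_sum_of_squares((1..100).to_a) # prints 338350
```

Accumulating into a thread-local variable and taking the lock once per thread follows the "start coarse-grained" advice: there is exactly one synchronized mutation per worker.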
  40. Finding Problems • JRuby • VM flags (heap/thread dumps, debug) • Some of the best tools in the world • Rubinius • gdb, OS-level tools • #rubinius
  41. Usual Suspects • eval • Exceptions as flow control • Excessive allocation • Defeating optimizations • IO, DB, bad libraries • VM flaw* • *I usually assume it's JRuby's fault until proven otherwise
  42. eval • Code never stays the same • VM can't cache, can't see patterns • No optimization is possible* • *Specific cases can sometimes be cached and optimized
  43. Fixing eval • Evaluate code into a method and leave it • Methods are stable, optimizable • Pass dynamic state, rather than interpolate • Branches are cheaper than new code • Do all evaluation up front • ...not during your app's hot path
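
A small sketch of this advice, assuming a toy calculator; `calc_with_eval` and `Calc.apply` are hypothetical names. The first version interpolates its inputs into a fresh code string on every call. The second evaluates code into a method once, at load time, and then branches on the operator passed in as data. In real code the evaluated string would usually be generated (from a template, say); it is static here to keep the sketch short.

```ruby
# Anti-pattern: a brand-new code string is evaluated on every call, so the
# VM never sees stable code it can cache or optimize.
def calc_with_eval(a, op, b)
  eval("#{a} #{op} #{b}")
end

# Better: evaluate code into a method once, up front, and leave it.
# The operator is passed in and handled by a branch instead of being
# interpolated into new code. ('RUBY' = no interpolation at load time.)
module Calc
  class_eval <<~'RUBY', __FILE__, __LINE__ + 1
    def self.apply(a, op, b)
      case op
      when :+ then a + b
      when :- then a - b
      when :* then a * b
      when :/ then a / b
      else raise ArgumentError, "unknown operator: #{op}"
      end
    end
  RUBY
end

puts calc_with_eval(6, "*", 7) # prints 42
puts Calc.apply(6, :*, 7)      # prints 42
```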
  44. Exceptions • Act like a special return value • Construct object with information • Capture call stack at raise point • Unroll call stack until rescued • Overhead ranges from big to huge • Especially costly on optimizing VMs
  45. Shallow stack, 100k calls:

        def foo(a); raise; rescue; return a + 1; end

      • JRuby w/ exception: 7.7s • JRuby w/o exception: 0.004s • Ruby 2 w/ exception: 0.25s • Ruby 2 w/o exception: 0.009s • Rubinius w/ exception: 0.1s • Rubinius w/o exception: 0.002s
  46. Deep stack, 100k calls:

        def foo(a); raise; rescue; return a + 1; end

      • JRuby w/ exception: 200s • Ruby 2 w/ exception: 1.25s • Rubinius w/ exception: 7.7s
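
One way to see the shape of these numbers on your own runtime is a small harness around the slides' `foo`. The `foo_no_raise` comparison, the hypothetical `at_depth` helper used to simulate a deep stack, and the call count and depth are assumptions for illustration, not the benchmark behind the slides; no timings are claimed here.

```ruby
require 'benchmark'

# The method from the slides: raise, rescue immediately, return a + 1.
def foo(a); raise; rescue; return a + 1; end

# Same result without using an exception for control flow.
def foo_no_raise(a); a + 1; end

# Hypothetical helper: run the block `depth` extra frames down the stack,
# so every raise inside it has a deeper call stack to capture.
def at_depth(depth, &blk)
  depth.zero? ? blk.call : at_depth(depth - 1, &blk)
end

CALLS = 10_000 # smaller than the slides' 100k to keep the run short
DEPTH = 200    # arbitrary stand-in for a "deep" stack

Benchmark.bm(24) do |x|
  x.report("shallow, w/ exception")  { CALLS.times { |i| foo(i) } }
  x.report("shallow, w/o exception") { CALLS.times { |i| foo_no_raise(i) } }
  x.report("deep, w/ exception") do
    at_depth(DEPTH) { CALLS.times { |i| foo(i) } }
  end
end
```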
  47. Exception Alternatives • Pre-allocated exception object • Empty backtrace passed to raise() • Special return value • Check at each caller • catch/throw • Avoids most overhead
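
The four alternatives can be sketched side by side on a hash lookup. `LookupMiss`, `LOOKUP_MISS` and the `find_*` methods are hypothetical names, and which variant is cheapest depends on the runtime, so measure on yours.

```ruby
# Hypothetical error type for this sketch.
class LookupMiss < StandardError; end

# 1. Pre-allocated exception object: built once and re-raised, so it is
#    not reconstructed on every miss. It is shared state, so do not rely
#    on its message or backtrace, especially across threads.
LOOKUP_MISS = LookupMiss.new("key not found")

def find_preallocated(hash, key)
  hash.fetch(key) { raise LOOKUP_MISS }
rescue LookupMiss
  :missing
end

# 2. Empty backtrace passed to raise: the third argument supplies the
#    backtrace, so the runtime can skip capturing the call stack.
def find_empty_backtrace(hash, key)
  hash.fetch(key) { raise LookupMiss, "key not found", [] }
rescue LookupMiss
  :missing
end

# 3. Special return value: no exception at all; every caller must check it.
def find_sentinel(hash, key)
  hash.fetch(key, :missing)
end

# 4. catch/throw: jumps to the matching catch without raising an exception.
def find_catch(hash, key)
  catch(:missing) do
    hash.fetch(key) { throw :missing, :missing }
  end
end

h = { a: 1 }
p [find_preallocated(h, :a), find_empty_backtrace(h, :b),
   find_sentinel(h, :b), find_catch(h, :b)]
# prints [1, :missing, :missing, :missing]
```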
  48. Allocation • Literals • "foo" creates object every time • String + String, Array + Array • Creates intermediate objects • += is especially wasteful • Slicing and enumerating • ary.map{}.select{}.inject{}.find = 3 arrays
  49. Fixing Literals • Constants are your friends • Optimizes well on most impls • Avoids literal churn • Cache common interpolated values • Study memory profiles
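
A minimal sketch of these fixes, with hypothetical method and constant names. On newer Rubies (2.3+) the `# frozen_string_literal: true` magic comment is another way to avoid literal churn.

```ruby
# In MRI without a frozen_string_literal magic comment, this literal
# allocates a new String every time the method runs.
def content_type_literal
  "text/html"
end

# A constant is created once, at load time, and reused on every call.
CONTENT_TYPE = "text/html".freeze

def content_type_constant
  CONTENT_TYPE
end

# Caching common interpolated values: each distinct string is built once.
# Only suitable for a small, bounded set of keys: this cache never evicts.
GREETINGS = Hash.new { |cache, name| cache[name] = "Hello, #{name}!".freeze }

def greeting(name)
  GREETINGS[name]
end

p content_type_constant.equal?(content_type_constant) # prints true
p greeting("Ruby")                                     # prints "Hello, Ruby!"
```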
  50. Fixing Concat/Copy • Modify in place • Thread-safety trade-offs... • Use persistent structures • "hamster" gem • Google "immutable ruby"
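
The difference between `+=` and modifying in place can be seen directly with object identity. The method names below are hypothetical.

```ruby
# `+=` builds a new String each time and rebinds the variable, so n
# appends create n intermediate strings that immediately become garbage.
def join_with_plus(parts)
  s = +"" # unary plus: a mutable empty String even if literals are frozen
  parts.each { |part| s += part }
  s
end

# `<<` appends into the same String object. If this object were shared
# between threads, the mutation would need synchronization.
def join_in_place(parts)
  s = +""
  parts.each { |part| s << part }
  s
end

a = +"x"
id = a.object_id
a += "y"
p a.object_id == id # prints false: += produced a new String

b = +"x"
id = b.object_id
b << "y"
p b.object_id == id # prints true: << modified b in place
```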
  51. Fixing Enum Chaining • Condense into fewer steps • Lazy Enumerator in 2.0 • Just use a loop :-)
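
The three options can be compared on one small task: double each element, keep multiples of 3, and take the first five. The method names are hypothetical. `lazy` avoids full intermediate arrays but adds its own per-element overhead, so it is not automatically faster on small collections.

```ruby
# Chained: map and select each build a full intermediate Array.
def chained(ary)
  ary.map { |x| x * 2 }.select { |x| (x % 3).zero? }.first(5)
end

# Lazy (Ruby 2.0+): elements flow through one at a time and iteration
# stops after five matches, so no full intermediate arrays are built.
def lazy(ary)
  ary.lazy.map { |x| x * 2 }.select { |x| (x % 3).zero? }.first(5)
end

# Plain loop: one pass, one result array, early exit.
def looped(ary)
  result = []
  ary.each do |x|
    y = x * 2
    next unless (y % 3).zero?
    result << y
    break if result.size == 5
  end
  result
end

data = (1..20).to_a
p chained(data) # prints [6, 12, 18, 24, 30]
```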
  52. Defeating Optimization • Caching and inlining are key to perf • If we can't cache... • Methods won't inline, won't optimize • Constants must be looked up every time • We have less time for real work
  53. Method Cache Busting • VM must ensure cache is correct • Check type • Ensure method table is the same • New type every time? No caching. • Modify method table? No caching.
  54-63. Dynamic Invocation • [Diagram, repeated over slides 54-63: the method-caching picture (call site for obj.foo(), FooClass method table with def foo and def bar; VM operations method lookup, branch, method cache), with the cached def foo at the call site alternately present and absent, illustrating repeated cache busting]
  64. Singletons • Creates new types at runtime • Impossible to cache based on type • Usually defines new methods • Method table is always different

        class << foo
          ...
        end

        def foo.bar
          ...
        end
  65. Object#extend • Includes module into single object • New one-off type every time • Class hierarchy keeps changing

        foo.extend Enumerable
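
The "new type every time" effect is visible from plain Ruby: each object given singleton methods, or extended on its own, ends up with a distinct singleton class. What a given VM does with those types internally varies by implementation; this only shows that the types differ.

```ruby
# Singleton methods defined the two ways shown on slide 64.
a = Object.new
def a.bar; :bar; end

b = Object.new
class << b
  def bar; :bar; end
end

# Objects extended individually with Enumerable, as on slide 65.
c = Object.new.extend(Enumerable)
d = Object.new.extend(Enumerable)

p a.singleton_class == b.singleton_class           # prints false
p c.singleton_class == d.singleton_class           # prints false
p c.singleton_class.ancestors.include?(Enumerable) # prints true
p Object.ancestors.include?(Enumerable)            # prints false
```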
  66.

        static VALUE
        io_getpartial(int argc, VALUE *argv, VALUE io, int nonblock)
        {
            ...
            n = rb_read_internal(fptr->fd, RSTRING_PTR(str), len);
            rb_str_unlocktmp(str);
            if (n < 0) {
                if (!nonblock && rb_io_wait_readable(fptr->fd))
                    goto again;
                if (nonblock && (errno == EWOULDBLOCK || errno == EAGAIN))
                    rb_mod_sys_fail(rb_mWaitReadable, "read would block");
                rb_sys_fail_path(fptr->pathv);
            }
            ...
        }
  68.

        void
        rb_mod_sys_fail(VALUE mod, const char *mesg)
        {
            VALUE exc = make_errno_exc(mesg);
            rb_extend_object(exc, mod);
            rb_exc_raise(exc);
        }
  70. Fixing Singletons/#extend • Functional patterns • FooLibrary.process(obj) rather than obj.extend FooLibrary; obj.process • Create types up front (programmatically?) • 1000 predefined types beats infinite types
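
A sketch of the functional pattern. `FooLibrary` is the slide's own placeholder name; the `process` bodies and the two wrapper methods are hypothetical.

```ruby
# FooLibrary is the slide's placeholder; its contents here are hypothetical.
module FooLibrary
  # Mixin style: runs with self set to an object extended with FooLibrary.
  def process
    "processed #{self}"
  end

  # Functional style: the same logic, with the object passed as an argument.
  def self.process(obj)
    "processed #{obj}"
  end
end

# Cache-hostile: gives every object passed in its own singleton class.
def process_with_extend(obj)
  obj.extend(FooLibrary)
  obj.process
end

# Cache-friendly: no new types; FooLibrary.process is one stable target.
def process_functionally(obj)
  FooLibrary.process(obj)
end

p process_functionally("input") # prints "processed input"
```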
  71. Constant Lookup • Constants in tables on classes/modules • Usually assigned only once, at load time • Lookup is expensive, like methods • Values can be cached
  72. Constant Cache • Constant search proceeds two ways • First, lexical scoping • Second, class hierarchy • Invalidation happens globally
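
To show what global invalidation means for a cached constant, here is a minimal sketch under stated assumptions: `ConstantSerial`, `define_constant` and `ConstantSite` are hypothetical, the single serial models a VM-wide counter, and only the class-hierarchy part of the search is modeled (via `const_get`); lexical scope is ignored.

```ruby
# Hypothetical VM-wide constant serial: ANY constant definition bumps it,
# so every cached constant site becomes stale at once.
module ConstantSerial
  @value = 0
  class << self
    attr_reader :value

    def bump!
      @value += 1
    end
  end
end

# Hypothetical stand-in for "some code (re)defines a constant".
def define_constant(mod, name, value)
  mod.send(:remove_const, name) if mod.const_defined?(name, false)
  mod.const_set(name, value)
  ConstantSerial.bump!
end

# One cache per access site; only the class-hierarchy search is modeled.
class ConstantSite
  attr_reader :lookups

  def initialize(mod, name)
    @mod = mod
    @name = name
    @cached_value = nil
    @cached_serial = nil
    @lookups = 0
  end

  def value
    unless @cached_serial == ConstantSerial.value
      @lookups += 1 # slow path: full search
      @cached_value = @mod.const_get(@name)
      @cached_serial = ConstantSerial.value
    end
    @cached_value # fast path: cached value
  end
end

module Config; end
define_constant(Config, :LIMIT, 10)
site = ConstantSite.new(Config, :LIMIT)
site.value
site.value
p site.lookups                     # prints 1
define_constant(Config, :OTHER, 1) # unrelated constant, still invalidates
p site.value                       # prints 10
p site.lookups                     # prints 2
```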
  73. Constant Cache Busting • Redefining constants • Introducing new lexical scopes • Classes created at runtime • Evaluated code • Altering class hierarchies • Lookup results may change...no caching
  74. Fixing Constants • Don't modify them • i.e. CONSTANT • Avoid runtime class hierarchy changes
  75. Performance Issues • Assume nothing...most can be fixed • Isolate bad code into as small a case as possible • Use VM tools to monitor caches • Fix it if it's your bug, send a PR if it's a library's • Come to us for help, or if it's a VM bug • Repeat...
  76. Concurrency Issues • Avoid mutable state • Synchronize mutations • Start coarse-grained, get finer over time • VM tooling to monitor locks, contention • Contact VM authors for help
  77. Thank You! • Charles Oliver Nutter • @headius • http://jruby.org • http://blog.headius.com • Book: "Using JRuby" • Book: "Deploying JRuby"