Running Rack and Rails Faster with TruffleRuby

Benoit Daloze
September 04, 2020

Video at https://youtu.be/0ykPiPAKZL8?t=16104

Code at https://github.com/eregon/rsb/tree/bench
Presentation at RubyKaigi 2020
https://rubykaigi.org/2020-takeout/speakers#eregontp

Optimizing Rack and Rails applications with a just-in-time (JIT) compiler is a challenge. For example, MJIT currently does not speed up Rails. TruffleRuby tackles this challenge: we have been running the Rails Simpler Benchmarks with TruffleRuby and now achieve higher performance on them than any other Ruby implementation.

In this talk, we’ll show how we got there and which TruffleRuby optimizations are useful for Rack and Rails applications. TruffleRuby is getting ready to speed up your applications; will you try it?

Transcript

  1. Benoit Daloze
    TruffleRuby Project Lead
    Running Rack and Rails Faster with TruffleRuby

  2. Safe harbor statement
    The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
    GraalVM Native Image Early Adopter Status
    GraalVM Native Image technology (including SubstrateVM) is Early Adopter technology. It is available only under an early adopter license and remains subject to potentially significant further changes, compatibility testing and certification.
    Copyright © 2020, Oracle and/or its affiliates

  3. TruffleRuby
    • A high-performance Ruby implementation by Oracle Labs
    • Uses the GraalVM JIT Compiler
    • Targets full compatibility with CRuby 2.6, including C extensions
    • Open source on GitHub: https://github.com/oracle/truffleruby

  4. Two Modes to Run TruffleRuby
    • JVM Mode
      • Can interoperate with Java conveniently
    • Native Mode (default): TruffleRuby & Graal are compiled ahead-of-time (AOT) to a native executable
      • Fast startup, even faster than CRuby 2.6! (25 ms vs 48 ms)
      • Fast warmup (Graal & the TruffleRuby interpreter are precompiled)
      • Lower footprint (≈ 60 MB max RSS for Hello World)
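
To check which implementation, and for TruffleRuby which mode, a script is running under, you can inspect the `RUBY_ENGINE` and `RUBY_DESCRIPTION` constants. The sketch below assumes that TruffleRuby's version string (the one printed by `truffleruby --version`) contains "Native" in native mode and "JVM" in JVM mode; that string match is a heuristic, not an official API, and the helper name is hypothetical.

```ruby
# Summarize which Ruby implementation runs this script and, for TruffleRuby,
# which mode it runs in. ASSUMPTION: the mode is guessed from RUBY_DESCRIPTION,
# expected to contain "Native" or "JVM"; this is a heuristic, not an API.
def ruby_runtime_summary(engine = RUBY_ENGINE, description = RUBY_DESCRIPTION)
  return "#{engine} (not TruffleRuby)" unless engine == "truffleruby"

  mode =
    if description.include?("Native")
      "native mode"
    elsif description.include?("JVM")
      "JVM mode"
    else
      "unknown mode"
    end
  "truffleruby (#{mode})"
end

puts ruby_runtime_summary
puts RUBY_DESCRIPTION
```

Running the same file with `ruby` and with `truffleruby` prints a one-line summary followed by the full version string, which makes it easy to confirm which runtime a benchmark actually used.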

  5. Compatibility
    • Targets full compatibility with CRuby 2.6
    • Many C extensions work out of the box: openssl, zlib, ripper, nokogiri, database drivers and many more
    • Can reuse many existing Gemfiles with no changes
    • Completeness according to ruby/spec:

  6. Goals and Status
    • Run idiomatic Ruby code faster
      • TruffleRuby performs very well on many CPU-intensive benchmarks
    • Run Ruby code in parallel
      • No Global Interpreter Lock for Ruby code
      • A global lock is used for C extensions, for maximum compatibility
    • Provide new tooling that works across languages
      • GraalVM provides cross-language debuggers, profilers, etc.
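
The "no Global Interpreter Lock for Ruby code" point can be observed with a small CPU-bound experiment. The sketch below is not from the talk: it runs the same total amount of plain-Ruby work sequentially and then split across threads, and compares wall-clock times. The workload, sizes, and helper names are arbitrary assumptions; on TruffleRuby, run it a few times so the JIT has warmed up before comparing.

```ruby
# Time a block using a monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# CPU-bound work in plain Ruby (no C extension involved).
def busy_sum(n)
  (1..n).reduce(0) { |acc, i| acc + (i * i) % 7 }
end

WORK = 1_000_000
THREADS = 4

# The same total amount of work, run sequentially and then split across threads.
sequential = elapsed { THREADS.times { busy_sum(WORK) } }
threaded = elapsed do
  THREADS.times.map { Thread.new { busy_sum(WORK) } }.each(&:join)
end

# Without a global lock for Ruby code, `threaded` can approach sequential / THREADS
# on a machine with enough cores; with CRuby's GVL the two times stay similar.
printf("sequential: %.2fs, threaded: %.2fs\n", sequential, threaded)
```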

  7. Built-in Profiler: CPUSampler
    $ truffleruby --cpusampler optcarrot.rb
    ---------------------------------------------------------------------------------------------------------------
    Name                         |    Total Time | Opt % ||     Self Time | Opt % | Location
    ---------------------------------------------------------------------------------------------------------------
    Optcarrot::PPU#vsync         | 32559ms 63.6% | 59.0% || 19311ms 37.7% | 99.5% | optcarrot/ppu.rb~258-264
    Optcarrot::PPU#run           | 13728ms 26.8% |  0.0% || 13715ms 26.8% |  0.0% | optcarrot/ppu.rb~875-883
    Optcarrot::CPU#run           | 14042ms 27.4% | 59.6% ||  9589ms 18.7% | 67.9% | optcarrot/cpu.rb~925-945
    Optcarrot::CPU#r_op          |  2312ms  4.5% | 55.4% ||  1346ms  2.6% | 78.3% | optcarrot/cpu.rb~876-879
    Optcarrot::APU#proceed       |  2933ms  5.7% | 18.4% ||   894ms  1.7% | 57.4% | optcarrot/apu.rb~225-231
    Optcarrot::APU::Mixer#sample |  1999ms  3.9% |  0.4% ||   721ms  1.4% |  1.0% | optcarrot/apu.rb~377-390
    Optcarrot::APU::Noise#sample |   578ms  1.1% |  7.8% ||   576ms  1.1% |  7.8% | optcarrot/apu.rb~702-725
    Optcarrot::APU::Pulse#sample |   469ms  0.9% |  4.9% ||   466ms  0.9% |  4.9% | optcarrot/apu.rb~559-585

  8. Rails Simpler Benchmarks (RSB)
    • https://github.com/noahgibbs/rsb
    • By Noah Gibbs, who did a lot of benchmarking related to Ruby 3x3
    • A simple Rack app and a simple Rails 4 app with various routes
    • Benchmarked with `wrk`
    • Harness to run with various configurations (Ruby implementation, concurrency, etc.)

  9. What We Benchmark in this Talk
    • The Rack app with a simple ERB template (from CRuby benchmarks)
    • The Rails 4 app, serving a plain response
    • Both apps served by Puma, a widely used web server (all of Puma's tests pass on TruffleRuby)
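
For readers unfamiliar with Rack, the shape of such an app is small: any object whose `call(env)` returns `[status, headers, body]`. The sketch below is a hypothetical app in the spirit of the RSB Rack/ERB benchmark, not its actual source; the template, class name, and values are assumptions. It uses only Ruby's standard `erb` library, so it can be called directly without the rack gem.

```ruby
require "erb"

# Hypothetical template, in the spirit of the RSB Rack/ERB app (not its actual source).
TEMPLATE = ERB.new(<<~HTML)
  <html>
    <body>
      <h1>Hello, <%= name %>!</h1>
      <ul>
      <% items.each do |item| %>
        <li><%= item %></li>
      <% end %>
      </ul>
    </body>
  </html>
HTML

# A Rack application is any object whose #call(env) returns [status, headers, body],
# where body responds to #each and yields strings.
class ErbApp
  def call(_env)
    body = TEMPLATE.result_with_hash(name: "Rack", items: %w[ERB Puma TruffleRuby])
    [200, { "content-type" => "text/html" }, [body]]
  end
end

# Call the app directly (with an empty env), as a Rack server would per request.
status, headers, body = ErbApp.new.call({})
puts status, headers["content-type"]
puts body.join
```

To serve it with Puma instead, a `config.ru` that loads this file and contains `run ErbApp.new` should work, assuming the rack and puma gems are installed.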

  10. Which Rubies?
    • Ruby implementations:
      • CRuby 2.6.6
      • TruffleRuby master, in JVM mode
    • I wanted to try JRuby 9.2.13.0, but:
      • There is a bug with JRuby+Puma and keep-alive connections that is still not solved with the latest JRuby+Puma (it has been known since RubyKaigi'19).
        As a result, JRuby seems idle, uses little CPU and is very slow.
      • Benchmarking RSB without keep-alive connections is not very interesting: creating a new connection for each request adds a lot of overhead.
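
The difference between keep-alive and per-request connections is easy to see at the TCP level. This self-contained sketch (not part of RSB, and unrelated to the JRuby bug itself) starts a toy HTTP/1.1 server that counts accepted connections, then sends five requests with Ruby's standard `Net::HTTP`: once inside a single `Net::HTTP.start` session, which reuses one persistent connection, and once with `Net::HTTP.get`, which opens and closes a connection per request. The toy server is an illustrative assumption and ignores most of HTTP.

```ruby
require "socket"
require "net/http"

# A tiny HTTP/1.1 server that counts the TCP connections it accepts.
# Illustrative only: it ignores request bodies and always answers "ok".
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
accepted = 0

Thread.new do
  loop do
    conn = server.accept
    accepted += 1 # only this thread writes the counter
    Thread.new(conn) do |c|
      # Serve requests on this connection until the client closes it.
      while c.gets # request line, e.g. "GET / HTTP/1.1"
        # Skip the headers, which end with an empty line.
        loop do
          header = c.gets
          break if header.nil? || header == "\r\n"
        end
        c.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
      end
      c.close
    end
  end
end

# Five requests over one persistent (keep-alive) connection.
before = accepted
Net::HTTP.start("127.0.0.1", port) { |http| 5.times { http.get("/") } }
keep_alive_conns = accepted - before

# Five requests, each opening and closing its own connection.
before = accepted
5.times { Net::HTTP.get(URI("http://127.0.0.1:#{port}/")) }
fresh_conns = accepted - before

puts "with keep-alive: #{keep_alive_conns} connection(s)" # => with keep-alive: 1 connection(s)
puts "without keep-alive: #{fresh_conns} connection(s)" # => without keep-alive: 5 connection(s)
```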

  11. Concurrency Settings
    • 3 concurrency settings:
      • Number of server threads / processes
      • Number of wrk request threads
      • Total number of connections inside wrk (≥ number of request threads)
    • Concretely, we use the same number for all three (e.g., 1/1/1, 2/2/2), so:
      • There are as many request threads as server threads
      • Each request thread maintains 1 connection
      • This avoids extra processing on both wrk's side and the server side
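
As an illustration of the n/n/n rule, the helper below prints matching server and client command lines for a given concurrency level. It relies only on standard Puma options (`-t min:max` threads, `-w` worker processes, `-p` port) and wrk options (`-t` threads, `-c` connections, `-d` duration). The helper itself, the port, the `config.ru` file name, and the URL are assumptions, not the actual RSB harness.

```ruby
# Hypothetical helper: build matching Puma and wrk command lines for one
# concurrency level n, using n for all three settings (n/n/n).
# The port, URL, rackup file name and duration are assumptions.
def benchmark_commands(n, mode: :threads, port: 9292, duration: "10s")
  raise ArgumentError, "n must be positive" unless n.positive?

  # Either n threads in a single Puma process, or n single-threaded worker processes.
  puma_opts =
    case mode
    when :threads then "-t #{n}:#{n}"
    when :processes then "-t 1:1 -w #{n}"
    else raise ArgumentError, "unknown mode: #{mode.inspect}"
    end

  {
    server: "puma #{puma_opts} -p #{port} config.ru",
    # n wrk threads and n connections in total, i.e. one connection per thread.
    client: "wrk -t #{n} -c #{n} -d #{duration} http://127.0.0.1:#{port}/"
  }
end

[1, 2, 4, 8].each do |n|
  cmds = benchmark_commands(n)
  puts cmds[:server], cmds[:client]
end
```

Keeping `-c` equal to `-t` satisfies wrk's requirement that there be at least as many connections as threads, while avoiding threads that multiplex several connections.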

  12. Other Benchmark Considerations
    • Processor: AMD Ryzen 7 3700X 8-Core Processor
    • Frequency scaling & Turbo Core/Boost enabled, like on real servers
    • All results show peak performance after enough warmup
    • Reported numbers are the Requests/sec reported by `wrk` for a 10-second run
    • These are very small Rack/Rails applications; they do not represent real applications well
    • Measurement conditions are ideal, not real-world
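
Collecting these numbers can be scripted by reading the `Requests/sec:` line that wrk prints at the end of each run. The sketch below is an assumption about how one might do it, not the talk's harness; in particular, modeling "peak performance after enough warmup" as "discard the first runs, keep the best of the rest" is a labeled guess.

```ruby
# Extract the "Requests/sec" value from wrk's text output.
def requests_per_sec(wrk_output)
  match = wrk_output.match(%r{^Requests/sec:\s+([\d.]+)})
  raise ArgumentError, "no Requests/sec line in wrk output" unless match

  Float(match[1])
end

# ASSUMPTION: "peak performance after warmup" is modeled as discarding the first
# warmup_runs runs and keeping the best of the remaining runs.
def peak_after_warmup(rps_per_run, warmup_runs:)
  measured = rps_per_run.drop(warmup_runs)
  raise ArgumentError, "no runs left after warmup" if measured.empty?

  measured.max
end
```

For example, given the text of several consecutive 10-second wrk runs in a (hypothetical) array `outputs`, `peak_after_warmup(outputs.map { |o| requests_per_sec(o) }, warmup_runs: 3)` would give the number to report under that assumption.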

  13. Rack ERB Results with Threads

  14. Rack ERB Results with Threads & Processes

  15. Advantages of Threads over Processes
    • The program needs to be JIT-compiled once, not once per process
    • Common data structures are guaranteed to be shared, vs. hoping to share some of them via copy-on-write memory with fork()
    • No fork() pitfalls, such as file descriptors being implicitly shared between parent and child processes
    • Only one process to manage instead of several
    • Much faster communication between threads than between processes
    • Easy to synchronize if needed
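
The "guaranteed shared" and "easy to synchronize" points can be illustrated in a few lines. In the sketch below (hypothetical names, not from RSB or Puma), eight threads update one shared statistics object guarded by a `Mutex`. With forked worker processes, each process would instead hold its own copy of the counters after `fork`, and combining them would require some form of inter-process communication.

```ruby
# Per-path request counters shared by all threads of one process.
# Hypothetical class, for illustration only.
class RequestStats
  def initialize
    @lock = Mutex.new
    @counts = Hash.new(0)
  end

  # Record one request for `path`; the Mutex makes the read-modify-write atomic.
  def record(path)
    @lock.synchronize { @counts[path] += 1 }
  end

  # Return a consistent copy of the counters.
  def snapshot
    @lock.synchronize { @counts.dup }
  end
end

stats = RequestStats.new
paths = ["/", "/erb", "/rails"]

# 8 threads, 1,000 requests each, all updating the same object.
workers = Array.new(8) do |i|
  Thread.new do
    1_000.times { |j| stats.record(paths[(i + j) % paths.size]) }
  end
end
workers.each(&:join)

totals = stats.snapshot
puts totals.values.sum # => 8000
```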

  16. Rails Results with Threads & Processes

  17. Future Work
    • The benchmarks ran on TruffleRuby with the C extension lock disabled, since it is safe to run Puma's HTTP parser C extension in parallel
    • Ruby 3's Ractor has a very similar issue: C extensions need to be marked as thread-safe, i.e., safe to execute in parallel
    • For TruffleRuby, only the state kept in C needs to be synchronized; the state of the Ruby interpreter is already thread-safe
    • The Rails benchmark ran on TruffleRuby with a higher splitting limit, since the default is too low for Rails. Splitting avoids megamorphic calls.
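
Why splitting avoids megamorphic calls can be pictured with a toy model. An inline cache at a call site remembers the receiver classes it has seen; once it has seen more than some limit, the call site becomes megamorphic and falls back to slower generic dispatch. If one helper method is shared by many callers, its internal call site sees all of their receiver classes; if the helper is split (copied) per caller, each copy's call site only sees its own caller's classes. The plain-Ruby simulation below models only that bookkeeping, with a hypothetical limit of 3; it is not TruffleRuby's implementation.

```ruby
# Toy model of one call site's inline cache (hypothetical; not TruffleRuby's code).
class CallSite
  def initialize(limit)
    @limit = limit
    @seen = []
  end

  # Performs the call `receiver.to_s`, recording the receiver's class like a cache would.
  def call(receiver)
    @seen << receiver.class unless @seen.include?(receiver.class)
    receiver.to_s
  end

  def state
    case @seen.size
    when 0 then :uninitialized
    when 1 then :monomorphic
    when 2..@limit then :polymorphic
    else :megamorphic
    end
  end
end

LIMIT = 3 # hypothetical cache limit, kept small for the example

# Each caller of a shared helper passes values of a single class.
callers = {
  ints: [1, 2, 3],
  strings: %w[a b],
  floats: [1.5],
  symbols: [:x],
  arrays: [[1]]
}

# Without splitting: all callers share the helper, hence its single call site.
shared = CallSite.new(LIMIT)
callers.each_value { |args| args.each { |arg| shared.call(arg) } }

# With splitting: each caller gets its own copy of the helper and of its call site.
split = callers.transform_values do |args|
  site = CallSite.new(LIMIT)
  args.each { |arg| site.call(arg) }
  site.state
end

puts "shared: #{shared.state}" # => shared: megamorphic
puts "split:  #{split.values.uniq.inspect}" # => split:  [:monomorphic]
```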

  18. Reminder about the Results
    • These are very small Rack/Rails applications; they do not represent real applications well
    • Measurement conditions are ideal, not real-world
    • Benchmark your own applications and, if possible, share the results

  19. Trying TruffleRuby
    Via your favorite Ruby manager (Native mode):
    • ruby-install truffleruby
    • rbenv install truffleruby-20.2.0
    • rvm install truffleruby
    Via GraalVM (downloads at https://www.graalvm.org/):
    • `gu install ruby` to install the TruffleRuby component
    • Other languages in GraalVM: Java, JavaScript + Node.js, Python, R, LLVM bitcode (C, C++, Rust, etc.), WebAssembly, ... all in a single VM!