
Running Rack and Rails Faster with TruffleRuby

Video at https://youtu.be/0ykPiPAKZL8?t=16104

Code at https://github.com/eregon/rsb/tree/bench
Presentation at RubyKaigi 2020

Optimizing Rack and Rails applications with a just-in-time (JIT) compiler is a challenge. For example, MJIT does not speed up Rails currently. TruffleRuby tackles this challenge. We have been running the Rails Simpler Benchmarks with TruffleRuby and now achieve higher performance than any other Ruby implementation.

In this talk we’ll show how we got there and which TruffleRuby optimizations are useful for Rack and Rails applications. TruffleRuby is getting ready to speed up your applications. Will you try it?

Benoit Daloze

September 04, 2020


  1. Safe harbor statement

    The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.

    GraalVM Native Image Early Adopter Status: GraalVM Native Image technology (including SubstrateVM) is Early Adopter technology. It is available only under an early adopter license and remains subject to potentially significant further changes, compatibility testing and certification.

    Copyright © 2020, Oracle and/or its affiliates
  2. TruffleRuby

    • A high-performance Ruby implementation by Oracle Labs
    • Uses the GraalVM JIT Compiler
    • Targets full compatibility with CRuby 2.6, including C extensions
    • Open-Source on GitHub: https://github.com/oracle/truffleruby
  3. Two Modes to Run TruffleRuby

    • JVM Mode
      • Can interoperate with Java conveniently
    • Native Mode (default), TruffleRuby & Graal are compiled AOT to a native executable
      • Fast startup, even faster than CRuby 2.6! (25ms vs 48ms)
      • Fast warmup (Graal & TruffleRuby interpreter precompiled)
      • Lower footprint (≈ 60MB max RSS for Hello World)
  4. Compatibility

    • Targets full compatibility with CRuby 2.6
    • Many C extensions work out of the box: openssl, zlib, ripper, nokogiri, database drivers and many more
    • Can reuse many existing Gemfiles with no changes
    • Completeness according to ruby/spec:
  5. Goals and Status

    • Run idiomatic Ruby code faster
      • TruffleRuby is doing great on many CPU-intensive benchmarks
    • Run Ruby code in parallel
      • No Global Interpreter Lock for Ruby code
      • A Global Lock is used for C extensions for maximum compatibility
    • Provide new tooling that works across languages
      • GraalVM provides cross-language debuggers, profilers, etc.
  6. Built-in Profiler: CPUSampler

    $ truffleruby --cpusampler optcarrot.rb

    --------------------------------------------------------------------------------------------------------
    Name                         | Total Time    | Opt % || Self Time     | Opt % | Location
    --------------------------------------------------------------------------------------------------------
    Optcarrot::PPU#vsync         | 32559ms 63.6% | 59.0% || 19311ms 37.7% | 99.5% | optcarrot/ppu.rb~258-264
    Optcarrot::PPU#run           | 13728ms 26.8% |  0.0% || 13715ms 26.8% |  0.0% | optcarrot/ppu.rb~875-883
    Optcarrot::CPU#run           | 14042ms 27.4% | 59.6% ||  9589ms 18.7% | 67.9% | optcarrot/cpu.rb~925-945
    Optcarrot::CPU#r_op          |  2312ms  4.5% | 55.4% ||  1346ms  2.6% | 78.3% | optcarrot/cpu.rb~876-879
    Optcarrot::APU#proceed       |  2933ms  5.7% | 18.4% ||   894ms  1.7% | 57.4% | optcarrot/apu.rb~225-231
    Optcarrot::APU::Mixer#sample |  1999ms  3.9% |  0.4% ||   721ms  1.4% |  1.0% | optcarrot/apu.rb~377-390
    Optcarrot::APU::Noise#sample |   578ms  1.1% |  7.8% ||   576ms  1.1% |  7.8% | optcarrot/apu.rb~702-725
    Optcarrot::APU::Pulse#sample |   469ms  0.9% |  4.9% ||   466ms  0.9% |  4.9% | optcarrot/apu.rb~559-585
  7. Rails Simpler Benchmarks (RSB)

    • https://github.com/noahgibbs/rsb
    • By Noah Gibbs, who did a lot of benchmarking related to Ruby 3x3
    • A simple Rack app and a simple Rails 4 app with various routes
    • Benchmarked with `wrk`
    • Harness to run with various configurations (Ruby, concurrency, etc)
  8. What We Benchmark in this Talk

    • The Rack app with a simple ERB template (from CRuby benchmarks)
    • The Rails 4 app, serving a plain response
    • On Puma, a widely used web server (all tests pass on TruffleRuby)
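To make "a Rack app with a simple ERB template" concrete, here is a minimal sketch of that shape. It is not the RSB application: the `ErbApp` class, the template contents and the item list are all hypothetical, and only the Ruby standard library `erb` is used so the app can be exercised without a server.

```ruby
require "erb"

# Hypothetical template; the real RSB template comes from the CRuby benchmarks.
TEMPLATE = ERB.new(<<~HTML)
  <html>
    <body>
      <h1><%= ERB::Util.html_escape(title) %></h1>
      <ul>
      <% items.each do |item| %>
        <li><%= ERB::Util.html_escape(item) %></li>
      <% end %>
      </ul>
    </body>
  </html>
HTML

# A Rack app is any object that responds to #call(env) and returns
# [status, headers, body], where body responds to #each.
class ErbApp
  def call(env)
    title = "Hello from #{env.fetch("PATH_INFO", "/")}"
    items = %w[rack erb puma]
    # The template reads the local variables above through the binding.
    html = TEMPLATE.result(binding)
    [200, { "content-type" => "text/html" }, [html]]
  end
end

# Call the app directly; this env hash is a minimal stand-in for a real request.
status, _headers, body = ErbApp.new.call("PATH_INFO" => "/erb")
puts status
puts body.join
```

To serve such an app with Puma, one would typically put `run ErbApp.new` in a `config.ru` file and start `puma config.ru`; the file layout here is an assumption, not how RSB organizes its code.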
  9. Which Rubies?

    • Ruby interpreters:
      • CRuby 2.6.6
      • TruffleRuby master, in JVM mode
    • I wanted to try JRuby, but:
      • There is a bug with JRuby+Puma with keep-alive connections that is still not solved with the latest JRuby+Puma (since RubyKaigi'19). As a result, JRuby seems idle, uses little CPU and is very slow.
      • Benchmarking RSB without keep-alive connections is not really interesting: there is a lot of overhead to create a new connection for each request.
  10. Concurrency Settings

    • 3 concurrency settings
      • Number of server threads / processes
      • Number of wrk request threads
      • Total number of connections inside wrk (>= request threads)
    • Concretely we use the same number for all (e.g., 1/1/1, 2/2/2) so:
      • Same number of request and server threads
      • Each request thread maintains 1 connection
      • Avoids extra processing on both wrk's side and the server side
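The "same number for all" rule can be sketched as a small helper that builds the `wrk` command line for a given concurrency level. This is an illustration, not the RSB harness: `wrk_command` and the URL are hypothetical, while `-t` (threads), `-c` (connections) and `-d` (duration) are standard `wrk` options.

```ruby
require "shellwords"

# Build a wrk invocation where the number of request threads and the
# number of connections are both n, so each thread keeps 1 connection.
# The URL and the 10 second default are assumptions for this sketch.
def wrk_command(n, url:, duration: 10)
  ["wrk", "-t", n.to_s, "-c", n.to_s, "-d", "#{duration}s", url]
end

[1, 2, 4, 8].each do |n|
  cmd = wrk_command(n, url: "http://localhost:9292/")
  puts "server threads=#{n}: #{Shellwords.join(cmd)}"
end
```

The server side would need the matching thread count as well; with Puma that is usually set through its `-t min:max` option (for example `-t 4:4`), though the harness used in the talk may configure it differently.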
  11. Other Benchmark Considerations

    • Processor: AMD Ryzen 7 3700X 8-Core Processor
    • Frequency scaling & Turbo Core/Boost enabled, like on real servers
    • All results show peak performance after enough warmup
    • Reported numbers are the Requests/sec from `wrk` for 10 seconds
    • Very small Rack/Rails applications, they do not represent real applications well
    • Conditions for measurement are ideal, not real world
  12. Advantages of Threads over Processes

    • Need to JIT the program once, not once per process
    • Common data structures are guaranteed to be shared, versus hopefully sharing some of them via copy-on-write memory with fork
    • No fork() pitfalls, such as magically sharing the same file descriptor
    • Not having to manage multiple processes, just one
    • Much faster communication between threads than between processes
    • Can easily synchronize if needed
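As a small illustration of the "shared data structures" and "can easily synchronize" points, the sketch below has several threads update one Hash that lives in a single process, guarded by a `Mutex`. The `count_in_threads` helper and the `"requests"` key are hypothetical names chosen for this example.

```ruby
# Count "requests" from several threads into one shared Hash.
# With fork, each child process would instead update its own copy.
def count_in_threads(thread_count, per_thread)
  counts = Hash.new(0)
  lock = Mutex.new

  threads = thread_count.times.map do
    Thread.new do
      per_thread.times do
        # Synchronize only around the mutation of the shared Hash.
        lock.synchronize { counts["requests"] += 1 }
      end
    end
  end
  threads.each(&:join)

  counts["requests"]
end

puts count_in_threads(4, 1_000) # prints 4000
```

The lock matters most on implementations without a Global Interpreter Lock, where the threads can really run the increment in parallel.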
  13. Future Work

    • Benchmarks ran on TruffleRuby with the C extension lock disabled, since it is OK to run the Puma parser C extension in parallel
    • A very similar issue exists for Ruby 3's Ractor: C extensions need to be marked as thread-safe, i.e. as safe to execute in parallel
    • For TruffleRuby, it only matters that the state in C is synchronized; the state of the Ruby interpreter is already thread-safe
    • The Rails benchmark ran on TruffleRuby with a higher splitting limit, because the default is too low for Rails. Splitting avoids megamorphic calls.
  14. Reminder about the Results

    • Very small Rack/Rails applications, they do not represent real applications well
    • Conditions for measurement are ideal, not real world
    • Benchmark your own applications, and if possible share the results
  15. Trying TruffleRuby

    Via your favorite Ruby manager (Native mode):
    • ruby-install truffleruby
    • rbenv install truffleruby-20.2.0
    • rvm install truffleruby

    Via GraalVM, downloads at https://www.graalvm.org/
    • 'gu install ruby' to install the TruffleRuby component
    • Other languages in GraalVM: Java, JavaScript + Node.js, Python, R, LLVM bitcode (C, C++, Rust, etc), WebAssembly, ... in a single VM!