Slide 1

Slide 1 text

Benoit Daloze
TruffleRuby Project Lead
Running Rack and Rails Faster with TruffleRuby

Slide 2

Slide 2 text

Safe harbor statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.

GraalVM Native Image Early Adopter Status

GraalVM Native Image technology (including SubstrateVM) is Early Adopter technology. It is available only under an early adopter license and remains subject to potentially significant further changes, compatibility testing and certification.

Copyright © 2020, Oracle and/or its affiliates

Slide 3

Slide 3 text

TruffleRuby
• A high-performance Ruby implementation by Oracle Labs
• Uses the GraalVM JIT Compiler
• Targets full compatibility with CRuby 2.6, including C extensions
• Open-Source on GitHub: https://github.com/oracle/truffleruby

© 2020 Oracle

Slide 4

Slide 4 text

Two Modes to Run TruffleRuby
• JVM Mode
  • Can interoperate with Java conveniently
• Native Mode (default): TruffleRuby & Graal are compiled AOT to a native executable
  • Fast startup, even faster than CRuby 2.6! (25ms vs 48ms)
  • Fast warmup (Graal & TruffleRuby interpreter precompiled)
  • Lower footprint (≈ 60MB max RSS for Hello World)

Slide 5

Slide 5 text

Compatibility
• Targets full compatibility with CRuby 2.6
• Many C extensions work out of the box: openssl, zlib, ripper, nokogiri, database drivers and many more
• Can reuse many existing Gemfiles with no changes
• Completeness according to ruby/spec:

Slide 6

Slide 6 text

Goals and Status
• Run idiomatic Ruby code faster
  • TruffleRuby is doing great on many CPU-intensive benchmarks
• Run Ruby code in parallel
  • No Global Interpreter Lock for Ruby code
  • A Global Lock is used for C extensions for maximum compatibility
• Provide new tooling that works across languages
  • GraalVM provides cross-language debuggers, profilers, etc.
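As a minimal sketch of what "Run Ruby code in parallel" means in practice, the following splits a CPU-bound sum across plain Ruby threads. The method name `parallel_sum_of_squares` and the workload are illustrative assumptions, not taken from the talk. The code gives the same result on any Ruby; only on an implementation without a Global Interpreter Lock for Ruby code can the threads actually execute at the same time.

```ruby
# Sum the squares of 1..n, with the range split across several threads.
# Hypothetical helper for illustration only.
def parallel_sum_of_squares(n, thread_count)
  slice_size = (n.to_f / thread_count).ceil
  threads = (1..n).each_slice(slice_size).map do |slice|
    # Each thread works on its own slice, so no locking is needed here.
    Thread.new { slice.sum { |i| i * i } }
  end
  # Thread#value joins the thread and returns the block's result.
  threads.sum(&:value)
end

puts parallel_sum_of_squares(1_000, 4)
```

Each thread only reads its own slice and returns a number, so the threads share no mutable state.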

Slide 7

Slide 7 text

Built-in Profiler: CPUSampler

$ truffleruby --cpusampler optcarrot.rb
--------------------------------------------------------------------------------------------------------
Name                         | Total Time    | Opt % || Self Time     | Opt % | Location
--------------------------------------------------------------------------------------------------------
Optcarrot::PPU#vsync         | 32559ms 63.6% | 59.0% || 19311ms 37.7% | 99.5% | optcarrot/ppu.rb~258-264
Optcarrot::PPU#run           | 13728ms 26.8% |  0.0% || 13715ms 26.8% |  0.0% | optcarrot/ppu.rb~875-883
Optcarrot::CPU#run           | 14042ms 27.4% | 59.6% ||  9589ms 18.7% | 67.9% | optcarrot/cpu.rb~925-945
Optcarrot::CPU#r_op          |  2312ms  4.5% | 55.4% ||  1346ms  2.6% | 78.3% | optcarrot/cpu.rb~876-879
Optcarrot::APU#proceed       |  2933ms  5.7% | 18.4% ||   894ms  1.7% | 57.4% | optcarrot/apu.rb~225-231
Optcarrot::APU::Mixer#sample |  1999ms  3.9% |  0.4% ||   721ms  1.4% |  1.0% | optcarrot/apu.rb~377-390
Optcarrot::APU::Noise#sample |   578ms  1.1% |  7.8% ||   576ms  1.1% |  7.8% | optcarrot/apu.rb~702-725
Optcarrot::APU::Pulse#sample |   469ms  0.9% |  4.9% ||   466ms  0.9% |  4.9% | optcarrot/apu.rb~559-585

Slide 8

Slide 8 text

Rails Simpler Benchmarks (RSB)
• https://github.com/noahgibbs/rsb
• By Noah Gibbs, who did a lot of benchmarking related to Ruby 3x3
• A simple Rack app and a simple Rails 4 app with various routes
• Benchmarked with `wrk`
• Harness to run with various configurations (Ruby, concurrency, etc.)

Slide 9

Slide 9 text

What We Benchmark in this Talk
• The Rack app with a simple ERB template (from CRuby benchmarks)
• The Rails 4 app, serving a plain response
• On Puma, a widely used web server (all tests pass on TruffleRuby)
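For readers unfamiliar with the shape of such a benchmark target, here is a minimal sketch of a Rack-style app that renders an ERB template. It is not the RSB app or the CRuby benchmark template: the `ErbApp` class, the template, and the data are hypothetical. It relies only on the standard `erb` library and the Rack calling convention (`call(env)` returning status, headers, and body).

```ruby
require "erb"

# Hypothetical template, compiled once and reused for every request.
TEMPLATE = ERB.new(<<~HTML)
  <h1>Hello, <%= name %>!</h1>
  <ul>
  <% items.each do |item| %>
    <li><%= item %></li>
  <% end %>
  </ul>
HTML

class ErbApp
  # Rack convention: take the request env, return [status, headers, body].
  def call(env)
    name = "Rack"
    items = %w[one two three]
    # The template reads `name` and `items` from this method's binding.
    body = TEMPLATE.result(binding)
    [200, { "content-type" => "text/html" }, [body]]
  end
end

status, _headers, body = ErbApp.new.call({ "PATH_INFO" => "/" })
puts status
puts body.first
```

To serve it from a Rack server such as Puma, a `config.ru` containing `run ErbApp.new` would be the usual entry point.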

Slide 10

Slide 10 text

Which Rubies?
• Ruby interpreters:
  • CRuby 2.6.6
  • TruffleRuby master, in JVM mode
• I wanted to try JRuby 9.2.13.0, but:
  • There is a bug in JRuby+Puma with keep-alive connections that is still not solved in the latest JRuby+Puma (open since RubyKaigi'19). As a result, JRuby seems idle, uses little CPU and is very slow.
  • Benchmarking RSB without keep-alive connections is not really interesting, since creating a new connection for each request adds a lot of overhead.

Slide 11

Slide 11 text

Concurrency Settings
• 3 concurrency settings
  • Number of server threads / processes
  • Number of wrk request threads
  • Total number of connections inside wrk (>= request threads)
• Concretely we use the same number for all (e.g., 1/1/1, 2/2/2) so:
  • Same number of request and server threads
  • Each request thread maintains 1 connection
  • Avoids extra processing on both wrk's side and the server side
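The wrk side of an n/n/n setting can be sketched as a small helper that builds the command line. This is not the RSB harness: the helper name `wrk_command` and the URL are assumptions, and the 10-second default is borrowed from the measurement duration given elsewhere in the talk. The `-t`, `-c`, and `-d` options are wrk's thread, connection, and duration flags. The first of the three settings, the number of server threads or processes, is configured on the server and is not shown here.

```ruby
# Build the wrk invocation for concurrency level n:
# n request threads and n connections in total, so one connection per thread.
# The URL is a placeholder for wherever the server listens.
def wrk_command(n, url: "http://127.0.0.1:9292/", duration: "10s")
  raise ArgumentError, "concurrency must be positive" unless n.positive?
  ["wrk", "-t", n.to_s, "-c", n.to_s, "-d", duration, url]
end

[1, 2, 4, 8].each { |n| puts wrk_command(n).join(" ") }
```

Returning an array rather than a string keeps the arguments separate, so the result could be passed to `system(*cmd)` without shell quoting issues.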

Slide 12

Slide 12 text

Other Benchmark Considerations
• Processor: AMD Ryzen 7 3700X 8-Core Processor
• Frequency scaling & Turbo Core/Boost enabled, like on real servers
• All results show peak performance after enough warmup
• Reported numbers are the Requests/sec from `wrk` for 10 seconds
• Very small Rack/Rails applications, which do not represent real applications well
• Conditions for measurement are ideal, not real world

Slide 13

Slide 13 text

Rack ERB Results with Threads

Slide 14

Slide 14 text

Rack ERB Results with Threads & Processes

Slide 15

Slide 15 text

Advantages of Threads over Processes
• Need to JIT the program once, not once per process
• Common data structures are guaranteed to be shared, versus hopefully sharing some of them via copy-on-write memory with fork
• No fork() pitfalls, such as unintentionally sharing the same file descriptor
• Not having to manage multiple processes, just one
• Much faster communication between threads than between processes
• Can easily synchronize if needed
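A minimal sketch of the "shared data structures" and "can easily synchronize" points: several threads update one in-process hash, guarded by a `Mutex`. The `SharedCounter` class is a hypothetical example, not code from the talk. With processes, the same counter would need some form of inter-process communication.

```ruby
# A per-key counter shared by all threads in the process.
class SharedCounter
  def initialize
    @counts = Hash.new(0)
    @lock = Mutex.new
  end

  # The read-modify-write is done under the lock so no update is lost
  # when threads really run in parallel.
  def increment(key)
    @lock.synchronize { @counts[key] += 1 }
  end

  def [](key)
    @lock.synchronize { @counts[key] }
  end
end

counter = SharedCounter.new
threads = 4.times.map do
  Thread.new { 1_000.times { counter.increment("/") } }
end
threads.each(&:join)
puts counter["/"]
```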

Slide 16

Slide 16 text

Rails Results with Threads & Processes

Slide 17

Slide 17 text

Future Work
• Benchmarks were run on TruffleRuby with the C extension lock disabled, since it is OK to run the Puma parser C extension in parallel
• A very similar issue exists for Ruby 3's Ractor: C extensions need to be marked as thread-safe, i.e., safe to execute in parallel
• For TruffleRuby, it only matters that the state in C is synchronized; the state of the Ruby interpreter is already thread-safe
• The Rails benchmark was run on TruffleRuby with a higher splitting limit, since the default is too low for Rails. Splitting avoids megamorphic calls.
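To illustrate what a call site that tends toward megamorphic looks like, here is a small sketch. The methods are hypothetical and this says nothing about how TruffleRuby actually decides what to split. The single call site `value.to_s` inside `label` sees a different receiver class from each caller, and a framework like Rails has many more callers of its generic helpers. The idea behind splitting is that each caller can get its own copy of `label`, so each copy's `to_s` call site sees only one class.

```ruby
# A generic helper: its inner call site `value.to_s` sees whatever
# classes the callers pass in.
def label(value)
  value.to_s.upcase
end

# Each caller below only ever passes one class of object.
def label_ids(ids)       # Integers
  ids.map { |id| label(id) }
end

def label_names(names)   # Symbols
  names.map { |name| label(name) }
end

def label_prices(prices) # Floats
  prices.map { |price| label(price) }
end

p label_ids([1, 2]), label_names(%i[a b]), label_prices([1.5])
```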

Slide 18

Slide 18 text

Reminder about the Results
• Very small Rack/Rails applications, which do not represent real applications well
• Conditions for measurement are ideal, not real world
• Benchmark your own applications, and if possible share the results

Slide 19

Slide 19 text

Trying TruffleRuby
Via your favorite Ruby manager (Native mode):
• ruby-install truffleruby
• rbenv install truffleruby-20.2.0
• rvm install truffleruby
Via GraalVM, downloads at https://www.graalvm.org/
• 'gu install ruby' to install the TruffleRuby component
• Other languages in GraalVM: Java, JavaScript + Node.js, Python, R, LLVM bitcode (C, C++, Rust, etc.), WebAssembly, ... in a single VM!