Six Years of Ruby Performance: A History

Six Years of Ruby Performance History But How to Measure...?

Have You Seen the Toy Museum? Brighton's fun. If you
haven't seen the toy and model museum, it's right under the train station.

I Love Questions - Seriously Am I speaking too fast?
Tell me. Methodology question? Speak up. I can’t always see your hands, so just ask. Separately: AppFolio pays me to do Ruby stuff. Thank you, AppFolio!

Ruby 3x3 In Japan, Matz spoke about Ruby 3x3 at
RubyKaigi. He said the speed is doing well.

Will JIT Fix It? With MJIT, Ruby 2.6 is 280%
the speed of 2.0.0. We’re nearly done!

Except... But JIT doesn’t help with Rails. With MJIT, Rails
is slower than without it. MJIT will get a little faster, but...

Ruby or Rails? I’m going to look at using Rails
to measure Ruby speed because… Rails is important code that is slow right now.

Old Way: Measure with RRB Rails Ruby Bench answers “how
fast is Ruby for a real-world, production Rails app?”

How Does RRB Measure? Rails Ruby Bench uses a highly
concurrent simulated real world workload on a real world forum app called Discourse.

Throughput, Not Latency RRB uses 10 processes and 60 threads,
based on measurements showing that to be the fastest configuration. We'll talk more about that later.

"Real World", "Production" Us: It’s about 72% faster (172% of the speed). But the
times are all mixed together. Which part is slow? RRB: ...

Micro vs Macro Benchmarks Benchmarks come in many sizes. RRB
is very large.

Micro/Macro Small benchmarks are specific. Large ones are representative.

Rails Ruby Bench is Macro RRB answers “how fast is
this workload?” but not “which part is slow?”

This is a Job For... To explore specific questions, we'll
use a smaller benchmark. What should that benchmark be like?

RSB == Rails Simpler Bench RSB should make it easy
to time various operations. But not mixed into one single timing like RRB.

Routes for Exploration We’ll begin exploring with a trivial “hello,
world” route that returns a static string.

Specific We’ll start with a single-process, single-thread concurrency model.
Simple, with low latency.

Specificity So I built it. Let's see what it says
about a few different questions.

First: Rails Overhead Rails takes some time to run for
each request. A static “Hello, World” route is just benchmarking Rails overhead. What does it find?

First Off... WHAT? Remember how with RRB, half the gains
came from Ruby 2.1 and 2.2? This doesn’t look like that at all!

Wait, How Much? Also in this graph Ruby 2.6 gained
only about 30% over 2.0.0p0, and not 70%+. What?

Oh. Wait. We’re not measuring just Ruby framework code yet.
We’re also measuring the app server (Puma).

Framework Time, App Time To measure Rails framework time versus
the app server, I'll build a quick Rack app and measure it.
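
As a rough illustration of how small a Rack app can be, here is a minimal sketch. It is not the app used in RSB; the class name HelloApp and the response text are assumptions made for this example. A Rack app is any object that responds to call(env) and returns a status, a hash of headers, and a body that yields strings.

```ruby
# Minimal Rack-style "hello, world" app (illustrative only, not RSB's code).
# HelloApp is a hypothetical name chosen for this sketch.
class HelloApp
  BODY = "Hello, World!".freeze

  # Rack calls this once per request with the request environment hash.
  def call(_env)
    headers = {
      "content-type" => "text/plain",
      "content-length" => BODY.bytesize.to_s
    }
    [200, headers, [BODY]]
  end
end

# In a config.ru file, the app would be mounted with: run HelloApp.new
# Here we call it directly so the sketch runs without an app server.
status, headers, body = HelloApp.new.call({})
puts status
puts headers["content-type"]
puts body.join
```

Because there is no router, middleware stack or controller in the way, whatever time a request to this app takes is close to the cost of the app server itself.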

And That's Why They Say... This is why Rails has
a special API mode. Normal Rails has a lot of overhead compared to Rack.

And Now, the Final Total If we turn reqs/sec into
secs/req and subtract, we get Rails' overhead per request.
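
The subtraction can be sketched in a few lines of Ruby. The throughput numbers below are made up for illustration and are not measurements from RSB; the method names are also assumptions of this sketch.

```ruby
# Convert throughput (requests per second) into time per request (seconds).
def seconds_per_request(requests_per_second)
  1.0 / requests_per_second
end

# Rails overhead per request: time per Rails request minus time per
# bare Rack request on the same app server.
def rails_overhead_per_request(rails_rps:, rack_rps:)
  seconds_per_request(rails_rps) - seconds_per_request(rack_rps)
end

# Hypothetical throughputs, NOT measured results.
overhead = rails_overhead_per_request(rails_rps: 1_000.0, rack_rps: 10_000.0)
puts format("Rails overhead: %.1f microseconds per request", overhead * 1_000_000)
```

Working in seconds per request matters because throughputs do not subtract meaningfully, while per-request times do.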

Ruby 3x3 Means Concurrency We've looked at framework overhead. What
about concurrency? Is concurrency improving?

Exploration: Concurrency On a multicore machine we get better throughput
with more processes and threads. But how many?

Maximum CRuby Throughput We’ll look for a configuration which will
use a GIL-friendly number of processes and threads.

The Experiment I set up RSB to run with varying
numbers of processes and threads on Ruby from 2.0 to 2.6. For instance, 4 processes beats 4 threads...
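
A grid like this is easy to enumerate. The sketch below is not RSB's actual runner: the process and thread counts are example values, and the helper name puma_command is an assumption. It relies only on Puma's -w (workers) and -t min:max (threads) command-line options.

```ruby
# Example process and thread counts to sweep (illustrative values only).
PROCESS_COUNTS = [1, 4, 8].freeze
THREAD_COUNTS = [1, 4, 8].freeze

# Build a Puma command line for one configuration.
# -w sets the number of worker processes, -t sets min:max threads per worker.
def puma_command(processes:, threads:)
  "puma -w #{processes} -t #{threads}:#{threads}"
end

# Every combination of process count and thread count.
configs = PROCESS_COUNTS.product(THREAD_COUNTS).map do |processes, threads|
  { processes: processes, threads: threads }
end

configs.each { |config| puts puma_command(**config) }
```

Pinning the minimum and maximum thread counts to the same value keeps the thread pool at a fixed size, so each run tests exactly one configuration.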

Threads Bad? In fact, sometimes threads are clearly worse than
no threads. Let's look at 4 processes - with only 1 thread/process versus with 8.

Sometimes? With more processes, threads help... But only a little.
Here's 8 processes, with 1 thread versus 4 threads.

But Why? This is a workload difference between RRB and
RSB. RRB has lots of non-Ruby work, and lots of context switches.

CRuby Threads, CRuby GIL CRuby only benefits from threads for
non-Ruby time - database, I/O, C extensions, etc. RSB is nearly 100% Ruby work.

One More Observation On each previous graph, from left to
right, each row does roughly the same thing...

So... That means each Ruby is handling concurrency the same
way.

Is There a Win Here? One reminder: if we could
run Ruby concurrently (with Guilds?) then we might be able to get far more speed from this benchmark.

Rails and MJIT We know MJIT isn't helping for Rails
yet. But is it coming closer? Let's check. That means comparing 2.6 performance to prerelease Ruby 2.7 performance.

Rails w/ MJIT - Getting Closer Ruby 2.6 vs recent
2.7 - with 2.6, MJIT is 83% the speed. With 2.7, it's 94%.

Rack w/ MJIT - No Change Ruby 2.6, Rack w/
MJIT is 97% the speed. With 2.7, it's also 97%.
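
Figures such as "83% the speed" are ratios of throughput, and they are easy to confuse with "percent faster". A small sketch of both calculations follows; the throughput numbers are hypothetical and the method names are assumptions of this example.

```ruby
# Throughput of a candidate as a percentage of a baseline's throughput.
# 100.0 means "same speed"; below 100.0 means the candidate is slower.
def percent_of_baseline_speed(candidate_rps:, baseline_rps:)
  100.0 * candidate_rps / baseline_rps
end

# The same comparison expressed as "percent faster" (negative if slower).
def percent_faster(candidate_rps:, baseline_rps:)
  percent_of_baseline_speed(candidate_rps: candidate_rps, baseline_rps: baseline_rps) - 100.0
end

# Hypothetical throughputs, NOT measured results.
with_jit = 830.0
without_jit = 1_000.0

puts percent_of_baseline_speed(candidate_rps: with_jit, baseline_rps: without_jit)
puts percent_faster(candidate_rps: with_jit, baseline_rps: without_jit)
```

With these example inputs the candidate runs at 83% of the baseline speed, which is the same as being 17% slower.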

What's Next? There’s a lot more I can benchmark with
RSB, and I do. I post the results to engineering.appfolio.com. Future comparisons will include app servers, memory allocators and floating point.

Methodology RSB is pretty methodologically solid, though the code is
new. This isn’t a 100% full description, but...

Methodology Misc: Dedicated EC2 Linux instance; localhost, not TCP; multiple
consecutive batches, random order; one route/concurrency config per batch; many requests/batch; no Redis or cache; no destructive routes; fail batch > 0.01% error; configurable warmups; low-overhead C load tester (wg/wrk); Rails 4.2 for Ruby 2.0 -> 2.6+ compatibility
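
Two of these points, random batch order and the error threshold, can be sketched briefly. This is an illustration under stated assumptions, not RSB's implementation: the method names, the config labels and the batch counts are all hypothetical.

```ruby
# A batch fails if more than 0.01% of its requests returned an error.
ERROR_THRESHOLD = 0.0001

def batch_failed?(total_requests:, errors:)
  errors.to_f / total_requests > ERROR_THRESHOLD
end

# Repeat each configuration several times, then shuffle the run order so
# that drift on the host does not line up with one particular configuration.
def shuffled_batches(configs, batches_per_config:, rng: Random.new)
  (configs * batches_per_config).shuffle(random: rng)
end

# Hypothetical configuration labels, for illustration only.
configs = ["ruby-2.0/rails", "ruby-2.6/rails", "ruby-2.6/rack"]
schedule = shuffled_batches(configs, batches_per_config: 3)

schedule.each { |config| puts "run batch: #{config}" }
puts batch_failed?(total_requests: 10_000, errors: 0)
puts batch_failed?(total_requests: 10_000, errors: 2)
```

Passing a seeded Random as rng would make the schedule reproducible, which can help when rerunning a suspicious batch order.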

The Code, The Data RSB Code: https://github.com/noahgibbs/rsb Data for these
graphs: https://github.com/noahgibbs/rrb_datavis

Questions? Comments? Objections? Recitations? Show Tunes? Should you discover an
important question you missed, I can be reached as @codefolio on Twitter or [email protected] as well. I love talking Ruby! These slides: http://bit.ly/brighton2019-gibbs
