Conquering Massive Traffic Spikes in Ruby Applications with Pitchfork

Discover how we tackled extreme traffic spikes on our Rails platform using Pitchfork. Learn to efficiently warm up Rails applications and significantly improve latency during massive sales events.

Shia

September 25, 2025

Transcript

  1. CONQUERING MASSIVE TRAFFIC SPIKES IN RUBY APPLICATIONS WITH PITCHFORK
     PRESENTED BY SANGYONG SIM “SHIA” | EURUKO 2025 | VIANA DO CASTELO, PORTUGAL

  2. Self Introduction
     Sangyong Sim @ STORES
     shia @ Internet / riseshia @ GitHub
     Favorite lib: coverage
     Online profile

  3. Scheduled sales
     Merchants of every size start sales at specific times.
     Traffic is incredibly difficult to predict.

  4. Problems
     Minimize request queuing ~= provision sufficient web server workers.
     Fix the latency degradation observed at p90 and above.
     Note: Application optimization and cache-based load reduction are out of scope for this talk.

  5. Infrastructure of the service
     Runs on AWS ECS Fargate.
     Capacity managed by an Auto Scaling Group (ASG).
     Built with Ruby on Rails / Unicorn.

  6. Challenge: Provisioning Sufficient Web Server Workers
     Traffic prediction is impossible → use historical patterns as a baseline.
     When traffic exceeds estimates → some queuing is inevitable.
     Peak duration is typically under 1 minute, with full resolution within 5 minutes.
     Workers requested via the ASG only arrive 2-5 minutes later, so reactive autoscaling won't help.

  7. Challenge: Provisioning Sufficient Web Server Workers
     Small-scale spikes: always run with excess capacity for a 2-3x spike as a buffer; use Fargate Spot for cost optimization.
     Large-scale spikes: the rare extreme spikes are usually predictable from their scale, so we manually scale out right before launch.

  8. Challenge: Fix the latency degradation
     [Chart: average request time breakdown by component (Ruby, External API, DB) at p50 / p90 / p95]

  9. The Cold Worker Problem
     Rails apps do many things only once (or a few times) after boot, which makes the first requests slow:
     TCP connection establishment, in-memory caches, JIT compilation (if YJIT is enabled),
     methods defined via method_missing, Action View template compilation, …

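     A minimal illustration (not from the talk) of this kind of once-per-process work: the first
     request that touches the client below pays for the DNS lookup and TCP/TLS handshake, while
     later requests reuse the memoized connection. The class and host name are hypothetical.

       require "net/http"

       class CatalogClient
         def connection
           # First call performs the DNS lookup + TCP/TLS handshake; later calls reuse the socket.
           @connection ||= Net::HTTP.start("catalog.internal.example", 443, use_ssl: true)
         end

         def fetch(path)
           connection.get(path)
         end
       end
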
  10. Why Only Some Requests?
     Let's track which Unicorn workers actually handle requests:
     an endpoint with a 100 ms response time, 8 workers, 2 concurrent connections (simulating low load),
     a 10-second test run, then compare the number of processed requests per worker.

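     One way to run this kind of experiment (a sketch, not necessarily the talk's exact setup):
     tag each response with the serving process's PID via a tiny Rack middleware, then aggregate
     the PIDs on the client side. The app and header name below are made up for illustration.

       # config.ru
       class WorkerTag
         def initialize(app)
           @app = app
         end

         def call(env)
           status, headers, body = @app.call(env)
           headers["x-worker-pid"] = Process.pid.to_s  # which worker served this request?
           [status, headers, body]
         end
       end

       use WorkerTag
       # Simulate the ~100 ms endpoint from the experiment.
       run ->(env) { sleep 0.1; [200, { "content-type" => "text/plain" }, ["ok\n"]] }

     Driving this with any load generator at 2 concurrent connections for 10 seconds and counting
     the distinct x-worker-pid values shows how many requests each worker actually handled.
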
  11. Why Only Some Requests?
     worker 0: 85   worker 1: 86   worker 2: 2   worker 3: 0
     worker 4: 0    worker 5: 0    worker 6: 0   worker 7: 0
     Note: Reproducible on Linux environments only.

  12. Why the Imbalance?
     [Diagram: a single listening socket watched through epoll, which notifies Worker 0, Worker 1, …, Worker n]
     Unicorn is a prefork web server: at startup it forks the configured number of worker processes.
     All workers share a single TCP socket and watch it with epoll (or kqueue).
     What's the notification order?

  13. Why the Imbalance?
     From epoll's perspective, the worker queue behaves like a LIFO (last in, first out):
     a worker that finishes a request goes back to the front of the queue.
     So with just 2 concurrent requests, workers 0 and 1 keep jumping to the front.
     Ref: https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/

  14. So, what happened:
     We over-provision for the traffic spike, so the extra workers sit idle from startup.
     When the sale hits, the extra workers finally get work.
     But those extra workers are cold, so…?

  15. How to warm up all workers?
     Generate real traffic to warm them up?
     Pre-warm before receiving real traffic (i.e. before service-in)?
     Switch to a thread-based model, Puma? 🤔

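     As an illustration of the "pre-warm before service-in" option (a sketch only, not what we
     shipped): replay a representative request inside each worker right after it forks, before the
     task is registered with the load balancer. The /warmup path is hypothetical; verify the hook
     name and arguments against your server's documentation.

       # unicorn.rb (or the equivalent hook in your server's config)
       after_fork do |_server, _worker|
         # Drive one request through the full Rack stack to trigger template
         # compilation, connection setup, JIT warm-up, etc. in this process.
         env = Rack::MockRequest.env_for("/warmup")
         Rails.application.call(env)
       end
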
  16. Refork
     Uses a worker that has already processed a certain (configurable) number of requests as a
     template (the "mold") to refork all workers.
     Reduces memory usage by maximizing shared memory through Copy-on-Write (CoW).

  17. Refork
     Before refork:
       COMMAND
       \_ pitchfork master
          \_ (gen:0) mold
          \_ (gen:0) worker[0]
          \_ (gen:0) worker[1]
          \_ (gen:0) worker[2]
          \_ (gen:0) worker[3]
     After refork (a warmed worker is promoted to the new mold, then new workers are forked from it):
       COMMAND
       \_ pitchfork master
          \_ (gen:1) mold
          \_ (gen:1) worker[0]
          \_ (gen:1) worker[1]
          \_ (gen:1) worker[2]
          \_ (gen:1) worker[3]

  18. Caveat 1: Fork safety
     Is it safe to fork your code after it has processed some requests?
     If you're already running on Unicorn it is probably fine, except for some gems: grpc, ruby-vips, ...
     See more: https://github.com/Shopify/pitchfork/blob/master/docs/FORK_SAFETY.md

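     A common mitigation (a sketch; check Pitchfork's FORK_SAFETY.md and each gem's docs before
     relying on it) is to rebuild fork-unsafe resources such as sockets and background threads
     right after each fork. The callback name follows Pitchfork's configuration docs, and the
     MyApp.redis accessor is hypothetical.

       # pitchfork.conf.rb
       after_worker_fork do |_server, _worker|
         # Connections and threads created before the fork do not survive it,
         # so re-create them in the child process.
         MyApp.redis = Redis.new(url: ENV["REDIS_URL"])
       end
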
  19. Caveat 2: refork_after
     Sets the number-of-requests thresholds for triggering automatic reforks.
     The limit is per worker: for instance, with refork_after [50] a refork is triggered once at
     least one worker has processed 50 requests.
     Each element is the limit for the next generation: with refork_after [50, 100, 1000] a new
     generation is triggered when a worker has processed 50 requests, the second generation when a
     worker from the new generation has processed an additional 100 requests, and finally after
     every 1000 requests.
     Ref: https://github.com/Shopify/pitchfork/blob/master/docs/CONFIGURATION.md#refork_after

  20. Caveat 2: refork_after
     The refork has to have happened before the sale starts, and our service prepares extra
     capacity by scaling out shortly beforehand. So...?
     -> We need to set an extremely small threshold for the first generation.

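     A configuration sketch of that takeaway (the values are illustrative, not our production
     settings): trigger the first generation almost immediately so freshly scaled-out tasks get a
     warmed mold before the sale, then space later generations further apart.

       # pitchfork.conf.rb
       refork_after [5, 100, 1000]  # first refork after 5 requests, then 100 more, then every 1000
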
  21. Caveat 3: worker killer
     There is no pitchfork-worker-killer, but you can implement one with the
     after_request_complete callback:

       check_cycle = 1500
       memory_size_limit = 1024 * 1024 # 1GB (unit: KB)
       after_request_complete do |server, worker, _env|
         if (worker.requests_count % check_cycle).zero?
           mem_info = Pitchfork::MemInfo.new(worker.pid)
           if mem_info.rss > memory_size_limit
             exit
           end
         end
       end

     Ref: https://github.com/Shopify/pitchfork/issues/92

  22. Migration Results
     Comparing the same major annual sale event. All graphs use identical Y-axis scales.
     The only significant change is switching from Unicorn to Pitchfork.

  23. Migration Results
     [Charts: request time breakdown by component (Ruby, External API, DB) at p50 / p90 / p95, 2023 vs 2024]

  24. Extra effect of Pitchfork
     Does it make the app more memory efficient? No 😭
     Our netshop always has idle workers, which makes the CoW sharing less effective…
     or maybe there just isn't much left to share once the workers are warmed up.

  25. Summary
     Cold workers cause p90+ latency spikes in Unicorn.
     Socket scheduling creates an uneven request distribution across workers.
     Pitchfork's refork enables all workers to be warmed.
     3x the traffic was handled smoothly after the migration.