Conquering Massive Traffic Spikes in Ruby Applications with Pitchfork

Discover how we tackled extreme traffic spikes on our Rails platform using Pitchfork. Learn to efficiently warm up Rails applications and significantly improve latency during massive sales events.

Shia

September 25, 2025

Transcript

  1. CONQUERING MASSIVE TRAFFIC SPIKES IN RUBY APPLICATIONS WITH PITCHFORK
     PRESENTED BY SANGYONG SIM “SHIA” | EURUKO 2025 | VIANA DO CASTELO, PORTUGAL

  2. Self Introduction
     Sangyong Sim @ STORES
     shia @ Internet / riseshia @ GitHub
     Favorite lib: coverage
     Online profile

  3. Scheduled sales
     Merchants of every size start sales at specific times.
     Traffic is incredibly difficult to predict.

  4. Problems
     Minimize request queuing ~= provision sufficient web server workers.
     Fix the latency degradation observed at p90 and above.
     Note: Application optimization and cache-based load reduction are out of scope for this talk.

  5. Infrastructure of the service
     Runs on AWS ECS Fargate.
     Capacity managed by an Auto Scaling Group (ASG).
     Built with Ruby on Rails / Unicorn.

  6. Challenge: Provisioning Sufficient Web Server Workers
     Traffic prediction is impossible → use historical patterns as a baseline.
     When traffic exceeds estimates → some queuing is inevitable.
     Peak duration is typically under 1 minute, with full resolution within 5 minutes.
     Workers requested via the ASG only arrive 2-5 minutes later, so reactive autoscaling won't help.

  7. Challenge: Provisioning Sufficient Web Server Workers
     Small-scale spikes: always run with excess capacity for a 2-3x spike as a buffer; use Fargate Spot for cost optimization.
     Large-scale spikes: the rare extreme spikes are usually predictable from their scale, so we manually scale out right before launch.

  8. Challenge: Fix the latency degradation
     [Chart: average request time breakdown by component (Ruby, External API, DB) at p50 / p90 / p95]

  9. The Cold Worker Problem
     Rails apps do many things only once (or a few times) after boot, which makes the first requests slow:
     TCP connection establishment, in-memory caches, JIT compilation (if YJIT is enabled),
     methods defined via method_missing, Action View template compilation, …

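     A minimal illustration (not from the talk) of this kind of once-per-process work: the first
     request that touches the client below pays for the DNS lookup and TCP/TLS handshake, while
     later requests reuse the memoized connection. The class and host name are hypothetical.

       require "net/http"

       class CatalogClient
         def connection
           # First call performs the DNS lookup + TCP/TLS handshake; later calls reuse the socket.
           @connection ||= Net::HTTP.start("catalog.internal.example", 443, use_ssl: true)
         end

         def fetch(path)
           connection.get(path)
         end
       end
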
  10. Why Only Some Requests?
     Let's track which Unicorn workers actually handle requests:
     an endpoint with a 100 ms response time, 8 workers, 2 concurrent connections (simulating low load),
     a 10-second test run, then compare the number of processed requests per worker.

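     One way to run this kind of experiment (a sketch, not necessarily the talk's exact setup):
     tag each response with the serving process's PID via a tiny Rack middleware, then aggregate
     the PIDs on the client side. The app and header name below are made up for illustration.

       # config.ru
       class WorkerTag
         def initialize(app)
           @app = app
         end

         def call(env)
           status, headers, body = @app.call(env)
           headers["x-worker-pid"] = Process.pid.to_s  # which worker served this request?
           [status, headers, body]
         end
       end

       use WorkerTag
       # Simulate the ~100 ms endpoint from the experiment.
       run ->(env) { sleep 0.1; [200, { "content-type" => "text/plain" }, ["ok\n"]] }

     Driving this with any load generator at 2 concurrent connections for 10 seconds and counting
     the distinct x-worker-pid values shows how many requests each worker actually handled.
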
  11. Why Only Some Requests?
     worker 0: 85   worker 1: 86   worker 2: 2   worker 3: 0
     worker 4: 0    worker 5: 0    worker 6: 0   worker 7: 0
     Note: Reproducible on Linux environments only.

  12. Why the Imbalance?
     [Diagram: a single listening socket watched through epoll, which notifies Worker 0, Worker 1, …, Worker n]
     Unicorn is a prefork web server: at startup it forks the configured number of worker processes.
     All workers share a single TCP socket and watch it with epoll (or kqueue).
     What's the notification order?

  13. Why the Imbalance?
     From epoll's perspective, the worker queue behaves like a LIFO (last in, first out):
     a worker that finishes a request goes back to the front of the queue.
     So with just 2 concurrent requests, workers 0 and 1 keep jumping to the front.
     Ref: https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/

  14. So, what happened:
     We over-provision for the traffic spike, so the extra workers sit idle from startup.
     When the sale hits, the extra workers finally get work.
     But those extra workers are cold, so…?

  15. How to warm up all workers?
     Generate real traffic to warm them up?
     Pre-warm before receiving real traffic (i.e. before service-in)?
     Switch to a thread-based model, Puma? 🤔

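     As an illustration of the "pre-warm before service-in" option (a sketch only, not what we
     shipped): replay a representative request inside each worker right after it forks, before the
     task is registered with the load balancer. The /warmup path is hypothetical; verify the hook
     name and arguments against your server's documentation.

       # unicorn.rb (or the equivalent hook in your server's config)
       after_fork do |_server, _worker|
         # Drive one request through the full Rack stack to trigger template
         # compilation, connection setup, JIT warm-up, etc. in this process.
         env = Rack::MockRequest.env_for("/warmup")
         Rails.application.call(env)
       end
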
  16. Refork
     Uses a worker that has already processed a certain (configurable) number of requests as a
     template (the "mold") to refork all workers.
     Reduces memory usage by maximizing shared memory through Copy-on-Write (CoW).

  17. Refork
     Before refork:
       COMMAND
       \_ pitchfork master
          \_ (gen:0) mold
          \_ (gen:0) worker[0]
          \_ (gen:0) worker[1]
          \_ (gen:0) worker[2]
          \_ (gen:0) worker[3]
     After refork (a warmed worker is promoted to the new mold, then new workers are forked from it):
       COMMAND
       \_ pitchfork master
          \_ (gen:1) mold
          \_ (gen:1) worker[0]
          \_ (gen:1) worker[1]
          \_ (gen:1) worker[2]
          \_ (gen:1) worker[3]

  18. Caveat 1: Fork safety
     Is it safe to fork your code after it has processed some requests?
     If you're already running on Unicorn it is probably fine, except for some gems: grpc, ruby-vips, ...
     See more: https://github.com/Shopify/pitchfork/blob/master/docs/FORK_SAFETY.md

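     A common mitigation (a sketch; check Pitchfork's FORK_SAFETY.md and each gem's docs before
     relying on it) is to rebuild fork-unsafe resources such as sockets and background threads
     right after each fork. The callback name follows Pitchfork's configuration docs, and the
     MyApp.redis accessor is hypothetical.

       # pitchfork.conf.rb
       after_worker_fork do |_server, _worker|
         # Connections and threads created before the fork do not survive it,
         # so re-create them in the child process.
         MyApp.redis = Redis.new(url: ENV["REDIS_URL"])
       end
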
  19. Caveat 2: refork_after
     Sets the number-of-requests thresholds for triggering automatic reforks.
     The limit is per worker: for instance, with refork_after [50] a refork is triggered once at
     least one worker has processed 50 requests.
     Each element is the limit for the next generation: with refork_after [50, 100, 1000] a new
     generation is triggered when a worker has processed 50 requests, the second generation when a
     worker from the new generation has processed an additional 100 requests, and finally after
     every 1000 requests.
     Ref: https://github.com/Shopify/pitchfork/blob/master/docs/CONFIGURATION.md#refork_after

  20. Caveat 2: refork_after
     The refork has to have happened before the sale starts, and our service prepares extra
     capacity by scaling out shortly beforehand. So...?
     -> We need to set an extremely small threshold for the first generation.

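     A configuration sketch of that takeaway (the values are illustrative, not our production
     settings): trigger the first generation almost immediately so freshly scaled-out tasks get a
     warmed mold before the sale, then space later generations further apart.

       # pitchfork.conf.rb
       refork_after [5, 100, 1000]  # first refork after 5 requests, then 100 more, then every 1000
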
  21. Caveat 3: worker killer
     There is no pitchfork-worker-killer, but you can implement one with the
     after_request_complete callback:

       check_cycle = 1500
       memory_size_limit = 1024 * 1024 # 1GB (unit: KB)
       after_request_complete do |server, worker, _env|
         if (worker.requests_count % check_cycle).zero?
           mem_info = Pitchfork::MemInfo.new(worker.pid)
           if mem_info.rss > memory_size_limit
             exit
           end
         end
       end

     Ref: https://github.com/Shopify/pitchfork/issues/92

  22. Migration Results
     Comparing the same major annual sale event. All graphs use identical Y-axis scales.
     The only significant change is switching from Unicorn to Pitchfork.

  23. Migration Results
     [Charts: request time breakdown by component (Ruby, External API, DB) at p50 / p90 / p95, 2023 vs 2024]

  24. Extra effect of Pitchfork
     Does it make the app more memory efficient? No 😭
     Our netshop always has idle workers, which makes the CoW sharing less effective…
     or maybe there just isn't much left to share once the workers are warmed up.

  25. Summary
     Cold workers cause p90+ latency spikes in Unicorn.
     Socket scheduling creates an uneven request distribution across workers.
     Pitchfork's refork enables all workers to be warmed.
     3x the traffic was handled smoothly after the migration.