Slide 1

Slide 1 text

CONQUERING MASSIVE TRAFFIC SPIKES IN RUBY APPLICATIONS WITH PITCHFORK PRESENTED BY SANGYONG SIM “SHIA” EURUKO 2025 VIANA DO CASTELO | PORTUGAL

Slide 2

Slide 2 text

Self Introduction Sangyong Sim @ STORES shia @ Internet riseshia @ GitHub Favorite lib: coverage Online profile

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

STORES Netshop

Slide 5

Slide 5 text

Merchants of every size. Sales start at specific times. Traffic is incredibly difficult to predict. Scheduled sales

Slide 6

Slide 6 text

Limited goods on sale! more than 10x

Slide 7

Slide 7 text

Latency degradation 😭 p50 p95 p90

Slide 8

Slide 8 text

Deliver a stable purchasing experience even during massive traffic spikes! Our mission

Slide 9

Slide 9 text

Deliver a stable purchasing experience even during massive traffic spikes, effortlessly!! Our mission

Slide 10

Slide 10 text

Minimize request queuing ~= Provision sufficient web server workers Fix the latency degradation observed at p90 and above Problems Note: Application optimization and cache-based load reduction are out of scope for this talk.

Slide 11

Slide 11 text

Runs on AWS ECS Fargate Capacity managed by Auto Scaling Group (ASG) Built with Ruby on Rails / Unicorn Infrastructure of the service

Slide 12

Slide 12 text

Traffic prediction is impossible → use historical patterns as a baseline. When traffic exceeds estimates → some queuing is inevitable. Peak duration is typically under 1 minute, with full resolution within 5 minutes; workers requested via the ASG arrive 2-5 minutes later, so reactive scaling won't help. Challenge: Provisioning Sufficient Web Server Workers

Slide 13

Slide 13 text

Small-scale spikes: always run with excess capacity as a buffer for 2-3x spikes; use Fargate Spot for cost optimization. Large-scale spikes: the rare extreme spikes are usually predictable from their scale, so we manually scale out right before launch. Challenge: Provisioning Sufficient Web Server Workers

Slide 14

Slide 14 text

Challenge: Fix the latency degradation p50 p95 p90 Ruby External API DB Average Request Time Breakdown by component

Slide 15

Slide 15 text

🤔

Slide 16

Slide 16 text

Were some workers cold…?

Slide 17

Slide 17 text

Rails apps do many things only once (or a few times) after boot, which makes the first requests slow: TCP connection establishment; in-memory cache population; JIT compilation (if YJIT is enabled); methods defined lazily via method_missing; Action View template compilation; … The Cold Worker Problem
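To make the cost concrete, here is a toy illustration (not from the talk): work that runs only on the first call, then gets memoized, makes a cold worker's first request noticeably slower than the rest.

```ruby
require "benchmark"

CACHE = {}

# Stand-in for one-time boot work: TCP handshakes, template compilation,
# JIT warm-up, etc. The result is memoized, so only the first call pays.
def expensive_lookup(key)
  CACHE[key] ||= begin
    sleep 0.05
    key.upcase
  end
end

cold = Benchmark.realtime { expensive_lookup("sku") } # first request: slow
warm = Benchmark.realtime { expensive_lookup("sku") } # later requests: fast
puts format("cold: %.3fs, warm: %.3fs", cold, warm)
```

A warm worker has already paid all of these costs, which is exactly why it serves p90+ requests so much faster than a cold one.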

Slide 18

Slide 18 text

Let's track which Unicorn workers actually handle requests: an endpoint with a 100ms response time, 8 workers, 2 concurrent connections (simulating low load), and a 10-second test run. Compare the number of processed requests per worker. Why Only Some Requests?

Slide 19

Slide 19 text

worker 0: 85 worker 1: 86 worker 2: 2 worker 3: 0 worker 4: 0 worker 5: 0 worker 6: 0 worker 7: 0 Note: Reproducible on Linux environments only Why Only Some Requests?

Slide 20

Slide 20 text

Why the Imbalance? [diagram: Socket → epoll (Watch) → Notify → Worker 0, Worker 1, … Worker n] Unicorn is a prefork web server: at startup it forks the configured number of worker processes, and all workers share a single TCP socket. Unicorn uses epoll (or kqueue). What's the notification order?

Slide 21

Slide 21 text

From epoll's perspective, the worker queue behaves like a LIFO (Last In, First Out): a worker that finishes a request goes back to the front of the queue. So with just 2 concurrent requests, workers 0 and 1 keep jumping to the front and get all the work. Ref: https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/ Why the Imbalance?
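The LIFO effect can be sketched in plain Ruby (a toy model, not actual epoll or Unicorn code): treat the idle-worker queue as a stack where a finished worker rejoins at the front.

```ruby
# Toy model of epoll's LIFO wake-up order: 8 workers, 2 concurrent requests.
workers = Array.new(8, 0)   # requests handled per worker
queue   = (0...8).to_a      # idle queue; worker 0 sits at the front

100.times do
  busy = [queue.shift, queue.shift]   # 2 concurrent requests wake the front two
  busy.each { |w| workers[w] += 1 }
  busy.each { |w| queue.unshift(w) }  # a finished worker rejoins at the FRONT
end

p workers  # workers 0 and 1 absorb everything; the rest never run
```

This reproduces the shape of the measurement on the previous slide: two hot workers, six workers that stay cold.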

Slide 22

Slide 22 text

So, what happened: We over-provision for traffic spikes, so the extra workers sit idle from startup. When the sale hits, the extra workers finally receive requests, but they are cold, so…?

Slide 23

Slide 23 text

Generate real traffic to warm them up. Pre-warm before receiving real traffic (i.e., before service-in). Switch to a thread-based model like Puma? 🤔 How to Warm Up All Workers?

Slide 24

Slide 24 text

Shopify's fork of Unicorn It has a feature called “refork”. Pitchfork

Slide 25

Slide 25 text

Uses a worker that has processed a certain (adjustable) number of requests as a template to refork all workers. Reduces memory usage by maximizing shared memory through Copy-on-Write (CoW). Refork

Slide 26

Slide 26 text

Refork: generation 0, then promote + fork into generation 1:

COMMAND
 \_ pitchfork master
     \_ (gen:0) mold
     \_ (gen:0) worker[0]
     \_ (gen:0) worker[1]
     \_ (gen:0) worker[2]
     \_ (gen:0) worker[3]

COMMAND
 \_ pitchfork master
     \_ (gen:1) mold
     \_ (gen:1) worker[0]
     \_ (gen:1) worker[1]
     \_ (gen:1) worker[2]
     \_ (gen:1) worker[3]

Slide 27

Slide 27 text

If we refork from a warm worker, won't all workers be warmed? Refork

Slide 28

Slide 28 text

LET’S MIGRATE TO PITCHFORK!!!!!

Slide 29

Slide 29 text

A forked process will: inherit open file descriptors; keep only the main thread alive. Caveat 1: Fork safety
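The "only the main thread stays alive" point is easy to demonstrate in plain Ruby (a minimal demo, Linux/macOS only since it uses fork):

```ruby
# Start a background thread in the parent and wait until it is sleeping.
t = Thread.new { sleep }
Thread.pass until t.status == "sleep"

pid = fork do
  # In the child, only the thread that called fork keeps running;
  # report the number of live threads via the exit status.
  exit!(Thread.list.size)
end
Process.wait(pid)
survivors = $?.exitstatus
puts "live threads in child: #{survivors}"  # 1: just the main thread
t.kill
```

Any gem that relies on background threads (connection reapers, heartbeats, gRPC pollers) must therefore restart them after fork, which is what the fork-safety caveats below are about.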

Slide 30

Slide 30 text

Is it safe to fork your code after it has processed some requests? If you're already using Unicorn, it's probably fine, except for some gems: grpc, ruby-vips, ... See more: https://github.com/Shopify/pitchfork/blob/master/docs/FORK_SAFETY.md Caveat 1: Fork safety

Slide 31

Slide 31 text

Sets a request-count threshold for triggering an automatic refork. The limit is per worker: for instance, with refork_after [50], a refork is triggered once at least one worker has processed 50 requests. Each element is the limit for the next generation: with [50, 100, 1000], a new generation is triggered when a worker has processed 50 requests, the second generation when a worker from the new generation has processed an additional 100 requests, and later generations after every 1000 requests. Ref: https://github.com/Shopify/pitchfork/blob/master/docs/CONFIGURATION.md#refork_after Caveat 2: refork_after

Slide 32

Slide 32 text

Refork needs to happen before the sale starts, but our service prepares extra capacity by scaling out, so freshly launched tasks boot with all-cold workers. So...? -> We need to set an extremely small number for the first threshold. Caveat 2: refork_after
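Based on the refork_after semantics quoted above, a pitchfork config along these lines (the numbers are illustrative, not the talk's production values) front-loads the first refork so workers are warm well before the sale:

```ruby
# config/pitchfork.rb -- hedged sketch; thresholds are illustrative
worker_processes 8

# Trigger the first refork after only 5 requests so every worker inherits a
# warm template right after boot, then back off for later generations.
refork_after [5, 100, 1000]
```

The trade-off is that a very early refork uses a barely-warmed worker as the mold; later, larger thresholds let subsequent generations capture more warmed-up state.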

Slide 33

Slide 33 text

There is no pitchfork-worker-killer, but you can implement one with the after_request_complete callback (Ref: https://github.com/Shopify/pitchfork/issues/92):

check_cycle = 1500
memory_size_limit = 1024 * 1024 # 1GB (unit: KB)

after_request_complete do |server, worker, _env|
  if (worker.requests_count % check_cycle).zero?
    mem_info = Pitchfork::MemInfo.new(worker.pid)
    if mem_info.rss > memory_size_limit
      exit
    end
  end
end

Caveat 3: worker killer

Slide 34

Slide 34 text

Comparing the same major annual sale event. All graphs use identical Y-axis scales. The only significant change is switching from Unicorn to Pitchfork. Migration Results

Slide 35

Slide 35 text

Migration Results - RPS 2023 2024

Slide 36

Slide 36 text

Migration Results - RPS 2023 2024

Slide 37

Slide 37 text

Migration Results - Latency 2023 2024 p50 p95 p90 p50 p95 p90

Slide 38

Slide 38 text

Migration Results breakdown by component 2023 2024 p50 p95 p90 p50 p95 p90 Ruby External API DB Ruby External API DB

Slide 39

Slide 39 text

Reactions 😂

Slide 40

Slide 40 text

Extra effect of Pitchfork: does it make the app more memory efficient?

Slide 41

Slide 41 text

Extra effect of Pitchfork: does it make the app more memory efficient? No 😭 The netshop always has idle workers, which makes CoW sharing less effective… or perhaps the app simply doesn't have much left to share once warmed up.

Slide 42

Slide 42 text

Summary: Cold workers cause p90 latency spikes in Unicorn. epoll's LIFO scheduling creates an uneven request distribution across workers. Pitchfork's refork enables all workers to be warmed. 3x traffic handled smoothly after migration.

Slide 43

Slide 43 text

Furthermore: The Pitchfork Story Ref: https://byroot.github.io/ruby/performance/2025/03/04/the-pitchfork-story.html

Slide 44

Slide 44 text

THANK YOU