Background jobs at scale

Kerstin Puschke @titanoboa42

Scaling applications using background jobs while keeping code simple

Outline

• Introduction to background jobs
• Scaling applications
• Mastering challenges
• Being RESTful
• Background jobs at scale
• Summary

Introduction to background jobs

Decoupling user-facing request from time-consuming task
[Diagram: App Server, Worker]

Asynchronous communication
[Diagram: App Server, Message Queue (Task Queue), three Workers]

Background job backend: task queue & broker
[Diagram: App Server, Task Queue, Broker, three Workers]
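The decoupling shown in the diagrams above can be sketched in a single Ruby process. This is a minimal illustration, not a real backend: Ruby's thread-safe Queue stands in for the task queue and broker, threads stand in for worker processes, and the names task_queue, results and the :stop marker are assumptions made for this example.

```ruby
# Minimal in-process sketch of a task queue with several workers.
# A real backend would keep the queue outside the app process.

task_queue = Queue.new   # hypothetical stand-in for task queue and broker
results    = Queue.new   # collects finished work so it can be inspected

# Start three workers that pull jobs until they see the :stop marker.
workers = 3.times.map do |id|
  Thread.new do
    while (job = task_queue.pop) != :stop
      results << [id, job * job]   # stands in for the time-consuming task
    end
  end
end

# The "app server" side: enqueue work and carry on immediately.
(1..6).each { |n| task_queue << n }

# Shut down: one stop marker per worker, then wait for all of them.
workers.size.times { task_queue << :stop }
workers.each(&:join)

squares = []
squares << results.pop.last until results.empty?
squares.sort!
```

Which worker handles which job is not deterministic, which is why the results are sorted before use.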

Scaling applications

Spikeability
[Diagram: App Server, Task Queue, one Worker, then three Workers]

Parallelization
[Diagram: App Server, Task Queue, three Workers]

Retries & Redundancy
[Diagram: App Server, Task Queue, three Workers]

Prioritization & Specialization
[Diagram: App Server, High Prio Queue, Low Prio Queue, three Workers; Special Queue with its own Worker]
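Prioritization can be sketched as a worker that always checks the high priority queue before the low priority one. This is an illustrative sketch only: the queue names, the job strings and the next_job helper are hypothetical and not taken from any job backend.

```ruby
# Sketch: a worker drains queues in priority order.
# Single-threaded on purpose, so the empty?/pop pair cannot race.

def next_job(queues, priority_order)
  priority_order.each do |name|
    queue = queues.fetch(name)
    return queue.pop(true) unless queue.empty?   # non-blocking pop
  end
  nil   # nothing left to do
end

queues = { high: Queue.new, low: Queue.new }
queues[:low]  << "rebuild search index"
queues[:high] << "send password reset mail"
queues[:low]  << "generate report"

processed = []
while (job = next_job(queues, [:high, :low]))
  processed << job
end
```

A specialized worker would follow the same pattern with a priority order that names only its special queue.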

Mastering challenges

Data inconsistency

Out-of-order delivery

No exactly-once delivery
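One common way to live without exactly-once delivery is to make jobs idempotent, so that a duplicate delivery has no additional effect. The sketch below assumes every job carries a unique id. The process_once helper and the in-memory Set are hypothetical stand-ins; a real system would need a shared, durable store and an atomic check-and-mark step.

```ruby
require "set"

# Sketch: skip jobs whose id has already been handled.

def process_once(job, processed_ids, balances)
  return :skipped if processed_ids.include?(job[:id])

  balances[job[:account]] += job[:amount]   # the actual side effect
  processed_ids << job[:id]                 # remember the job as handled
  :processed
end

processed_ids = Set.new
balances      = Hash.new(0)
job           = { id: "job-1", account: "alice", amount: 10 }

first  = process_once(job, processed_ids, balances)
second = process_once(job, processed_ids, balances)   # duplicate delivery
```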

Processing time

Being RESTful

Don’t lie about resource creation

• 202 Accepted
• Location: temporary resource
• 303 See Other
• Location: does not represent target resource
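The status codes above can be sketched as plain Ruby functions that return a status and headers. No web framework is involved, and the paths /jobs/42 and /reports/7, the jobs hash and both function names are hypothetical. Answering 200 for a pending job is also an assumption of this sketch.

```ruby
# Sketch: 202 with a temporary resource, then 303 once the job is done.

def accept_request(jobs, job_id)
  jobs[job_id] = { done: false, result_path: nil }
  # The work is only queued, so answer 202 and point at a temporary resource.
  [202, { "Location" => "/jobs/#{job_id}" }]
end

def job_status(jobs, job_id)
  job = jobs.fetch(job_id)
  if job[:done]
    # Finished: redirect to the resource the job created.
    [303, { "Location" => job[:result_path] }]
  else
    [200, {}]   # still pending; the temporary resource reports progress
  end
end

jobs     = {}
accepted = accept_request(jobs, "42")
pending  = job_status(jobs, "42")

# A worker finishes the job and records where the result lives.
jobs["42"].update(done: true, result_path: "/reports/7")
finished = job_status(jobs, "42")
```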

Callers can enforce (a)sync behaviour

• Expect header
• 202-accepted
• 200-ok/201-created/204-no-content
• 417 Expectation failed
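How a server might act on these expectations can be sketched as a small decision function. The expectation values are the ones listed above. The negotiate function, its supports_sync and supports_async flags, and the default for a missing header are assumptions made for illustration.

```ruby
# Sketch: choose sync or async handling from the caller's Expect header,
# or answer 417 when the expectation cannot be met.

def negotiate(expect, supports_sync:, supports_async:)
  sync_values = %w[200-ok 201-created 204-no-content]

  case expect
  when nil
    supports_async ? :async : :sync   # no expectation: server picks (assumed)
  when "202-accepted"
    supports_async ? :async : 417     # caller insists on async
  when *sync_values
    supports_sync ? :sync : 417       # caller insists on sync
  else
    417                               # unknown expectation
  end
end

async_ok  = negotiate("202-accepted", supports_sync: true,  supports_async: true)
sync_fail = negotiate("201-created",  supports_sync: false, supports_async: true)
default   = negotiate(nil,            supports_sync: true,  supports_async: true)
```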

Background jobs at scale

DelayedJob is easy to get started with

• No additional infrastructure
• ActiveRecord

ActiveJob makes swapping backends easy

DelayedJob has downsides at scale

• Overhead of relational database
• Workers monitored from outside
• Frequently needs workers to restart
• Hard to keep track

Resque scales

• Redis
• Parent-child forking for workers
• Rarely needs workers to restart
• Easy to keep track, since workers manage their own state
• Memory hungry

Sidekiq scales

• Resque compatible
• Worker uses threads instead of child processes
• Fast
• Less memory hungry
• Requires thread safe code
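What "thread safe code" means in practice can be sketched with plain Ruby threads: jobs running on threads share process memory, so any shared mutable state needs protection. The counter and lock below are illustrative names, and nothing here uses the Sidekiq API.

```ruby
# Sketch: four threads increment a shared counter under a Mutex.

counter = 0
lock    = Mutex.new

threads = 4.times.map do
  Thread.new do
    1_000.times do
      lock.synchronize { counter += 1 }   # read-modify-write under the lock
    end
  end
end
threads.each(&:join)
```

Where possible, avoiding shared mutable state in jobs altogether is simpler than locking around it.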

Sharding

Database migrations

Backfills & Updates

Large collections

• Split job into
  • Collection
  • Task to be done
• Checkpoint after iteration & requeue
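The split into a collection and a task, with a checkpoint after each iteration, can be sketched as follows. This is a simplified illustration under stated assumptions: the cursor is an array index, an Array stands in for the task queue, interruption is simulated with interrupt_after, and run_job is a hypothetical helper rather than any library's API.

```ruby
# Sketch: process a collection item by item, checkpoint the cursor after
# each iteration, and requeue with that cursor when interrupted.

def run_job(collection, cursor, queue, interrupt_after:, &task)
  handled = 0
  collection.each_with_index do |item, index|
    next if index < cursor            # already done in an earlier run

    task.call(item)                   # the task to be done
    cursor = index + 1                # checkpoint after the iteration
    handled += 1

    if handled >= interrupt_after && cursor < collection.size
      queue << cursor                 # requeue with the saved cursor
      return :interrupted
    end
  end
  :finished
end

queue  = [0]                          # first run starts at the beginning
seen   = []
states = []

until queue.empty?
  cursor = queue.shift
  states << run_job(%w[a b c d e], cursor, queue, interrupt_after: 2) { |item| seen << item }
end
```

Each item is handled exactly once in this sketch because the checkpoint is written before the job yields. With a real queue, an interruption between the task and the checkpoint could repeat one item.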

Interruptible job with automatic resuming

• Allows for frequent deployments
• Disaster prevention
• Data integrity

Controlling iterations

• Progress tracking
• Parallelization

Simplicity

Background jobs

• Benefit apps of all sizes
• Require trade-offs
• Keep code simple at scale

Thanks!  Questions?  @titanoboa42    https://www.shopify.com/careers
