Slide 1

Slide 1 text

Background jobs at scale
Kerstin Puschke
@titanoboa42

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Scaling applications using background jobs, keeping code simple

Slide 4

Slide 4 text

Outline

Slide 5

Slide 5 text

Outline
• Introduction to background jobs

Slide 6

Slide 6 text

Outline
• Introduction to background jobs
• Scaling applications

Slide 7

Slide 7 text

Outline
• Introduction to background jobs
• Scaling applications
• Mastering challenges

Slide 8

Slide 8 text

Outline

Slide 9

Slide 9 text

Outline
• Being RESTful

Slide 10

Slide 10 text

Outline
• Being RESTful
• Background jobs at scale

Slide 11

Slide 11 text

Outline
• Being RESTful
• Background jobs at scale
• Summary

Slide 12

Slide 12 text

Introduction to background jobs

Slide 13

Slide 13 text

Decoupling user-facing request from time-consuming task
Diagram: App Server, Worker

Slide 14

Slide 14 text

Asynchronous communication
Diagram: App Server, Message Queue, Worker

Slide 15

Slide 15 text

Asynchronous communication
Diagram: App Server, Message Queue, Worker, Task Queue

Slide 16

Slide 16 text

Asynchronous communication
Diagram: App Server, Message Queue, Worker, Worker, Worker, Task Queue

Slide 17

Slide 17 text

Background job backend: task queue & broker
Diagram: App Server, Task Queue, Broker, Worker, Worker, Worker
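
The setup on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: an in-memory `queue.Queue` stands in for the task queue and broker, threads stand in for worker processes, and `run_workers` and its parameters are hypothetical names, not part of any library named in the talk. The handler is assumed not to raise.

```python
import queue
import threading

def run_workers(tasks, handler, num_workers=3):
    """Enqueue tasks and let several workers drain the queue concurrently."""
    task_queue = queue.Queue()          # stands in for task queue + broker
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()
            if task is None:            # sentinel: no more work for this worker
                task_queue.task_done()
                return
            outcome = handler(task)     # assumed not to raise in this sketch
            with results_lock:
                results.append(outcome)
            task_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:                  # the "app server" enqueues and moves on
        task_queue.put(task)
    for _ in threads:                   # one sentinel per worker
        task_queue.put(None)
    task_queue.join()
    for t in threads:
        t.join()
    return results                      # completion order is not guaranteed
```

Because workers finish in an arbitrary order, callers of this sketch should not rely on the order of the returned results.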

Slide 18

Slide 18 text

Scaling applications

Slide 19

Slide 19 text

Spikeability
Diagram: App Server, Task Queue, Worker

Slide 20

Slide 20 text

Spikeability
Diagram: App Server, Task Queue, Worker, Worker, Worker

Slide 21

Slide 21 text

Parallelization
Diagram: App Server, Task Queue, Worker, Worker, Worker

Slide 22

Slide 22 text

Retries & Redundancy
Diagram: App Server, Task Queue, Worker, Worker, Worker
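
A retry policy can be sketched as "put the failed task back on the queue until an attempt limit is reached". The Python below is a minimal single-process illustration; `process_with_retries` and `max_attempts` are hypothetical names, and real job backends typically add delays between attempts, which this sketch omits.

```python
from collections import deque

def process_with_retries(tasks, handler, max_attempts=3):
    """Run each task, requeueing failures until max_attempts is reached."""
    pending = deque((task, 1) for task in tasks)   # (task, attempt number)
    done, failed = [], []
    while pending:
        task, attempt = pending.popleft()
        try:
            done.append(handler(task))
        except Exception:
            if attempt < max_attempts:
                # Requeue at the back so other tasks are not blocked.
                pending.append((task, attempt + 1))
            else:
                failed.append(task)                # give up after the last attempt
    return done, failed
```

Requeueing at the back of the queue means a retried task can complete after tasks that were enqueued later.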

Slide 23

Slide 23 text

Prioritization & Specialization
Diagram: App Server, High Prio Queue, Low Prio Queue

Slide 24

Slide 24 text

Prioritization & Specialization
Diagram: App Server, High Prio Queue, Low Prio Queue, Worker, Worker

Slide 25

Slide 25 text

Prioritization & Specialization
Diagram: App Server, High Prio Queue, Low Prio Queue, Worker, Worker

Slide 26

Slide 26 text

Prioritization & Specialization
Diagram: App Server, High Prio Queue, Low Prio Queue, Special Queue, Worker, Worker, Worker, Worker
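
One way to picture prioritization and specialization: each worker is configured with an ordered list of queue names and always takes work from the first non-empty one. The sketch below assumes in-memory queues; `next_task`, the queue names, and the example tasks are hypothetical.

```python
from collections import deque

def next_task(queues, queue_names):
    """Return (queue_name, task) from the first non-empty queue, else None."""
    for name in queue_names:
        if queues[name]:
            return name, queues[name].popleft()
    return None

# Hypothetical example data: three queues with one task each.
queues = {
    "high": deque(["charge card"]),
    "low": deque(["send newsletter"]),
    "special": deque(["resize video"]),
}
general_worker = ["high", "low"]   # prioritization: high before low
special_worker = ["special"]       # specialization: dedicated queue only
```

A general worker would call `next_task(queues, general_worker)`, while a specialized worker only ever sees the special queue.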

Slide 27

Slide 27 text

Mastering challenges

Slide 28

Slide 28 text

Data inconsistency

Slide 29

Slide 29 text

Out-of-order delivery
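
One common mitigation for out-of-order delivery, offered here as an assumption rather than as the talk's recommendation, is to attach a version to each update and ignore anything not newer than what is already stored. `apply_update` and the dictionary-based store are hypothetical.

```python
def apply_update(store, key, version, value):
    """Apply an update only if its version is newer than the stored one."""
    current = store.get(key)
    if current is not None and current[0] >= version:
        return False                 # stale or duplicate update: ignore it
    store[key] = (version, value)
    return True
```

With this check, a late-arriving older update cannot overwrite newer data.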

Slide 30

Slide 30 text

No exactly-once delivery
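
Without exactly-once delivery, a job may run more than once, so jobs are often written to be idempotent. The sketch below shows one way to do that, assuming each job carries a unique ID; `handle_once` and `processed_ids` are hypothetical, and a real system would need the check and the record to be atomic and durable.

```python
def handle_once(processed_ids, job_id, action):
    """Run action only if job_id has not been handled before."""
    if job_id in processed_ids:
        return False             # duplicate delivery: skip the side effect
    action()
    processed_ids.add(job_id)    # recorded only after the action succeeded
    return True
```

If `action` raises, the ID is not recorded, so a later redelivery can retry the work.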

Slide 31

Slide 31 text

Processing time

Slide 32

Slide 32 text

Being RESTful

Slide 33

Slide 33 text

Don’t lie about resource creation

Slide 34

Slide 34 text

Don’t lie about resource creation
• 202 Accepted

Slide 35

Slide 35 text

Don’t lie about resource creation
• 202 Accepted
• Location: temporary resource

Slide 36

Slide 36 text

Don’t lie about resource creation
• 202 Accepted
• Location: temporary resource
• 303 See Other

Slide 37

Slide 37 text

Don’t lie about resource creation
• 202 Accepted
• Location: temporary resource
• 303 See Other
• Location: does not represent target resource
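
The flow in these bullets can be read as follows: the initial request is answered with 202 Accepted and a Location pointing at a temporary status resource, and once the job has finished, that temporary resource answers with 303 See Other pointing at the created resource. The sketch below models that reading with plain functions returning `(status, headers)`; the function names, the `/jobs/...` and `/orders/...` paths, and the 200 response while the job is pending are assumptions.

```python
def accept_request(jobs, job_id):
    """Register the job and answer 202 with a temporary status resource."""
    jobs[job_id] = {"target": None}        # None until a worker finishes
    return 202, {"Location": f"/jobs/{job_id}"}

def job_status(jobs, job_id):
    """Answer a GET on the temporary status resource."""
    target = jobs[job_id]["target"]
    if target is None:
        return 200, {}                     # still in progress (assumed response)
    # Finished: redirect to the created resource, which is a different
    # resource from the temporary one that was requested.
    return 303, {"Location": target}
```

A worker would set `jobs[job_id]["target"]` to the new resource's path when it completes the job.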

Slide 38

Slide 38 text

Callers can enforce (a)sync behaviour

Slide 39

Slide 39 text

Callers can enforce (a)sync behaviour
• Expect header

Slide 40

Slide 40 text

Callers can enforce (a)sync behaviour
• Expect header
• 202-accepted

Slide 41

Slide 41 text

Callers can enforce (a)sync behaviour
• Expect header
• 202-accepted
• 200-ok/201-created/204-no-content

Slide 42

Slide 42 text

Callers can enforce (a)sync behaviour
• Expect header
• 202-accepted
• 200-ok/201-created/204-no-content
• 417 Expectation Failed
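
A possible reading of these bullets: the caller states the outcome it expects in the Expect header, and the server either honours it or answers 417 Expectation Failed. The sketch below encodes that reading as a pure function; `choose_mode`, the `can_sync`/`can_async` flags, and the default of asynchronous processing are assumptions, while the expectation values are taken from the slide.

```python
SYNC_EXPECTATIONS = {"200-ok", "201-created", "204-no-content"}

def choose_mode(expect, can_sync=True, can_async=True, default="async"):
    """Map an Expect header value to 'sync', 'async', or a 417 status line."""
    if expect is None:
        return default                      # caller has no preference
    if expect == "202-accepted":
        return "async" if can_async else "417 Expectation Failed"
    if expect in SYNC_EXPECTATIONS:
        return "sync" if can_sync else "417 Expectation Failed"
    return "417 Expectation Failed"         # unknown expectation
```

For example, an endpoint whose work is too slow to finish within a request would pass `can_sync=False` and reject callers that insist on a synchronous response.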

Slide 43

Slide 43 text

Background jobs at scale

Slide 44

Slide 44 text

DelayedJob is easy to get started

Slide 45

Slide 45 text

DelayedJob is easy to get started
• No additional infrastructure

Slide 46

Slide 46 text

DelayedJob is easy to get started
• No additional infrastructure
• ActiveRecord

Slide 47

Slide 47 text

ActiveJob makes swapping backends easy

Slide 48

Slide 48 text

DelayedJob has downsides at scale

Slide 49

Slide 49 text

DelayedJob has downsides at scale
• Overhead of relational database

Slide 50

Slide 50 text

DelayedJob has downsides at scale
• Overhead of relational database
• Workers monitored from outside

Slide 51

Slide 51 text

DelayedJob has downsides at scale
• Overhead of relational database
• Workers monitored from outside
• Frequently needs workers to restart

Slide 52

Slide 52 text

DelayedJob has downsides at scale
• Overhead of relational database
• Workers monitored from outside
• Frequently needs workers to restart
• Hard to keep track

Slide 53

Slide 53 text

Resque scales

Slide 54

Slide 54 text

Resque scales
• Redis

Slide 55

Slide 55 text

Resque scales
• Redis
• Parent-child forking for workers

Slide 56

Slide 56 text

Resque scales
• Redis
• Parent-child forking for workers
• Rarely needs workers to restart

Slide 57

Slide 57 text

Resque scales
• Redis
• Parent-child forking for workers
• Rarely needs workers to restart
• Easy to keep track, since workers manage their own state

Slide 58

Slide 58 text

Resque scales
• Redis
• Parent-child forking for workers
• Rarely needs workers to restart
• Easy to keep track, since workers manage their own state
• Memory hungry

Slide 59

Slide 59 text

Sidekiq scales

Slide 60

Slide 60 text

Sidekiq scales
• Resque compatible

Slide 61

Slide 61 text

Sidekiq scales
• Resque compatible
• Worker uses threads instead of child processes

Slide 62

Slide 62 text

Sidekiq scales
• Resque compatible
• Worker uses threads instead of child processes
• Fast

Slide 63

Slide 63 text

Sidekiq scales
• Resque compatible
• Worker uses threads instead of child processes
• Fast
• Less memory hungry

Slide 64

Slide 64 text

Sidekiq scales
• Resque compatible
• Worker uses threads instead of child processes
• Fast
• Less memory hungry
• Requires thread-safe code

Slide 65

Slide 65 text

Sharding

Slide 66

Slide 66 text

Database migrations

Slide 67

Slide 67 text

Backfills & Updates

Slide 68

Slide 68 text

Large collections

Slide 69

Slide 69 text

Large collections
• Split job into

Slide 70

Slide 70 text

Large collections
• Split job into
  • Collection

Slide 71

Slide 71 text

Large collections
• Split job into
  • Collection
  • Task to be done

Slide 72

Slide 72 text

Large collections
• Split job into
  • Collection
  • Task to be done
• Checkpoint after iteration & requeue
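
The split described here can be sketched as a job that receives the collection and the per-item task separately, processes one batch, and then requeues itself with a cursor marking where to resume. The Python below is a minimal illustration, not the implementation used in the talk; `run_iteration_job`, `cursor`, `batch_size`, and `enqueue` are hypothetical names, and the collection is assumed to be an indexable list.

```python
def run_iteration_job(collection, task, cursor, batch_size, enqueue):
    """Process one batch from cursor, then checkpoint by requeueing."""
    batch = collection[cursor:cursor + batch_size]
    for item in batch:
        task(item)                       # the "task to be done" for one item
    next_cursor = cursor + len(batch)
    if next_cursor < len(collection):
        # Checkpoint: the requeued job resumes from next_cursor, so an
        # interruption loses at most the batch currently in progress.
        enqueue(next_cursor)
    return next_cursor
```

A simple driver can simulate the queue with a list: start with `pending = [0]`, pop a cursor, call `run_iteration_job(items, task, cursor, 3, pending.append)`, and repeat until `pending` is empty.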

Slide 73

Slide 73 text

Interruptible job with automatic resuming

Slide 74

Slide 74 text

Interruptible job with automatic resuming
• Allows for frequent deployments

Slide 75

Slide 75 text

Interruptible job with automatic resuming
• Allows for frequent deployments
• Disaster prevention

Slide 76

Slide 76 text

Interruptible job with automatic resuming
• Allows for frequent deployments
• Disaster prevention
• Data integrity

Slide 77

Slide 77 text

Controlling iterations

Slide 78

Slide 78 text

Controlling iterations
• Progress tracking

Slide 79

Slide 79 text

Controlling iterations
• Progress tracking
• Parallelization

Slide 80

Slide 80 text

Simplicity

Slide 81

Slide 81 text

Background jobs

Slide 82

Slide 82 text

Background jobs
• Benefit apps of all sizes

Slide 83

Slide 83 text

Background jobs
• Benefit apps of all sizes
• Require trade-offs

Slide 84

Slide 84 text

Background jobs
• Benefit apps of all sizes
• Require trade-offs
• Keep code simple at scale

Slide 85

Slide 85 text

Thanks!
Questions?
@titanoboa42
https://www.shopify.com/careers