Background jobs at scale (Montreal.rb)

Talks at Montreal.rb

Kerstin Puschke

February 26, 2019

Transcript

  1. Kerstin Puschke
    @titanoboa42
    Background jobs at scale

  2. Scaling applications using background jobs, keeping code simple

  6. Outline
    • Introduction to background jobs
    • Features
    • Mastering challenges

  10. Outline
    • Being RESTful
    • Background jobs at scale
    • Summary

  11. Introduction to background jobs

  12. Background job: work to be done later
    [Diagram: App Server, Worker]

  15. Asynchronous communication
    [Diagram: App Server, Message Queue, Task Queue, Worker, Worker, Worker]
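
The app server / task queue / worker picture above can be sketched inside a single Ruby process. This is only an illustration under stated assumptions: `TaskQueueDemo` and its methods are hypothetical names, and threads stand in for what would be separate worker processes talking to a real broker.

```ruby
# Hypothetical in-process stand-in for "app server -> task queue -> workers".
class TaskQueueDemo
  def initialize(worker_count)
    @queue = Queue.new    # thread-safe FIFO from Ruby's core library
    @results = Queue.new
    @workers = Array.new(worker_count) do
      Thread.new do
        # Queue#pop blocks until a task arrives; it returns nil once the
        # queue has been closed and drained, which ends the loop.
        while (task = @queue.pop)
          @results << task.call
        end
      end
    end
  end

  # Called by the "app server": returns immediately, the work happens later.
  def enqueue(&task)
    @queue << task
  end

  # Stop accepting tasks, let the workers drain the queue, collect results.
  def shutdown
    @queue.close
    @workers.each(&:join)
    collected = []
    collected << @results.pop until @results.empty?
    collected
  end
end

demo = TaskQueueDemo.new(3)
5.times { |i| demo.enqueue { i * i } }
squares = demo.shutdown.sort # [0, 1, 4, 9, 16]
```

Results arrive in whatever order the workers finish, which is why the usage line sorts them.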

  16. Background job backend: task queue & broker

  17. Encapsulating async communication

  18. Features

  19. Response times
    [Diagram: App Server, Task Queue, Worker]

  20. Spikeability
    [Diagram: App Server, Task Queue, Worker]

  21. Parallelization
    [Diagram: App Server, Task Queue, Worker, Worker, Worker]

  22. Retries
    [Diagram: App Server, Task Queue, Worker, Worker, Worker]
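
A rough sketch of the retry idea, assuming an in-process helper: `perform_with_retries` is a hypothetical name, and real backends typically put the failed job back on the queue instead of looping inside the worker.

```ruby
# Hypothetical helper: run a job, retrying on failure up to max_attempts times.
def perform_with_retries(max_attempts:, base_delay: 0.0)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    raise if attempts >= max_attempts          # give up: surface the last error
    sleep(base_delay * (2**(attempts - 1)))    # exponential backoff between tries
    retry                                      # re-run the begin block
  end
end

# A job that fails twice before succeeding on the third attempt.
outcome = perform_with_retries(max_attempts: 3) do |attempt|
  raise "temporary failure" if attempt < 3
  "done after #{attempt} attempts"
end
```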

  25. Prioritization
    [Diagram: App Server, High Prio Queue, Low Prio Queue, Worker, Worker]
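
One simple way to get prioritization is to have each worker check the high-priority queue before the low-priority one. The sketch below assumes that strict ordering; `PriorityQueues` and its method names are made up for illustration, and the slides do not say which policy the diagram depicts.

```ruby
# Hypothetical sketch: the high-priority queue is always drained first.
class PriorityQueues
  def initialize
    @queues = { high: [], low: [] }
  end

  def enqueue(job, priority: :low)
    @queues.fetch(priority) << job
  end

  # Returns the next job, preferring :high; nil when both queues are empty.
  def next_job
    [:high, :low].each do |name|
      job = @queues[name].shift
      return job if job
    end
    nil
  end
end

queues = PriorityQueues.new
queues.enqueue("nightly report")
queues.enqueue("payment", priority: :high)
first = queues.next_job # "payment", although it was queued second
```

Strict ordering can starve the low-priority queue under sustained load, so a weighted or dedicated-worker scheme may fit better in practice.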

  26. Mastering challenges

  29. No exactly once delivery
    • “At least” vs. “at most” once delivery
    • Idempotent jobs & at least once delivery
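
With at-least-once delivery a job may arrive twice, so it has to be safe to run twice. A minimal sketch of an idempotent job follows; `ChargeOrderJob` is a hypothetical example, and the in-memory `Set` stands in for durable state such as a database record.

```ruby
require "set"

# Hypothetical job: an order must be charged once even if delivered twice.
class ChargeOrderJob
  attr_reader :charges

  def initialize
    @processed = Set.new # assumption: would be durable storage in a real system
    @charges = []
  end

  def perform(order_id, amount)
    # Set#add? returns nil when the element is already present,
    # so a redelivered job becomes a no-op.
    return :skipped unless @processed.add?(order_id)

    @charges << [order_id, amount]
    :charged
  end
end

job = ChargeOrderJob.new
job.perform(42, 10) # :charged
job.perform(42, 10) # :skipped, the duplicate delivery has no effect
```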

  32. Out of order delivery
    • If order matters, queue sequentially
    • First job queues follow-up jobs
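
Queuing sequentially can be sketched as a chain in which each step enqueues its successor only after it has finished. The `Pipeline` class and the step names below are hypothetical, and the array is a stand-in for a real task queue.

```ruby
# Hypothetical in-memory queue; jobs are [step, payload] pairs.
class Pipeline
  attr_reader :log

  def initialize
    @queue = []
    @log = []
  end

  def enqueue(step, payload)
    @queue << [step, payload]
  end

  # Each step queues its follow-up job only once its own work is done,
  # so the steps cannot overtake each other.
  def perform(step, payload)
    @log << "#{step}:#{payload}"
    case step
    when :import   then enqueue(:validate, payload)
    when :validate then enqueue(:notify, payload)
    end
  end

  # Stand-in for a worker loop.
  def drain
    perform(*@queue.shift) until @queue.empty?
  end
end

pipeline = Pipeline.new
pipeline.enqueue(:import, "file.csv") # only the first job is queued up front
pipeline.drain
```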

  35. Job queued and processed by different versions
    • No breaking changes to job parameters
    • Changes need to be backwards compatible until legacy jobs have been processed
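
A backwards-compatible parameter change might look like the sketch below, assuming job arguments are serialized as a hash with string keys. `WelcomeEmailJob` and the `locale` parameter are invented for the example.

```ruby
# Hypothetical job whose newer version added an optional "locale" parameter.
class WelcomeEmailJob
  DEFAULT_LOCALE = "en"

  # Jobs queued by the old version carry only "user_id". The new parameter
  # gets a default so those legacy payloads can still be processed.
  def perform(args)
    user_id = args.fetch("user_id")
    locale  = args.fetch("locale", DEFAULT_LOCALE)
    "welcome user #{user_id} in #{locale}"
  end
end

WelcomeEmailJob.new.perform({ "user_id" => 1 })                   # legacy payload
WelcomeEmailJob.new.perform({ "user_id" => 1, "locale" => "fr" }) # new payload
```

Making the parameter mandatory would only be safe once no legacy jobs remain in the queue.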

  38. Eventual consistency (at best)
    • Prepare for inconsistency
    • Trade-off: lack of consistency guarantees vs. benefits of background jobs

  42. Non-transactional queuing
    • Don’t queue from within a db transaction
    • Job may run before the commit, or even though the transaction rolled back
    • Commit before queuing or stage transactionally
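
One reading of "stage transactionally" is to write the job into the database as part of the same transaction and move it to the real queue afterwards. The sketch below illustrates that reading only; `FakeDb` and `flush_staged_jobs` are hypothetical in-memory stand-ins, not any library's API.

```ruby
# Hypothetical in-memory stand-in for a database with a staging table.
class FakeDb
  attr_reader :rows, :staged_jobs

  def initialize
    @rows = []
    @staged_jobs = []
  end

  # All-or-nothing: on error, rows and staged jobs are both restored.
  def transaction
    snapshot = [@rows.dup, @staged_jobs.dup]
    yield self
  rescue StandardError
    @rows, @staged_jobs = snapshot
    raise
  end
end

# Moves staged jobs to the real queue; runs outside of any transaction,
# so it only ever sees jobs whose transaction committed.
def flush_staged_jobs(db, queue)
  queue.concat(db.staged_jobs.shift(db.staged_jobs.size))
end

db = FakeDb.new
queue = []
db.transaction do |tx|
  tx.rows << "order-1"
  tx.staged_jobs << [:send_receipt, "order-1"]
end
flush_staged_jobs(db, queue)
```

A rolled-back transaction discards its staged job together with its rows, so no worker ever sees a job for data that does not exist.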

  43. Being RESTful

  48. Don’t lie about resource creation
    • 202 Accepted
    • Location: temporary resource
    • 303 See other
    • Location: does not represent target resource
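
The status codes above could fit together as in the following sketch, which returns Rack-style `[status, headers, body]` triplets. The `ReportApi` class and the `/status/:id` and `/reports/:id` paths are assumptions made for the example, not something the slides specify.

```ruby
# Hypothetical API object; a real app would sit behind a web framework.
class ReportApi
  def initialize
    @jobs = {}    # job id => :pending or :done
    @next_id = 0
  end

  # POST /reports: the work is only queued, so answer 202 rather than 201
  # and point the client at a temporary status resource.
  def create
    id = (@next_id += 1)
    @jobs[id] = :pending
    [202, { "Location" => "/status/#{id}" }, "queued"]
  end

  # Called by a worker once the report actually exists.
  def complete(id)
    @jobs[id] = :done
  end

  # GET /status/:id: once the real resource exists, redirect to it with 303.
  def status(id)
    case @jobs[id]
    when :pending then [200, {}, "pending"]
    when :done    then [303, { "Location" => "/reports/#{id}" }, ""]
    else               [404, {}, "unknown job"]
    end
  end
end
```

A client would follow the `Location` header of the 202 response, poll the status resource, and follow the 303 once it appears.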

  53. Callers can enforce (a)sync behaviour
    • Expect header
    • 202-accepted
    • 200-ok/201-created/204-no-content
    • 417 Expectation failed
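
A possible server-side reading of these bullets is sketched below. The header values are taken from the slide; the HTTP specification itself only defines `100-continue` as an expectation, so treat this as a convention between caller and service. `choose_mode` and its keyword arguments are hypothetical.

```ruby
SYNC_EXPECTATIONS = %w[200-ok 201-created 204-no-content].freeze

# Decide how to handle a request given the caller's Expect header value.
# Returns :async, :sync, or :expectation_failed (to be answered with 417).
def choose_mode(expect, can_sync:, can_async:)
  case expect
  when nil
    can_async ? :async : :sync    # no preference stated: the server decides
  when "202-accepted"
    can_async ? :async : :expectation_failed
  when *SYNC_EXPECTATIONS
    can_sync ? :sync : :expectation_failed
  else
    :expectation_failed           # unknown expectation
  end
end

choose_mode("202-accepted", can_sync: true, can_async: true) # :async
choose_mode("201-created", can_sync: false, can_async: true) # :expectation_failed
```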

  54. Background jobs at scale

  57. DelayedJob is easy to get started with
    • No additional infrastructure
    • ActiveRecord

  58. ActiveJob makes swapping backends easy

  62. DelayedJob issues
    • Overhead of relational database
    • Workers monitored from outside
    • Frequently needs workers to restart

  67. Resque scales
    • Redis - no relational db
    • Parent-child forking for workers
    • Rarely needs workers to restart
    • Workers manage their own state

  70. Resque issues
    • Child processes
    • Memory hungry and slow

  74. Sidekiq scales
    • Redis - no relational db
    • Threads instead of child processes
    • Fast and less memory hungry

  76. Sidekiq issues
    • Requires thread safe code
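
To illustrate what thread-safe code means for jobs that share state, here is a minimal sketch that guards a shared counter with a `Mutex`. `SafeCounter` is a hypothetical example; in practice, avoiding shared mutable state in jobs altogether is usually the simpler route.

```ruby
# Hypothetical shared counter touched by many job threads at once.
class SafeCounter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  def increment
    # The read-modify-write must not interleave between threads.
    @lock.synchronize { @count += 1 }
  end

  def value
    @lock.synchronize { @count }
  end
end

counter = SafeCounter.new
threads = Array.new(8) { Thread.new { 1_000.times { counter.increment } } }
threads.each(&:join)
counter.value # 8000
```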

  80. Long running jobs - Resque
    • Prevent worker shutdown
    • No deployments
    • Not cloud-friendly

  82. Long running jobs - Sidekiq
    • Aborted and requeued on shutdown
    • Job may not finish before being aborted again

  83. github.com/Shopify/job-iteration

  86. Large collections
    • Split job into collection and task to be done
    • Checkpoint after iteration & requeue
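
The split into "collection" and "task to be done", with a checkpoint between iterations, can be sketched in plain Ruby. This is not the job-iteration gem's API: `IterativeJob`, `process`, and the `interrupted` callable are hypothetical, and a real backend would requeue the job with the returned cursor.

```ruby
# Hypothetical interruptible job over a collection.
class IterativeJob
  attr_reader :processed

  def initialize(collection)
    @collection = collection   # the collection to iterate over
    @processed = []
  end

  # Processes items starting at `cursor`. When `interrupted` returns true,
  # the job stops and returns the cursor to requeue with; nil means done.
  def perform(cursor: 0, interrupted: -> { false })
    @collection.each_with_index.drop(cursor).each do |item, index|
      return index if interrupted.call   # checkpoint: resume from this item
      process(item)
    end
    nil
  end

  # The task to be done for a single item.
  def process(item)
    @processed << item
  end
end

job = IterativeJob.new(%w[a b c d e])
budget = 2
cursor = job.perform(interrupted: -> { (budget -= 1) < 0 }) # stops after two items
job.perform(cursor: cursor)                                 # resumes and finishes
```

Because the checkpoint is taken between items, an interrupted run never leaves an item half processed.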

  90. Interruptible job with automatic resuming
    • Shut down workers anytime
    • Disaster prevention
    • Data integrity

  91. Abstracting scaling issues simplifies concrete background jobs

  92. github.com/Shopify/job-iteration

  96. Background jobs
    • Benefit apps of all sizes
    • Require trade-offs
    • Keep code simple at scale

  97. Thanks!
    Questions?
    @titanoboa42
    https://www.shopify.com/careers