Background jobs at scale

Talk at RubyUnconfEU

Kerstin Puschke

May 05, 2018

Transcript

  1. Kerstin Puschke @titanoboa42 Background jobs at scale

  3. Scaling applications using background jobs, keeping code simple

  4. Outline

  5. Outline: • Introduction to background jobs

  6. Outline: • Introduction to background jobs • Scaling applications

  7. Outline: • Introduction to background jobs • Scaling applications • Mastering challenges

  8. Outline

  9. Outline: • Being RESTful

  10. Outline: • Being RESTful • Background jobs at scale

  11. Outline: • Being RESTful • Background jobs at scale • Summary

  12. Introduction to background jobs

  13. Decoupling user-facing request from time-consuming task [diagram: App Server, Worker]

  14. Asynchronous communication [diagram: App Server, Message Queue, Worker]

  15. Asynchronous communication [diagram: App Server, Message Queue, Worker, Task Queue]

  16. Asynchronous communication [diagram: App Server, Message Queue, Task Queue, three Workers]

  17. Background job backend: task queue & broker [diagram: App Server, Task Queue, Broker, three Workers]
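The setup on slides 13 to 17 can be sketched in a single process. This is only an illustration under stated assumptions: Ruby's built-in `Queue` stands in for the task queue and broker, three threads stand in for the workers, and the task names are made up. A real backend keeps the queue outside the app process.

```ruby
# Minimal in-process sketch: the "app server" enqueues tasks and moves on;
# worker threads pull tasks from the queue and run them.
task_queue = Queue.new
results    = Queue.new   # thread-safe collection of finished work

# Three workers, each blocking on the queue until a task arrives.
workers = Array.new(3) do
  Thread.new do
    # Queue#pop returns nil once the queue is closed and empty.
    while (task = task_queue.pop)
      results << task[:payload].upcase   # stand-in for the slow work
    end
  end
end

# The user-facing side only enqueues; it does not wait for the work.
%w[welcome-mail invoice-pdf thumbnail].each do |name|
  task_queue << { payload: name }
end

task_queue.close          # no more tasks; lets the workers finish
workers.each(&:join)

processed = []
processed << results.pop until results.empty?
puts processed.sort.inspect
```

Which worker handles which task is not deterministic, which is why the sketch sorts the results before printing them.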
  18. Scaling applications

  19. Spikeability [diagram: App Server, Task Queue, Worker]

  20. Spikeability [diagram: App Server, Task Queue, three Workers]

  21. Parallelization [diagram: App Server, Task Queue, three Workers]

  22. Retries & Redundancy [diagram: App Server, Task Queue, three Workers]
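A rough sketch of the retry idea: a failed task goes back on the queue until it succeeds or runs out of attempts. `MAX_ATTEMPTS`, the task hash layout, and the `dead_letters` list are illustrative assumptions and not the API of any job library.

```ruby
MAX_ATTEMPTS = 3   # hypothetical limit

# Drains the queue, requeueing failed tasks until they hit MAX_ATTEMPTS.
def run_with_retries(queue, dead_letters)
  done = []
  until queue.empty?
    task = queue.shift
    begin
      done << task[:work].call(task[:attempts])
    rescue StandardError
      task[:attempts] += 1
      if task[:attempts] < MAX_ATTEMPTS
        queue.push(task)               # requeue; any worker may pick it up
      else
        dead_letters << task[:name]    # give up and keep for inspection
      end
    end
  end
  done
end

flaky  = ->(attempts) { attempts < 2 ? raise("timeout") : "flaky ok" }
broken = ->(_attempts) { raise "always fails" }

queue = [
  { name: "flaky",  attempts: 0, work: flaky },
  { name: "broken", attempts: 0, work: broken }
]
dead_letters = []
done = run_with_retries(queue, dead_letters)
```

Here the flaky task succeeds on its third attempt, while the broken one ends up in `dead_letters` after three failures.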

  23. Prioritization & Specialization [diagram: App Server, High Prio Queue, Low Prio Queue]

  24. Prioritization & Specialization [diagram: App Server, High Prio Queue, Low Prio Queue, two Workers]

  25. Prioritization & Specialization [diagram: App Server, High Prio Queue, Low Prio Queue, two Workers]

  26. Prioritization & Specialization [diagram: App Server, High Prio Queue, Low Prio Queue, Special Queue, four Workers]
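One way to picture prioritization and specialization: general workers check queues in priority order, while a specialised worker listens only to its own queue. The queue names and job labels below are hypothetical.

```ruby
queues = {
  high:    ["password reset mail"],
  low:     ["nightly report", "cleanup"],
  special: ["video transcoding"]
}

# Returns [queue_name, job] from the first non-empty queue in `order`, or nil.
def next_job(queues, order)
  order.each do |name|
    job = queues[name].shift
    return [name, job] if job
  end
  nil
end

general_order = [:high, :low]   # general workers prefer high-prio work
special_order = [:special]      # e.g. a worker on dedicated hardware

picked = []
while (job = next_job(queues, general_order))
  picked << job
end
special = next_job(queues, special_order)
```

The general worker drains the high-priority queue before touching the low-priority one and never sees the special queue.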
  27. Mastering challenges

  28. Data inconsistency

  29. Out-of-order delivery

  30. No exactly-once delivery
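A common way to cope with duplicate deliveries is to make jobs idempotent. The slides do not say which approach the talk recommends, so treat this as one possible sketch: `PaymentProcessor` and the `idempotency_key` field are hypothetical, and in production the set of seen keys would live in shared storage rather than in process memory.

```ruby
require "set"

# Guards a side effect with an idempotency key so that a job delivered
# twice is applied only once.
class PaymentProcessor
  attr_reader :charges

  def initialize
    @seen_keys = Set.new
    @charges   = []
  end

  def perform(job)
    # Set#add? returns nil when the key is already present.
    return :skipped unless @seen_keys.add?(job[:idempotency_key])

    @charges << job[:amount]
    :processed
  end
end

processor = PaymentProcessor.new
job = { idempotency_key: "order-42", amount: 1999 }
first  = processor.perform(job)
second = processor.perform(job)   # duplicate delivery of the same job
```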

  31. Processing time

  32. Being RESTful

  33. Don’t lie about resource creation

  34. Don’t lie about resource creation: • 202 Accepted

  35. Don’t lie about resource creation: • 202 Accepted • Location: temporary resource

  36. Don’t lie about resource creation: • 202 Accepted • Location: temporary resource • 303 See Other

  37. Don’t lie about resource creation: • 202 Accepted • Location: temporary resource • 303 See Other • Location: does not represent target resource
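The status-code flow on slides 34 to 37 might look roughly like this, written as plain methods returning Rack-style `[status, headers, body]` triples. The paths `/jobs/1` and `/reports/1`, the `JOBS` store, and the method names are all assumptions made for the sketch.

```ruby
JOBS = {}   # hypothetical in-memory job store

# POST /reports: the report does not exist yet, so answer 202 and point
# the caller at a temporary resource describing the job.
def create_report
  id = JOBS.size + 1
  JOBS[id] = { state: :pending }
  [202, { "Location" => "/jobs/#{id}" }, ""]
end

# GET /jobs/:id: while the job runs, report its state; once it is done,
# answer 303 with the location of the resource that was created.
def show_job(id)
  job = JOBS.fetch(id)
  if job[:state] == :done
    [303, { "Location" => job[:result_path] }, ""]
  else
    [200, {}, "state=#{job[:state]}"]
  end
end

accepted = create_report
pending  = show_job(1)
JOBS[1]  = { state: :done, result_path: "/reports/1" }   # a worker finished
finished = show_job(1)
```

The caller never receives a 201 for something that has not been created yet; it polls the temporary resource until the 303 tells it where the result lives.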
  38. Callers can enforce (a)sync behaviour

  39. Callers can enforce (a)sync behaviour: • Expect header

  40. Callers can enforce (a)sync behaviour: • Expect header • 202-accepted

  41. Callers can enforce (a)sync behaviour: • Expect header • 202-accepted • 200-ok/201-created/204-no-content

  42. Callers can enforce (a)sync behaviour: • Expect header • 202-accepted • 200-ok/201-created/204-no-content • 417 Expectation Failed
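A sketch of how a server could honour these expectations. The expectation values come from the slide; they are an application-level convention, not values defined by the HTTP specification. The `can_run_inline` flag and the default status codes are assumptions for illustration.

```ruby
SYNC_EXPECTATIONS = %w[200-ok 201-created 204-no-content].freeze

# Picks a status code from the caller's Expect header value.
# `can_run_inline` says whether this server can do the work synchronously.
def status_for(expect, can_run_inline:)
  case expect
  when nil
    can_run_inline ? 201 : 202           # no preference: server decides
  when "202-accepted"
    202                                   # caller insists on async
  when *SYNC_EXPECTATIONS
    can_run_inline ? expect.to_i : 417    # "201-created".to_i is 201
  else
    417                                   # unknown expectation
  end
end
```

For example, `status_for("201-created", can_run_inline: false)` yields 417, telling the caller that a synchronous answer cannot be guaranteed.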
  43. Background jobs at scale

  44. DelayedJob is easy to get started with

  45. DelayedJob is easy to get started with: • No additional infrastructure

  46. DelayedJob is easy to get started with: • No additional infrastructure • ActiveRecord

  47. ActiveJob makes swapping backends easy

  48. DelayedJob has downsides at scale

  49. DelayedJob has downsides at scale: • Overhead of relational database

  50. DelayedJob has downsides at scale: • Overhead of relational database • Workers monitored from outside

  51. DelayedJob has downsides at scale: • Overhead of relational database • Workers monitored from outside • Frequently needs workers to restart

  52. DelayedJob has downsides at scale: • Overhead of relational database • Workers monitored from outside • Frequently needs workers to restart • Hard to keep track

  53. Resque scales

  54. Resque scales: • Redis

  55. Resque scales: • Redis • Parent-child forking for workers

  56. Resque scales: • Redis • Parent-child forking for workers • Rarely needs workers to restart

  57. Resque scales: • Redis • Parent-child forking for workers • Rarely needs workers to restart • Easy to keep track, since workers manage their own state

  58. Resque scales: • Redis • Parent-child forking for workers • Rarely needs workers to restart • Easy to keep track, since workers manage their own state • Memory-hungry

  59. Sidekiq scales

  60. Sidekiq scales: • Resque-compatible

  61. Sidekiq scales: • Resque-compatible • Worker uses threads instead of child processes

  62. Sidekiq scales: • Resque-compatible • Worker uses threads instead of child processes • Fast

  63. Sidekiq scales: • Resque-compatible • Worker uses threads instead of child processes • Fast • Less memory-hungry

  64. Sidekiq scales: • Resque-compatible • Worker uses threads instead of child processes • Fast • Less memory-hungry • Requires thread-safe code

  65. Sharding

  66. Database migrations

  67. Backfills & Updates

  68. Large collections

  69. Large collections: • Split job into

  70. Large collections: • Split job into: Collection

  71. Large collections: • Split job into: Collection, Task to be done

  72. Large collections: • Split job into: Collection, Task to be done • Checkpoint after iteration & requeue
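The split into collection and task, with a checkpoint after each iteration, can be sketched as follows. `IterationJob`, the cursor format, the budget of four items per run, and the in-memory queue are all hypothetical; the slides do not show the actual implementation.

```ruby
# A job split into the collection to iterate over and the task to run on
# each element. A cursor is advanced after every iteration so that an
# interrupted job can be requeued and resume where it stopped.
class IterationJob
  def initialize(collection, task)
    @collection = collection
    @task       = task
  end

  # Processes at most `budget` elements starting at `cursor`.
  # Returns the new cursor, or nil when the collection is exhausted.
  def run(cursor, budget)
    budget.times do
      return nil if cursor >= @collection.size
      @task.call(@collection[cursor])
      cursor += 1                # checkpoint after each iteration
    end
    cursor >= @collection.size ? nil : cursor
  end
end

seen  = []
job   = IterationJob.new((1..10).to_a, ->(n) { seen << n * n })
queue = [0]                      # queue of cursors; 0 means start at the beginning
runs  = 0

until queue.empty?
  cursor = job.run(queue.shift, 4)   # pretend an interruption after 4 items
  runs += 1
  queue << cursor if cursor          # requeue with the checkpoint
end
```

Ten elements with a budget of four take three runs, and no element is processed twice because each run starts from the last checkpoint.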
  73. Interruptible job with automatic resuming

  74. Interruptible job with automatic resuming: • Allows for frequent deployments

  75. Interruptible job with automatic resuming: • Allows for frequent deployments • Disaster prevention

  76. Interruptible job with automatic resuming: • Allows for frequent deployments • Disaster prevention • Data integrity

  77. Controlling iterations

  78. Controlling iterations: • Progress tracking

  79. Controlling iterations: • Progress tracking • Parallelization

  80. Simplicity

  81. Background jobs

  82. Background jobs: • Benefit apps of all sizes

  83. Background jobs: • Benefit apps of all sizes • Require trade-offs

  84. Background jobs: • Benefit apps of all sizes • Require trade-offs • Keep code simple at scale
  85. Thanks! Questions? @titanoboa42 https://www.shopify.com/careers