Background jobs at scale

Talk at RubyUnconfEU

Kerstin Puschke

May 05, 2018

Transcript

  1. Kerstin Puschke
    @titanoboa42
    Background jobs at scale

  3. Scaling applications using background jobs, keeping code simple

  7. Outline
    • Introduction to background jobs
    • Scaling applications
    • Mastering challenges

  11. Outline
    • Being RESTful
    • Background jobs at scale
    • Summary

  12. Introduction to background jobs

  13. Decoupling user-facing request from time-consuming task
    [Diagram: App Server, Worker]

  16. Asynchronous communication
    [Diagram: App Server, Message Queue, Task Queue, Worker ×3]

  17. Background job backend: task queue & broker
    [Diagram: App Server, Task Queue, Broker, Worker ×3]
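
The slides above describe an app server handing work to a task queue that several workers drain. A minimal in-process sketch of that shape follows, assuming Ruby's built-in thread-safe `Queue` as a stand-in for an external queue and broker; the method name `run_in_background` is hypothetical and not taken from the talk.

```ruby
# In-process stand-in for a task queue with a pool of workers.
# Real backends keep the queue in an external store; this is only a sketch.

# Runs every job in `jobs` (callable objects) on `worker_count` threads
# and returns the collected results (order is not guaranteed).
def run_in_background(jobs, worker_count: 3)
  queue = Queue.new   # thread-safe FIFO from Ruby's core library
  results = Queue.new
  jobs.each { |job| queue << job }
  queue.close         # workers stop once the queue is drained

  workers = Array.new(worker_count) do
    Thread.new do
      # Queue#pop returns nil when the queue is closed and empty
      while (job = queue.pop)
        results << job.call
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

jobs = (1..5).map { |n| -> { n * n } }
puts run_in_background(jobs).sort.inspect
```

Adding workers only changes `worker_count`; the enqueueing side stays the same, which is the decoupling the diagram shows.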

  18. Scaling applications

  20. Spikeability
    [Diagram: App Server, Task Queue, Worker ×3]

  21. Parallelization
    [Diagram: App Server, Task Queue, Worker ×3]

  22. Retries & Redundancy
    [Diagram: App Server, Task Queue, Worker ×3]
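
One way to picture retries is a failed job going back onto the queue until it succeeds or runs out of attempts. The sketch below assumes an in-memory array as the queue; `RetryJob`, `process_with_retries`, and `max_attempts` are hypothetical names, not part of any library mentioned in the talk.

```ruby
# A job carries its own attempt counter so the queue can decide when to give up.
RetryJob = Struct.new(:name, :action, :attempts)

# Processes jobs in FIFO order. A job that raises is pushed to the back of
# the queue until it has been tried `max_attempts` times.
def process_with_retries(jobs, max_attempts: 3)
  queue = jobs.dup
  done = []
  dead = []
  until queue.empty?
    job = queue.shift
    job.attempts += 1
    begin
      job.action.call(job.attempts)
      done << job.name
    rescue StandardError
      if job.attempts < max_attempts
        queue.push(job)   # retry later
      else
        dead << job.name  # out of attempts
      end
    end
  end
  { done: done, dead: dead }
end

flaky  = RetryJob.new("flaky",  ->(attempt) { raise "boom" if attempt < 2 }, 0)
broken = RetryJob.new("broken", ->(_attempt) { raise "boom" }, 0)
p process_with_retries([flaky, broken])
```

Here `flaky` succeeds on its second attempt, while `broken` lands in the `dead` list after three attempts.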

  26. Prioritization & Specialization
    [Diagram: App Server, High Prio Queue, Low Prio Queue, Special Queue, Worker ×4]
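
Prioritization and specialization can both be read as "which queues does a worker look at, and in what order". A small sketch under that assumption follows; the hash layout and the method name `next_job` are hypothetical, and the queue names are borrowed from the slide.

```ruby
# Picks the next job by walking the queues in priority order.
# `queues` maps a queue name to an Array of pending jobs.
def next_job(queues, order)
  order.each do |name|
    queue = queues.fetch(name)
    return [name, queue.shift] unless queue.empty?
  end
  nil # every queue in `order` is empty
end

queues = {
  high: ["send receipt"],
  low: ["rebuild report", "prune logs"],
  special: ["render video"]
}

# A general worker prefers high-priority work; a specialised worker only
# listens to the special queue.
general = []
while (entry = next_job(queues, %i[high low]))
  general << entry
end
specialised = next_job(queues, %i[special])

p general
p specialised
```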

  27. Mastering challenges

  28. Data inconsistency

  29. Out-of-order delivery

  30. No exactly-once delivery
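
Without exactly-once delivery, a job may arrive more than once, so a common response is to make the worker idempotent. The sketch below assumes each job carries a unique ID and that remembering seen IDs in memory is acceptable for illustration; a real system would need a shared store. `IdempotentWorker` is a hypothetical class.

```ruby
require "set"

# Applies each job at most once by remembering the IDs it has handled.
class IdempotentWorker
  attr_reader :balance

  def initialize
    @seen = Set.new
    @balance = 0
  end

  # Returns true if the job was applied, false if it was a duplicate.
  def perform(job_id, amount)
    return false unless @seen.add?(job_id) # add? returns nil for duplicates
    @balance += amount
    true
  end
end

worker = IdempotentWorker.new
worker.perform("job-1", 50)
worker.perform("job-1", 50) # duplicate delivery, ignored
worker.perform("job-2", 25)
puts worker.balance # prints 75
```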

  31. Processing time

  32. Being RESTful

  37. Don’t lie about resource creation
    • 202 Accepted
    • Location: temporary resource
    • 303 See Other
    • Location: does not represent target resource
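
The flow on this slide can be sketched as two handlers: one that accepts the request and points at a temporary status resource, and one that redirects once the real resource exists. This is only an illustration with framework-free `[status, headers, body]` triples; the paths `/jobs/...` and `/reports/...`, the in-memory `JOBS` registry, the method names, and the choice of `200` for a still-pending job are all assumptions.

```ruby
require "securerandom"

# Hypothetical in-memory job registry.
JOBS = {}

# POST /reports: enqueue the work and point at a temporary status resource.
def create_report
  id = SecureRandom.uuid
  JOBS[id] = { done: false, report_id: nil }
  [202, { "Location" => "/jobs/#{id}" }, "Accepted"]
end

# GET /jobs/:id: still pending, or redirect to the finished resource.
def job_status(id)
  job = JOBS[id]
  return [404, {}, "Not Found"] unless job
  return [200, {}, "pending"] unless job[:done]
  [303, { "Location" => "/reports/#{job[:report_id]}" }, "See Other"]
end

# Called by a worker once the report exists.
def finish_job(id, report_id)
  JOBS[id][:done] = true
  JOBS[id][:report_id] = report_id
end

_status, headers, _body = create_report
job_id = headers["Location"].split("/").last
p job_status(job_id)
finish_job(job_id, 42)
p job_status(job_id)
```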

  42. Callers can enforce (a)sync behaviour
    • Expect header
    • 202-accepted
    • 200-ok/201-created/204-no-content
    • 417 Expectation Failed
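
A server honouring these expectations has to decide, per request, whether it can comply or must answer 417. The sketch below uses the expectation values exactly as they appear on the slide; the method name `choose_status`, the capability flags `can_async` and `can_sync`, and the default chosen when no `Expect` header is sent are assumptions made for illustration.

```ruby
# Expectation values from the slide, mapped to the status they ask for.
SYNC_STATUSES = {
  "200-ok" => 200,
  "201-created" => 201,
  "204-no-content" => 204
}.freeze

# Returns the HTTP status the endpoint should answer with.
# `expect` is the value of the Expect header, or nil if absent.
def choose_status(expect, can_async: true, can_sync: true)
  # No expectation: the server picks (hypothetical default).
  return (can_async ? 202 : 201) if expect.nil?
  # Caller insists on asynchronous handling.
  return (can_async ? 202 : 417) if expect == "202-accepted"
  # Caller insists on synchronous handling, or sent an unknown value.
  return 417 unless can_sync && SYNC_STATUSES.key?(expect)
  SYNC_STATUSES.fetch(expect)
end

puts choose_status("202-accepted")
puts choose_status("201-created", can_sync: false)
```

The second call yields 417 because the endpoint in that example cannot do the work synchronously.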

  43. Background jobs at scale

  46. DelayedJob is easy to get started with
    • No additional infrastructure
    • ActiveRecord

  47. ActiveJob makes swapping backends easy
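
In a Rails app the swap is typically a single setting such as `config.active_job.queue_adapter = :sidekiq`. Why that works can be sketched without Rails: jobs talk to a common enqueue interface, and the backend behind it can change. The classes below are hypothetical and are not ActiveJob's API.

```ruby
# Runs the job immediately, in-process.
class InlineAdapter
  def enqueue(job, *args)
    job.call(*args)
  end
end

# Runs the job on a background thread.
class ThreadAdapter
  def initialize
    @threads = []
  end

  def enqueue(job, *args)
    @threads << Thread.new { job.call(*args) }
  end

  # Blocks until all enqueued jobs have finished.
  def wait
    @threads.each(&:join)
  end
end

# Application code only knows about perform_later, not about the backend.
class JobRunner
  attr_accessor :adapter

  def initialize(adapter)
    @adapter = adapter
  end

  def perform_later(job, *args)
    adapter.enqueue(job, *args)
  end
end

greetings = []
greet = ->(name) { greetings << "hello #{name}" }

runner = JobRunner.new(InlineAdapter.new)
runner.perform_later(greet, "inline")

background = ThreadAdapter.new
runner.adapter = background # swap the backend, job code is unchanged
runner.perform_later(greet, "thread")
background.wait

p greetings
```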

  52. DelayedJob has downsides at scale
    • Overhead of relational database
    • Workers monitored from outside
    • Frequently needs workers to restart
    • Hard to keep track

  58. Resque scales
    • Redis
    • Parent-child forking for workers
    • Rarely needs workers to restart
    • Easy to keep track, since workers manage their own state
    • Memory hungry

  64. Sidekiq scales
    • Resque compatible
    • Worker uses threads instead of child processes
    • Fast
    • Less memory hungry
    • Requires thread-safe code
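
"Thread-safe" here means that state shared between jobs running on different threads has to be protected. A minimal sketch using Ruby's `Mutex` follows; `SafeCounter` is a hypothetical class, not something from Sidekiq.

```ruby
# A counter that can be incremented from many threads at once.
class SafeCounter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize { @count += 1 }
  end

  def value
    @lock.synchronize { @count }
  end
end

counter = SafeCounter.new
threads = Array.new(8) { Thread.new { 1_000.times { counter.increment } } }
threads.each(&:join)
puts counter.value # prints 8000
```

Jobs that keep no shared mutable state at all avoid the problem entirely, which is often the simpler route.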

  65. Sharding

  66. Database migrations

  67. Backfills & Updates

  72. Large collections
    • Split job into
      • Collection
      • Task to be done
    • Checkpoint after iteration & requeue
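
The split described on this slide can be sketched as a job that holds the collection and the per-item task separately and records a cursor after every item, so that a requeued run picks up where the last one stopped. The class `IterativeJob`, its `stop:` hook, and the in-memory cursor are hypothetical; the talk does not show this code.

```ruby
# An interruptible job: collection and task are separate, and progress is
# checkpointed as a cursor into the collection.
class IterativeJob
  attr_reader :cursor, :processed

  def initialize(collection, cursor: 0, &task)
    @collection = collection
    @task = task
    @cursor = cursor
    @processed = []
  end

  # Processes items until the collection is exhausted or `stop` returns true.
  # Returns :done, or :requeue when interrupted with work left.
  def run(stop: -> { false })
    while @cursor < @collection.size
      return :requeue if stop.call
      @processed << @task.call(@collection[@cursor])
      @cursor += 1 # checkpoint after each iteration
    end
    :done
  end
end

items = %w[a b c d e]
checks = 0

# First run is interrupted (for example by a deploy) after two items.
first = IterativeJob.new(items) { |item| item.upcase }
state = first.run(stop: -> { (checks += 1) > 2 })

# The requeued job resumes from the saved cursor.
second = IterativeJob.new(items, cursor: first.cursor) { |item| item.upcase }
second.run

p state, first.processed, second.processed
```

The saved cursor also gives a simple measure of progress: it is the number of items already handled.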

  76. Interruptible job with automatic resuming
    • Allows for frequent deployments
    • Disaster prevention
    • Data integrity

  79. Controlling iterations
    • Progress tracking
    • Parallelization

  80. Simplicity

  84. Background jobs
    • Benefit apps of all sizes
    • Require trade-offs
    • Keep code simple at scale

  85. Thanks!
    Questions?
    @titanoboa42
    https://www.shopify.com/careers