High availability by offloading work - background jobs, message queues, or Kafka

Talk at RubyConf Colombia 2019

Kerstin Puschke

September 20, 2019

Transcript

  1. Kerstin Puschke
    @titanoboa42
    High availability by offloading work -
    background jobs, message queues, or Kafka

    View Slide

  2. @titanoboa42

    View Slide

  3. @titanoboa42
    Different approaches to offload work to
    ensure high availability

    View Slide

  4. @titanoboa42
    • High availability & offloading work
    Outline

    View Slide

  5. @titanoboa42
    • High availability & offloading work
    • Background jobs
    Outline

    View Slide

  6. @titanoboa42
    • High availability & offloading work
    • Background jobs
    • Message oriented middleware
    Outline

    View Slide

  7. @titanoboa42
    • Event log
    Outline

    View Slide

  8. @titanoboa42
    • Event log
    • Summary
    Outline

    View Slide

  9. @titanoboa42
    High availability & offloading work

    View Slide

  10. @titanoboa42
    High Availability

    View Slide

  11. @titanoboa42

    users

    can interact

    with the system
    High Availability

    View Slide

  12. @titanoboa42

    users

    can interact meaningfully

    with the system
    High Availability

    View Slide

  13. @titanoboa42
    community of

    users

    can interact meaningfully

    with the system
    High Availability

    View Slide

  14. @titanoboa42
    community of

    users

    can interact meaningfully

    with the system whenever needed
    High Availability

    View Slide

  15. @titanoboa42
    community of

    users

    can interact meaningfully

    with the system whenever needed
    High Availability

    View Slide

  16. @titanoboa42
    Background jobs

    View Slide

  17. @titanoboa42
    • Resque
    Background job backends

    View Slide

  18. @titanoboa42
    • Resque
    • Sidekiq
    Background job backends

    View Slide

  19. @titanoboa42
    • Resque
    • Sidekiq
    • …
    Background job backends

    View Slide

  20. @titanoboa42
    Background job:

    Unit of work 

    to be done later
    App
    Server
    Worker

    View Slide

  21. @titanoboa42
    Asynchronous
    communication
    App
    Server
    Message
    Queue
    Worker

    View Slide

  22. @titanoboa42
    Asynchronous
    communication
    App
    Server
    Message
    Queue
    Worker
    Task
    Queue

    View Slide

  23. @titanoboa42
    Asynchronous
    communication
    App
    Server
    Message
    Queue
    Worker Worker
    Worker
    Task
    Queue

    View Slide

  24. @titanoboa42
    Background job backend:

    task queue & broker

    View Slide

  25. @titanoboa42
    Encapsulating

    async communication

    View Slide
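
The task queue and worker setup from the slides above can be sketched in plain Ruby. This is a minimal in-process illustration, not how Resque or Sidekiq are implemented: Ruby's `Queue` stands in for the broker-backed task queue, and `WelcomeEmailJob` is a hypothetical job invented for the example.

```ruby
# Minimal in-process sketch of a task queue drained by several workers.
# Queue stands in for the broker-backed queue a real backend would use.

# Hypothetical job: a plain Ruby object describing a unit of work.
WelcomeEmailJob = Struct.new(:user_id) do
  def perform
    "sent welcome email to user #{user_id}"
  end
end

task_queue = Queue.new
results = Queue.new

# The "app server" only enqueues and can respond right away.
[1, 2, 3, 4, 5].each { |id| task_queue << WelcomeEmailJob.new(id) }
task_queue.close # no more jobs; pop returns nil once the queue is drained

# Three workers process the queued jobs in parallel.
workers = Array.new(3) do
  Thread.new do
    while (job = task_queue.pop)
      results << job.perform
    end
  end
end
workers.each(&:join)

processed = []
processed << results.pop until results.empty?
```

The caller never waits for `perform`, which is what keeps response times short; adding workers increases throughput without touching the enqueuing code.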

  26. @titanoboa42
    Features

    View Slide

  27. @titanoboa42
    Task
    Queue
    Response times
    App
    Server
    Worker

    View Slide

  28. @titanoboa42
    Task
    Queue
    Spikeability
    App
    Server
    Worker

    View Slide

  29. @titanoboa42
    Task
    Queue
    Parallelization
    App
    Server
    Worker Worker
    Worker

    View Slide

  30. @titanoboa42
    Task
    Queue
    Retries
    App
    Server
    Worker Worker
    Worker

    View Slide

  31. @titanoboa42
    Mastering challenges

    View Slide

  32. @titanoboa42
    Job queued and processed by different versions

    View Slide

  33. @titanoboa42
    • No breaking changes to job parameters
    Job queued and processed by different versions

    View Slide

  34. @titanoboa42
    • No breaking changes to job parameters
    • Changes need to be backwards compatible
    until legacy jobs have been processed
    Job queued and processed by different versions

    View Slide
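
One way to keep a parameter change backwards compatible is to make the new parameter optional, so jobs queued by the old version still run on the new one. The sketch below assumes payloads are hashes with string keys; `SendReceiptJob` and its `locale` parameter are hypothetical.

```ruby
# Hypothetical job whose parameters changed between two deployments.
# Version 1 enqueued: { "user_id" => 42 }
# Version 2 enqueues: { "user_id" => 42, "locale" => "es" }
class SendReceiptJob
  DEFAULT_LOCALE = "en"

  def perform(payload)
    user_id = payload.fetch("user_id")
    # The new parameter is optional, so legacy jobs keep working.
    locale = payload.fetch("locale", DEFAULT_LOCALE)
    "receipt for user #{user_id} in #{locale}"
  end
end

job = SendReceiptJob.new
legacy = job.perform({ "user_id" => 42 })
current = job.perform({ "user_id" => 42, "locale" => "es" })
```

Once all legacy jobs have been processed, the default can be removed in a later deployment.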

  35. @titanoboa42
    No exactly once delivery

    View Slide

  36. @titanoboa42
    • “At least” vs. “at most” once delivery
    No exactly once delivery

    View Slide

  37. @titanoboa42
    • “At least” vs. “at most” once delivery
    • Idempotent jobs & at least once delivery
    No exactly once delivery

    View Slide
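
With at-least-once delivery the same job may run twice, so the job itself has to tolerate that. A minimal sketch of an idempotent job follows; `ChargeOrderJob` is hypothetical, and the in-memory `Set` stands in for durable storage. In a real system the check and the side effect would need to be atomic, for example via a unique database constraint.

```ruby
require "set"

# Hypothetical payment job made idempotent by remembering processed keys.
class ChargeOrderJob
  attr_reader :charges

  def initialize
    @processed_order_ids = Set.new
    @charges = []
  end

  def perform(order_id, amount_cents)
    # Set#add? returns nil if the key is already present: a redelivery.
    return :skipped unless @processed_order_ids.add?(order_id)

    @charges << [order_id, amount_cents]
    :charged
  end
end

job = ChargeOrderJob.new
first = job.perform("order-1", 1_500)
second = job.perform("order-1", 1_500) # the same job delivered again
```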

  38. @titanoboa42
    Non-transactional queuing

    View Slide

  39. @titanoboa42
    • Don’t queue from within a db transaction
    Non-transactional queuing

    View Slide

  40. @titanoboa42
    • Don’t queue from within a db transaction
    • Job runs before commit, or in case of rollback
    Non-transactional queuing

    View Slide

  41. @titanoboa42
    • Don’t queue from within a db transaction
    • Job runs before commit, or in case of rollback
    • Commit first: Job not guaranteed to be queued
    Non-transactional queuing

    View Slide

  42. @titanoboa42
    Non-transactional queuing

    View Slide

  43. @titanoboa42
    • Stage transactionally
    Non-transactional queuing

    View Slide

  44. @titanoboa42
    • Stage transactionally
    • Scheduler queues job, updates staging data
    Non-transactional queuing

    View Slide
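
Staging transactionally can be sketched as follows: the job is written to a staging record in the same transaction as the business data, and a separate scheduler moves staged jobs to the queue afterwards. Everything here is an assumption for illustration: `FakeDatabase` is an in-memory stand-in for a real database, and `flush_staged_jobs` is a hypothetical scheduler step.

```ruby
# In-memory stand-in for a database with all-or-nothing transactions.
class FakeDatabase
  attr_reader :orders, :staged_jobs

  def initialize
    @orders = []
    @staged_jobs = []
  end

  # Keep all writes made in the block, or none of them if it raises.
  def transaction
    snapshot = [@orders.dup, @staged_jobs.dup]
    yield self
  rescue StandardError
    @orders, @staged_jobs = snapshot
    raise
  end
end

# Runs outside any transaction: queues staged jobs, then marks them queued.
# A crash between the two steps leads to a duplicate enqueue, so the jobs
# themselves should be idempotent.
def flush_staged_jobs(db, queue)
  db.staged_jobs.reject { |job| job[:queued] }.each do |job|
    queue << job[:name]
    job[:queued] = true
  end
end

db = FakeDatabase.new
queue = []

db.transaction do |tx|
  tx.orders << { id: 1 }
  tx.staged_jobs << { name: "SendReceiptJob(1)", queued: false }
end

begin
  db.transaction do |tx|
    tx.orders << { id: 2 }
    tx.staged_jobs << { name: "SendReceiptJob(2)", queued: false }
    raise "payment declined" # rollback: neither order nor job survives
  end
rescue RuntimeError
  # the rolled-back order left no staged job behind
end

flush_staged_jobs(db, queue)
flush_staged_jobs(db, queue) # a second pass queues nothing new
```

Because the job is staged inside the transaction, it can neither run before the commit nor survive a rollback.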

  45. @titanoboa42
    Local transactions

    View Slide

  46. @titanoboa42
    • Eventual consistency at best
    Local transactions

    View Slide

  47. @titanoboa42
    • Eventual consistency at best
    • SAGA command/orchestration
    Local transactions

    View Slide

  48. @titanoboa42
    Out of order delivery

    View Slide

  49. @titanoboa42
    • SAGA events/choreography: jobs queue jobs
    Out of order delivery

    View Slide

  50. @titanoboa42
    • SAGA events/choreography: jobs queue jobs
    • easy to build, hard to maintain or debug
    Out of order delivery

    View Slide

  51. @titanoboa42
    • SAGA events/choreography: jobs queue jobs
    • easy to build, hard to maintain or debug
    • SAGA command/orchestrator
    Out of order delivery

    View Slide
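
The orchestration variant of a saga can be sketched as one coordinator that runs local steps in order and compensates completed steps when a later one fails. This is an in-process illustration under assumed names (`Step`, `run_saga`); in a job-based system each step would typically be its own job, with the orchestrator tracking progress.

```ruby
# Hypothetical saga orchestrator: runs steps in order and, on failure,
# runs the compensations of the completed steps in reverse order.
Step = Struct.new(:name, :action, :compensation)

def run_saga(steps, log)
  completed = []
  steps.each do |step|
    step.action.call
    log << "done: #{step.name}"
    completed << step
  end
  :committed
rescue StandardError => e
  log << "failed: #{e.message}"
  completed.reverse_each do |step|
    step.compensation.call
    log << "compensated: #{step.name}"
  end
  :rolled_back
end

log = []
steps = [
  Step.new("reserve stock", -> {}, -> {}),
  Step.new("charge card", -> { raise "card declined" }, -> {}),
  Step.new("ship order", -> {}, -> {})
]
outcome = run_saga(steps, log)
```

Keeping the order of steps in one place is what makes this easier to maintain and debug than jobs that queue each other.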

  52. @titanoboa42
    Long running jobs - Resque

    View Slide

  53. @titanoboa42
    • Prevent worker shutdown
    Long running jobs - Resque

    View Slide

  54. @titanoboa42
    • Prevent worker shutdown
    • No deployments
    Long running jobs - Resque

    View Slide

  55. @titanoboa42
    • Prevent worker shutdown
    • No deployments
    • Not cloud-friendly
    Long running jobs - Resque

    View Slide

  56. @titanoboa42
    • Aborted and requeued
    Long running jobs - Sidekiq

    View Slide

  57. @titanoboa42
    • Aborted and requeued
    • Job may not finish before being aborted again
    Long running jobs - Sidekiq

    View Slide

  58. @titanoboa42
    Large collections

    View Slide

  59. @titanoboa42
    • Split job into collection and task to be done
    Large collections

    View Slide

  60. @titanoboa42
    • Split job into collection and task to be done
    • Checkpoint after iteration & requeue
    Large collections

    View Slide
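
Splitting a job into a collection and a per-element task, with a checkpoint after each iteration, might look like the sketch below. It is a plain Ruby illustration with hypothetical names (`ImportRowsJob`, the `should_stop` callback); it is not the API of any particular library.

```ruby
# Hypothetical interruptible job: iterates over a collection, and when
# asked to stop it returns a cursor so a later run can resume from there.
class ImportRowsJob
  attr_reader :processed

  def initialize(rows)
    @rows = rows
    @processed = []
  end

  # Returns the cursor to resume from, or nil when the collection is done.
  def perform(cursor, should_stop)
    @rows.each_with_index.drop(cursor).each do |row, index|
      return index if should_stop.call # checkpoint: resume at this index
      @processed << row.upcase         # the task for a single element
    end
    nil
  end
end

job = ImportRowsJob.new(%w[a b c d e])
queue = [0] # the queue holds the cursor for the next run
runs = 0

until queue.empty?
  cursor = queue.shift
  runs += 1
  budget = 2 # pretend the worker is shut down after two iterations
  next_cursor = job.perform(cursor, -> { (budget -= 1) < 0 })
  queue << next_cursor if next_cursor # requeue with the checkpoint
end
```

Each run does a bounded amount of work, so a worker can be stopped between iterations without losing progress.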

  61. @titanoboa42
    Interruptible job with automatic resuming

    View Slide

  62. @titanoboa42
    • Shut down workers anytime
    Interruptible job with automatic resuming

    View Slide

  63. @titanoboa42
    • Shut down workers anytime
    • Disaster prevention
    Interruptible job with automatic resuming

    View Slide

  64. @titanoboa42
    • Shut down workers anytime
    • Disaster prevention
    • Data integrity
    Interruptible job with automatic resuming

    View Slide

  65. @titanoboa42
    github.com/Shopify/job-iteration

    View Slide

  66. @titanoboa42
    Abstracting scaling issues

    simplifies 

    concrete background jobs

    View Slide

  67. @titanoboa42
    Task
    Queue
    Background jobs are
    Ruby objects
    App
    Server
    Worker

    View Slide

  68. @titanoboa42
    Task
    Queue
    Background jobs are
    Ruby objects
    App
    Server
    Worker
    Broker

    View Slide

  69. @titanoboa42
    Offloading work to a worker
    running the same code base

    View Slide

  70. @titanoboa42
    Background jobs

    Summary

    View Slide

  71. @titanoboa42
    • Based on task queues
    Background jobs

    View Slide

  72. @titanoboa42
    • Based on task queues
    • Complex overall system, simple concrete jobs
    Background jobs

    View Slide

  73. @titanoboa42
    • Based on task queues
    • Complex overall system, simple concrete jobs
    • Great for monolith
    Background jobs

    View Slide

  74. @titanoboa42
    Message oriented middleware

    View Slide

  75. @titanoboa42
    • Implementations: RabbitMQ, ActiveMQ,…
    Message oriented middleware

    View Slide

  76. @titanoboa42
    • Implementations: RabbitMQ, ActiveMQ,…
    • Protocols: AMQP, MQTT, STOMP,…
    Message oriented middleware

    View Slide

  77. @titanoboa42
    Message
    Queue
    Messaging
    Middleware
    App
    Server
    Producer
    Data-based interface
    Worker
    Consumer

    View Slide

  78. @titanoboa42
    Message
    Queue
    Messaging
    Middleware
    App
    Server
    Producer
    Data-based interface
    Worker
    Consumer
    Broker

    View Slide

  79. @titanoboa42
    Features

    View Slide

  80. @titanoboa42
    Commands & Events

    View Slide

  81. @titanoboa42
    Propagating updates
    Business
    Partners
    Support
    Contracts
    Orders

    View Slide

  82. @titanoboa42
    Propagating updates
    Business
    Partners
    Support
    Contracts
    Orders

    View Slide

  83. @titanoboa42
    Messaging

    Middleware
    Resiliency
    Business
    Partners
    Orders

    View Slide

  84. @titanoboa42
    Messaging

    Middleware
    Resiliency
    Business
    Partners

    View Slide

  85. @titanoboa42
    Messaging

    Middleware
    Resiliency
    Business
    Partners
    Orders

    View Slide

  86. @titanoboa42
    Topic with queues

    provides

    advanced routing
    App
    Server
    Business

    Partners
    Support

    Contracts
    Orders
    Messaging
    Middleware

    View Slide

  87. @titanoboa42
    Topic with queues

    provides

    advanced routing
    App
    Server
    Message
    Queue
    Business

    Partners
    Support

    Contracts
    Orders
    Message
    Queue
    Messaging
    Middleware

    View Slide
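
Routing through a topic with one queue per consumer can be sketched in memory as below. The `Topic` class, the routing keys, and the queue names are assumptions for illustration. Matching uses Ruby's shell-style `File.fnmatch`, which only approximates the wildcard rules of a real broker.

```ruby
# Hypothetical in-memory topic: every consumer has its own queue, and a
# published message is copied to each queue whose pattern matches its key.
class Topic
  def initialize
    @bindings = [] # pairs of [pattern, queue]
  end

  def bind(pattern, queue)
    @bindings << [pattern, queue]
    queue
  end

  def publish(routing_key, payload)
    @bindings.each do |pattern, queue|
      queue << payload if File.fnmatch(pattern, routing_key)
    end
  end
end

topic = Topic.new
support_queue = topic.bind("orders.*", [])
contracts_queue = topic.bind("orders.created", [])
partners_queue = topic.bind("partners.*", [])

# The producer only knows the topic and the routing key, not the consumers.
topic.publish("orders.created", { order_id: 1 })
topic.publish("orders.cancelled", { order_id: 1 })
```

Adding another consumer means binding another queue; the producer does not change.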

  88. @titanoboa42
    Messaging
    Middleware
    Anonymity for
    producer and
    consumer
    Business
    Partners
    Support
    Contracts
    Orders

    View Slide

  89. @titanoboa42
    Messaging
    Middleware
    Anonymity for
    producer and
    consumer
    Business
    Partners
    Invoices
    Support
    Contracts
    Orders

    View Slide

  90. @titanoboa42
    Messaging
    Middleware
    Anonymity for
    producer and
    consumer
    FraudScore
    Orders
    Support
    Contracts

    View Slide

  91. @titanoboa42
    Messaging
    Middleware
    Anonymity for
    producer and
    consumer
    Invoices
    FraudScore
    Orders
    Support
    Contracts

    View Slide

  92. @titanoboa42
    Mastering challenges

    View Slide

  93. @titanoboa42
    Keep breaking changes manageable

    View Slide

  94. @titanoboa42
    • Avoid n:m routing
    Keep breaking changes manageable

    View Slide

  95. @titanoboa42
    • Avoid n:m routing
    • Better representation of domain:

    multiple messages routed 1:n, n:1 instead
    Keep breaking changes manageable

    View Slide

  96. @titanoboa42
    • No exactly once delivery
    Lack of guarantees

    View Slide

  97. @titanoboa42
    • No exactly once delivery
    • No strong consistency
    Lack of guarantees

    View Slide

  98. @titanoboa42
    • No exactly once delivery
    • No strong consistency
    • Out of order delivery
    Lack of guarantees

    View Slide

  99. @titanoboa42
    No single source of truth

    View Slide

  100. @titanoboa42
    • Messages removed after processing
    No single source of truth

    View Slide

  101. @titanoboa42
    • Messages removed after processing
    • No replayability
    No single source of truth

    View Slide

  102. @titanoboa42
    Offloading work to decoupled
    services with no notion of system
    wide state

    View Slide

  103. @titanoboa42
    Message oriented middleware

    Summary

    View Slide

  104. @titanoboa42
    • Based on queues and topics
    Message oriented middleware

    View Slide

  105. @titanoboa42
    • Based on queues and topics
    • Complex overall system
    Message oriented middleware

    View Slide

  106. @titanoboa42
    • Based on queues and topics
    • Complex overall system
    • Simple message consumers
    Message oriented middleware

    View Slide

  107. @titanoboa42
    • Great for decoupled microservices
    Message oriented middleware

    View Slide

  108. @titanoboa42
    • Great for decoupled microservices
    • No system wide state
    Message oriented middleware

    View Slide

  109. @titanoboa42
    Event log

    View Slide

  110. @titanoboa42
    • Kafka
    Event logs

    View Slide

  111. @titanoboa42
    • Kafka
    • …
    Event logs

    View Slide

  112. @titanoboa42
    • Events persisted into append-only log
    Event log

    View Slide

  113. @titanoboa42
    • Events persisted into append-only log
    • Consumers read shared log
    Event log

    View Slide

  114. @titanoboa42
    • Events persisted into append-only log
    • Consumers read shared log
    • Stateless broker (no queues)
    Event log

    View Slide

  115. @titanoboa42
    High throughput

    View Slide

  116. @titanoboa42
    Single source of truth

    View Slide

  117. @titanoboa42
    • Events are not removed after processing
    Single source of truth

    View Slide

  118. @titanoboa42
    • Events are not removed after processing
    • Replayability
    Single source of truth

    View Slide
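
The difference from a message queue can be sketched with an append-only log whose consumers each track their own offset. This is an in-memory illustration with hypothetical `EventLog` and `Consumer` classes, not Kafka's client API.

```ruby
# Hypothetical in-memory event log: events are appended and never removed.
class EventLog
  def initialize
    @events = []
  end

  # Appends an event and returns its offset in the log.
  def append(event)
    @events << event
    @events.size - 1
  end

  # Returns all events from the given offset onwards.
  def read(from_offset)
    @events.drop(from_offset)
  end
end

# Each consumer keeps its own position in the shared log.
class Consumer
  attr_reader :seen, :offset

  def initialize(log)
    @log = log
    @offset = 0
    @seen = []
  end

  def poll
    events = @log.read(@offset)
    @seen.concat(events)
    @offset += events.size
  end

  # Replay: rewind the offset and rebuild state from the start of the log.
  def replay
    @offset = 0
    @seen = []
    poll
  end
end

log = EventLog.new
log.append("order_created")
log.append("order_paid")

billing = Consumer.new(log)
billing.poll
log.append("order_shipped")

analytics = Consumer.new(log) # a late consumer still sees every event
analytics.poll
billing.replay
```

Because reading does not remove events, a new or recovering consumer can rebuild its state from the log, which is what makes the log usable as a single source of truth.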

  119. @titanoboa42
    Offloading work to services
    keeping the notion of system
    wide state

    View Slide

  120. @titanoboa42
    Event log

    Summary

    View Slide

  121. @titanoboa42
    • Based on shared log, no queues
    Event logs

    View Slide

  122. @titanoboa42
    • Based on shared log, no queues
    • Complex overall system
    Event logs

    View Slide

  123. @titanoboa42
    Event logs

    View Slide

  124. @titanoboa42
    • Single source of truth (e.g. for event sourcing)
    Event logs

    View Slide

  125. @titanoboa42
    • Single source of truth (e.g. for event sourcing)
    • High throughput applications
    Event logs

    View Slide

  126. @titanoboa42
    Summary

    View Slide

  127. @titanoboa42
    • Queues
    Background jobs

    View Slide

  128. @titanoboa42
    • Queues
    • For monolithic code base
    Background jobs

    View Slide

  129. @titanoboa42
    • Topics and Queues
    Message oriented middleware

    View Slide

  130. @titanoboa42
    • Topics and Queues
    • For decoupled microservices
    Message oriented middleware

    View Slide

  131. @titanoboa42
    • Shared log, no queues
    Event logs

    View Slide

  132. @titanoboa42
    • Shared log, no queues
    • For event sourcing & high throughput
    applications
    Event logs

    View Slide

  133. @titanoboa42
    BFCM video

    View Slide

  134. @titanoboa42
    BFCM video

    View Slide

  135. Thanks!

    Questions?

    @titanoboa42


    https://www.shopify.com/careers

    View Slide