Slide 1

Slide 1 text

High availability by offloading work - background jobs, message queues, or Kafka
Kerstin Puschke (@titanoboa42)

Slide 3

Slide 3 text

Different approaches to offload work to ensure high availability

Slide 4

Slide 4 text

Outline
• High availability & offloading work

Slide 5

Slide 5 text

Outline
• High availability & offloading work
• Background jobs

Slide 6

Slide 6 text

Outline
• High availability & offloading work
• Background jobs
• Message oriented middleware

Slide 7

Slide 7 text

Outline
• Event log

Slide 8

Slide 8 text

Outline
• Event log
• Summary

Slide 9

Slide 9 text

High availability & offloading work

Slide 10

Slide 10 text

High Availability

Slide 11

Slide 11 text

High Availability
users can interact with the system

Slide 12

Slide 12 text

High Availability
users can interact meaningfully with the system

Slide 13

Slide 13 text

High Availability
community of users can interact meaningfully with the system

Slide 14

Slide 14 text

High Availability
community of users can interact meaningfully with the system whenever needed

Slide 15

Slide 15 text

High Availability
community of users can interact meaningfully with the system whenever needed

Slide 16

Slide 16 text

Background jobs

Slide 17

Slide 17 text

Background job backends
• Resque

Slide 18

Slide 18 text

Background job backends
• Resque
• Sidekiq

Slide 19

Slide 19 text

Background job backends
• Resque
• Sidekiq
• …

Slide 20

Slide 20 text

Background job: Unit of work to be done later
[Diagram: App Server, Worker]

Slide 21

Slide 21 text

Asynchronous communication
[Diagram: App Server, Message Queue, Worker]

Slide 22

Slide 22 text

Asynchronous communication
[Diagram: App Server, Message Queue, Worker; label: Task Queue]

Slide 23

Slide 23 text

Asynchronous communication
[Diagram: App Server, Message Queue, three Workers; label: Task Queue]
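
The task queue pattern on this slide can be sketched in plain Ruby. This is a minimal in-process illustration only: threads stand in for worker processes, Ruby's built-in `Queue` stands in for the broker, and the `:stop` signal and the doubling "work" are assumptions made for the example, not part of Resque or Sidekiq.

```ruby
# One producer (the "app server") and three workers sharing a task queue.
task_queue = Queue.new
results = Queue.new

workers = 3.times.map do |worker_id|
  Thread.new do
    # Queue#pop blocks until a job arrives; :stop is our shutdown signal.
    while (job = task_queue.pop) != :stop
      results << [worker_id, job * 2]
    end
  end
end

# The app server only enqueues; it does not wait for the work to finish.
(1..6).each { |n| task_queue << n }

# One stop signal per worker, then wait for all of them to drain the queue.
workers.size.times { task_queue << :stop }
workers.each(&:join)

processed = []
processed << results.pop until results.empty?
doubled = processed.map { |_worker_id, value| value }.sort
```

Which worker handles which job is not deterministic, which is why the results are sorted before use.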

Slide 24

Slide 24 text

Background job backend: task queue & broker

Slide 25

Slide 25 text

Encapsulating async communication

Slide 26

Slide 26 text

Features

Slide 27

Slide 27 text

Response times
[Diagram: App Server, Task Queue, Worker]

Slide 28

Slide 28 text

Spikeability
[Diagram: App Server, Task Queue, Worker]

Slide 29

Slide 29 text

Parallelization
[Diagram: App Server, Task Queue, three Workers]

Slide 30

Slide 30 text

Retries
[Diagram: App Server, Task Queue, three Workers]

Slide 31

Slide 31 text

Mastering challenges

Slide 32

Slide 32 text

Job queued and processed by different versions

Slide 33

Slide 33 text

Job queued and processed by different versions
• No breaking changes to job parameters

Slide 34

Slide 34 text

Job queued and processed by different versions
• No breaking changes to job parameters
• Changes need to be backwards compatible until legacy jobs have been processed
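
A backwards-compatible parameter change might look like the sketch below. The job class, its parameters, and the payloads are hypothetical; the payload is modelled as a plain Hash with string keys, on the assumption that job arguments are serialized (for example as JSON) between queuing and processing.

```ruby
# Hypothetical job: version 2 of the code added a "locale" parameter.
class SendReceiptJob
  def perform(args)
    order_id = args.fetch("order_id")
    # Jobs queued by version 1 carry no "locale", so fall back to a default
    # instead of raising while legacy jobs are still in the queue.
    locale = args.fetch("locale", "en")
    "receipt for order #{order_id} in #{locale}"
  end
end

legacy_payload = { "order_id" => 42 }                     # queued before the deploy
current_payload = { "order_id" => 43, "locale" => "de" }  # queued after the deploy

job = SendReceiptJob.new
legacy_result = job.perform(legacy_payload)
current_result = job.perform(current_payload)
```

Once no legacy jobs remain in the queue, the default could be dropped in a later release.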

Slide 35

Slide 35 text

No exactly once delivery

Slide 36

Slide 36 text

No exactly once delivery
• “At least” vs. “at most” once delivery

Slide 37

Slide 37 text

No exactly once delivery
• “At least” vs. “at most” once delivery
• Idempotent jobs & at least once delivery
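
One way to make a job safe under at least once delivery is to record which units of work have already been done and skip duplicates. The sketch below is an assumption-laden illustration: `ChargeOrderJob` is a hypothetical class, and the in-memory `Set` stands in for a durable store such as a database table with a unique key.

```ruby
require "set"

# Hypothetical idempotent job: running it twice for the same order
# has the same effect as running it once.
class ChargeOrderJob
  attr_reader :charges

  def initialize
    @charged_orders = Set.new # stand-in for a durable record of finished work
    @charges = []
  end

  def perform(order_id, amount)
    # Set#add? returns nil when the element is already present.
    return :skipped unless @charged_orders.add?(order_id)

    @charges << [order_id, amount]
    :charged
  end
end

job = ChargeOrderJob.new
first_delivery = job.perform(7, 1999)
second_delivery = job.perform(7, 1999) # the queue delivered the job again
```

In a real system the duplicate check and the side effect would need to be atomic, for example by relying on a unique constraint; this sketch ignores concurrency.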

Slide 38

Slide 38 text

Non-transactional queuing

Slide 39

Slide 39 text

Non-transactional queuing
• Don’t queue from within a db transaction

Slide 40

Slide 40 text

Non-transactional queuing
• Don’t queue from within a db transaction
• Job runs before commit, or in case of rollback

Slide 41

Slide 41 text

Non-transactional queuing
• Don’t queue from within a db transaction
• Job runs before commit, or in case of rollback
• Commit first: Job not guaranteed to be queued

Slide 42

Slide 42 text

Non-transactional queuing

Slide 43

Slide 43 text

Non-transactional queuing
• Stage transactionally

Slide 44

Slide 44 text

Non-transactional queuing
• Stage transactionally
• Scheduler queues job, updates staging data
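
The staging approach can be sketched as follows. Everything here is hypothetical: `FakeDatabase` is an in-memory stand-in with a simplified snapshot-and-restore transaction, and `place_order` and `run_scheduler` are illustrative names rather than the API of any library.

```ruby
# In-memory stand-in for a database with all-or-nothing transactions.
class FakeDatabase
  attr_reader :orders, :staged_jobs

  def initialize
    @orders = []
    @staged_jobs = []
  end

  # Keeps all writes made in the block, or restores the snapshot if it raises.
  def transaction
    orders_before = @orders.dup
    staged_before = @staged_jobs.dup
    yield self
  rescue StandardError
    @orders = orders_before
    @staged_jobs = staged_before
    raise
  end
end

# The business write and the staged job commit or roll back together.
def place_order(db, order_id, fail_after_staging: false)
  db.transaction do |tx|
    tx.orders << order_id
    tx.staged_jobs << { job: "SendReceiptJob", order_id: order_id, queued: false }
    raise "payment declined" if fail_after_staging
  end
end

# The scheduler queues committed staged jobs, then updates the staging data.
def run_scheduler(db, task_queue)
  db.staged_jobs.reject { |job| job[:queued] }.each do |job|
    task_queue << job.slice(:job, :order_id)
    job[:queued] = true
  end
end

db = FakeDatabase.new
task_queue = []
place_order(db, 1)
begin
  place_order(db, 2, fail_after_staging: true)
rescue RuntimeError
  # The order and its staged job were rolled back together.
end
run_scheduler(db, task_queue)
run_scheduler(db, task_queue) # a second pass finds nothing new to queue
```

If the scheduler stopped between queuing a job and marking it as queued, the job would be queued again on the next pass, so this pattern still assumes idempotent jobs.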

Slide 45

Slide 45 text

Local transactions

Slide 46

Slide 46 text

Local transactions
• Eventual consistency at best

Slide 47

Slide 47 text

Local transactions
• Eventual consistency at best
• SAGA command/orchestration

Slide 48

Slide 48 text

Out of order delivery

Slide 49

Slide 49 text

Out of order delivery
• SAGA events/choreography: jobs queue jobs

Slide 50

Slide 50 text

Out of order delivery
• SAGA events/choreography: jobs queue jobs
• easy to build, hard to maintain or debug

Slide 51

Slide 51 text

Out of order delivery
• SAGA events/choreography: jobs queue jobs
• easy to build, hard to maintain or debug
• SAGA command/orchestrator

Slide 52

Slide 52 text

Long running jobs - Resque

Slide 53

Slide 53 text

Long running jobs - Resque
• Prevent worker shutdown

Slide 54

Slide 54 text

Long running jobs - Resque
• Prevent worker shutdown
• No deployments

Slide 55

Slide 55 text

Long running jobs - Resque
• Prevent worker shutdown
• No deployments
• Not cloud-friendly

Slide 56

Slide 56 text

Long running jobs - Sidekiq
• Aborted and requeued

Slide 57

Slide 57 text

Long running jobs - Sidekiq
• Aborted and requeued
• Job may not finish before being aborted again

Slide 58

Slide 58 text

Large collections

Slide 59

Slide 59 text

Large collections
• Split job into collection and task to be done

Slide 60

Slide 60 text

Large collections
• Split job into collection and task to be done
• Checkpoint after iteration & requeue
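
The checkpoint-and-requeue idea can be illustrated with a cursor that travels with the job. This is a simplified sketch under stated assumptions, not the API of the job-iteration gem: `run_slice`, the slice size of two, the array used as a task queue, and the upcasing "task" are all hypothetical.

```ruby
# Processes up to max_items items starting at cursor.
# Returns the next cursor (the checkpoint), or nil when the collection is done.
def run_slice(collection, cursor, max_items, processed)
  collection[cursor, max_items].each { |item| processed << item.upcase }
  next_cursor = cursor + max_items
  next_cursor < collection.size ? next_cursor : nil
end

collection = %w[a b c d e]
task_queue = [{ cursor: 0 }] # the job starts at the beginning of the collection
processed = []
runs = 0

until task_queue.empty?
  job = task_queue.shift
  runs += 1
  next_cursor = run_slice(collection, job[:cursor], 2, processed)
  # Requeue with the checkpoint so any worker can resume from there.
  task_queue << { cursor: next_cursor } if next_cursor
end
```

Because each run is short and resumes from its checkpoint, a worker can be shut down between runs without losing more than the current slice.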

Slide 61

Slide 61 text

Interruptible job with automatic resuming

Slide 62

Slide 62 text

Interruptible job with automatic resuming
• Shut down workers anytime

Slide 63

Slide 63 text

Interruptible job with automatic resuming
• Shut down workers anytime
• Disaster prevention

Slide 64

Slide 64 text

Interruptible job with automatic resuming
• Shut down workers anytime
• Disaster prevention
• Data integrity

Slide 65

Slide 65 text

github.com/Shopify/job-iteration

Slide 66

Slide 66 text

Abstracting scaling issues simplifies concrete background jobs

Slide 67

Slide 67 text

Background jobs are Ruby objects
[Diagram: App Server, Task Queue, Worker]

Slide 68

Slide 68 text

Background jobs are Ruby objects
[Diagram: App Server, Broker, Task Queue, Broker, Worker]

Slide 69

Slide 69 text

Offloading work to a worker running the same code base

Slide 70

Slide 70 text

Background jobs: Summary

Slide 71

Slide 71 text

Background jobs
• Based on task queues

Slide 72

Slide 72 text

Background jobs
• Based on task queues
• Complex overall system, simple concrete jobs

Slide 73

Slide 73 text

Background jobs
• Based on task queues
• Complex overall system, simple concrete jobs
• Great for monolith

Slide 74

Slide 74 text

Message oriented middleware

Slide 75

Slide 75 text

Message oriented middleware
• Implementations: RabbitMQ, ActiveMQ, …

Slide 76

Slide 76 text

Message oriented middleware
• Implementations: RabbitMQ, ActiveMQ, …
• Protocols: AMQP, MQTT, STOMP, …

Slide 77

Slide 77 text

Messaging Middleware
[Diagram: App Server (Producer), Message Queue, Worker (Consumer); label: Data-based interface]

Slide 78

Slide 78 text

Messaging Middleware
[Diagram: App Server (Producer), Broker, Message Queue, Worker (Consumer); label: Data-based interface]

Slide 79

Slide 79 text

Features

Slide 80

Slide 80 text

Commands & Events

Slide 81

Slide 81 text

Propagating updates
[Diagram: Business Partners, Support Contracts, Orders]

Slide 82

Slide 82 text

Propagating updates
[Diagram: Business Partners, Support Contracts, Orders]

Slide 83

Slide 83 text

Resiliency
[Diagram: Business Partners, Messaging Middleware, Orders]

Slide 84

Slide 84 text

Resiliency
[Diagram: Business Partners, Messaging Middleware]

Slide 85

Slide 85 text

Resiliency
[Diagram: Business Partners, Messaging Middleware, Orders]

Slide 86

Slide 86 text

Topic with queues provides advanced routing
[Diagram: App Server, Messaging Middleware, Business Partners, Support Contracts, Orders]

Slide 87

Slide 87 text

Topic with queues provides advanced routing
[Diagram: App Server, Messaging Middleware with two Message Queues, Business Partners, Support Contracts, Orders]
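
Routing through a topic with bound queues might be sketched like this. The `Topic` class, the routing keys, and the queue names are hypothetical, and the pattern matching uses Ruby's shell-style `File.fnmatch` purely for illustration; real brokers such as RabbitMQ define their own binding pattern syntax.

```ruby
# Hypothetical topic: producers publish with a routing key,
# and each bound queue receives the messages whose key matches its pattern.
class Topic
  def initialize
    @bindings = [] # pairs of [pattern, queue]
  end

  def bind(pattern, queue)
    @bindings << [pattern, queue]
  end

  def publish(routing_key, message)
    @bindings.each do |pattern, queue|
      queue << message if File.fnmatch(pattern, routing_key)
    end
  end
end

support_contracts_queue = []
orders_queue = []

topic = Topic.new
topic.bind("partner.*", support_contracts_queue)          # every partner update
topic.bind("partner.address_changed", orders_queue)       # address changes only

# The producer does not know which consumers exist or how many there are.
topic.publish("partner.address_changed", { partner_id: 1, city: "Berlin" })
topic.publish("partner.renamed", { partner_id: 1, name: "ACME" })
```

Adding another consumer only requires binding one more queue; the producer stays unchanged.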

Slide 88

Slide 88 text

Anonymity for producer and consumer
[Diagram: Business Partners, Messaging Middleware, Support Contracts, Orders]

Slide 89

Slide 89 text

Anonymity for producer and consumer
[Diagram: Business Partners, Messaging Middleware, Invoices, Support Contracts, Orders]

Slide 90

Slide 90 text

Anonymity for producer and consumer
[Diagram: FraudScore, Orders, Messaging Middleware, Support Contracts]

Slide 91

Slide 91 text

Anonymity for producer and consumer
[Diagram: Invoices, FraudScore, Orders, Messaging Middleware, Support Contracts]

Slide 92

Slide 92 text

Mastering challenges

Slide 93

Slide 93 text

Keep breaking changes manageable

Slide 94

Slide 94 text

Keep breaking changes manageable
• Avoid n:m routing

Slide 95

Slide 95 text

Keep breaking changes manageable
• Avoid n:m routing
• Better representation of domain: multiple messages routed 1:n, n:1 instead

Slide 96

Slide 96 text

Lack of guarantees
• No exactly once delivery

Slide 97

Slide 97 text

Lack of guarantees
• No exactly once delivery
• No strong consistency

Slide 98

Slide 98 text

Lack of guarantees
• No exactly once delivery
• No strong consistency
• Out of order delivery

Slide 99

Slide 99 text

No single source of truth

Slide 100

Slide 100 text

No single source of truth
• Messages removed after processing

Slide 101

Slide 101 text

No single source of truth
• Messages removed after processing
• No replayability

Slide 102

Slide 102 text

Offloading work to decoupled services with no notion of system wide state

Slide 103

Slide 103 text

Message oriented middleware: Summary

Slide 104

Slide 104 text

Message oriented middleware
• Based on queues and topics

Slide 105

Slide 105 text

Message oriented middleware
• Based on queues and topics
• Complex overall system

Slide 106

Slide 106 text

Message oriented middleware
• Based on queues and topics
• Complex overall system
• Simple message consumers

Slide 107

Slide 107 text

Message oriented middleware
• Great for decoupled microservices

Slide 108

Slide 108 text

Message oriented middleware
• Great for decoupled microservices
• No system wide state

Slide 109

Slide 109 text

Event log

Slide 110

Slide 110 text

Event logs
• Kafka

Slide 111

Slide 111 text

Event logs
• Kafka
• …

Slide 112

Slide 112 text

Event log
• Events persisted into append-only log

Slide 113

Slide 113 text

Event log
• Events persisted into append-only log
• Consumers read shared log

Slide 114

Slide 114 text

Event log
• Events persisted into append-only log
• Consumers read shared log
• Stateless broker (no queues)

Slide 115

Slide 115 text

High throughput

Slide 116

Slide 116 text

Single source of truth

Slide 117

Slide 117 text

Single source of truth
• Events are not removed after processing

Slide 118

Slide 118 text

Single source of truth
• Events are not removed after processing
• Replayability
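
Replayability follows from consumers tracking their own position in a log that keeps its events. The sketch below is a toy in-memory model, not a Kafka client: `EventLog`, `StockConsumer`, and the stock events are hypothetical names chosen for the example.

```ruby
# Append-only log: events are never removed by reading them.
class EventLog
  def initialize
    @events = []
  end

  # Appends an event and returns its offset in the log.
  def append(event)
    @events << event
    @events.size - 1
  end

  # Returns all events at or after the given offset.
  def read(from_offset)
    @events.drop(from_offset)
  end
end

# Each consumer keeps its own offset and derives its state from the log.
class StockConsumer
  attr_reader :state

  def initialize(log)
    @log = log
    @offset = 0
    @state = Hash.new(0)
  end

  def poll
    events = @log.read(@offset)
    events.each { |event| @state[event[:sku]] += event[:quantity] }
    @offset += events.size
  end
end

log = EventLog.new
early_consumer = StockConsumer.new(log)

log.append({ sku: "book", quantity: 2 })
log.append({ sku: "pen", quantity: 1 })
early_consumer.poll

log.append({ sku: "book", quantity: 1 })
early_consumer.poll

# A consumer added later replays the log from offset 0 and reaches the same state.
late_consumer = StockConsumer.new(log)
late_consumer.poll
```

With a queue that removes messages after processing, the late consumer would have had nothing left to read.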

Slide 119

Slide 119 text

Offloading work to services keeping the notion of system wide state

Slide 120

Slide 120 text

Event log: Summary

Slide 121

Slide 121 text

Event logs
• Based on shared log, no queues

Slide 122

Slide 122 text

Event logs
• Based on shared log, no queues
• Complex overall system

Slide 123

Slide 123 text

Event logs

Slide 124

Slide 124 text

Event logs
• Single source of truth (e.g. for event sourcing)

Slide 125

Slide 125 text

Event logs
• Single source of truth (e.g. for event sourcing)
• High throughput applications

Slide 126

Slide 126 text

Summary

Slide 127

Slide 127 text

Background jobs
• Queues

Slide 128

Slide 128 text

Background jobs
• Queues
• For monolithic code base

Slide 129

Slide 129 text

Message oriented middleware
• Topics and Queues

Slide 130

Slide 130 text

Message oriented middleware
• Topics and Queues
• For decoupled microservices

Slide 131

Slide 131 text

Event logs
• Shared log, no queues

Slide 132

Slide 132 text

Event logs
• Shared log, no queues
• For event sourcing & high throughput applications

Slide 133

Slide 133 text

BFCM video

Slide 134

Slide 134 text

BFCM video

Slide 135

Slide 135 text

Thanks!
 Questions?
 @titanoboa42
 
 https://www.shopify.com/careers