Slide 1

Slide 1 text

Chapman: Building a Distributed Job Queue in MongoDB Rick Copeland @rick446 @synappio [email protected]

Slide 2

Slide 2 text

@rick446 @synappio Getting to Know One Another

Slide 3

Slide 3 text

@rick446 @synappio Getting to Know One Another Rick

Slides 5-11

Slides 5-11 text

What You’ll Learn

How to…
• Build a task queue in MongoDB
• Bring consistency to distributed systems (without transactions)
• Build low-latency reactive systems

Slides 12-14

Slides 12-14 text

Why a Queue?

• Long-running task (or longer than the web can wait)
• Farm out chunks of work for performance

Slides 15-18

Slides 15-18 text

Things I Worry About

• Priority
• Latency
• Unreliable workers

Slides 19-24

Slides 19-24 text

Queue Options

• SQS? No priority
• Redis? Can’t overflow memory
• RabbitMQ? Lack of visibility
• ZeroMQ? Lack of persistence
• What about MongoDB?

Slide 25

Slide 25 text

@rick446 @synappio Chapman Graham Arthur Chapman 8 January 1941 – 4 October 1989

Slides 26-30

Slides 26-30 text

Roadmap

• Building a scheduled priority queue
• Handling unreliable workers
• Shared resources
• Managing Latency

Slide 31

Slide 31 text

@rick446 @synappio Building a Scheduled Priority Queue

Slides 32-34

Slides 32-34 text

Step 1: Simple Queue

    db.message.insert({
        "_id" : NumberLong("3784707300388732067"),
        "data" : BinData(...),
        "s" : {
            "status" : "ready",
            "ts_enqueue" : ISODate("2015-03-02T15:27:29.228Z")
        }
    });

    db.message.ensureIndex({'s.status': 1, 's.ts_enqueue': 1});

    // Get earliest message for processing
    db.runCommand(
        {
            findAndModify: "message",
            query: { 's.status': 'ready' },
            sort: {'s.ts_enqueue': 1},   // FIFO
            update: { '$set': {'s.status': 'reserved'} },
        }
    );

Slides 35-40

Slides 35-40 text

Step 1: Simple Queue

Good
• Guaranteed FIFO

Bad
• No priority (other than FIFO)
• No handling of worker problems

Slides 41-43

Slides 41-43 text

Step 2: Scheduled Messages

    db.message.insert({
        "_id" : NumberLong("3784707300388732067"),
        "data" : BinData(...),
        "s" : {
            "status" : "ready",
            "ts_after" : ISODate(…),   // Min Valid Time
            "ts_enqueue" : ISODate("2015-03-02T15:27:29.228Z")
        }
    });

    db.message.ensureIndex(
        {'s.status': 1, 's.ts_enqueue': 1});

    // Get earliest message for processing
    db.runCommand(
        {
            findAndModify: "message",
            query: { 's.status': 'ready', 's.ts_after': {$lt: now }},
            sort: {'s.ts_enqueue': 1},
            update: { '$set': {'s.status': 'reserved'} },
        }
    );

Slides 44-48

Slides 44-48 text

Step 2: Scheduled Messages

Good
• Easy to build periodic tasks

Bad
• Be careful with the word “now”
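One possible reading of the warning about “now” is clock skew: if each worker computes `now` from its own clock, workers can disagree about which messages are eligible. The sketch below is plain in-memory Python, not MongoDB; the `eligible` helper and the sample data are hypothetical. It applies the Step 2 filter (`ts_after < now`) and shows a worker with a fast clock picking up a message early.

```python
from datetime import datetime, timedelta

def eligible(messages, now):
    """Mimic the Step 2 query: status 'ready' and ts_after < now,
    returned in enqueue order."""
    ready = [m for m in messages
             if m["status"] == "ready" and m["ts_after"] < now]
    return sorted(ready, key=lambda m: m["ts_enqueue"])

base = datetime(2015, 3, 2, 15, 0, 0)
messages = [
    # Scheduled to run no earlier than 10 minutes after `base`.
    {"_id": 1, "status": "ready", "ts_enqueue": base,
     "ts_after": base + timedelta(minutes=10)},
    # Eligible as soon as the clock passes `base`.
    {"_id": 2, "status": "ready", "ts_enqueue": base + timedelta(seconds=1),
     "ts_after": base},
]

true_now = base + timedelta(minutes=5)
fast_clock_now = true_now + timedelta(minutes=6)  # assumed skew of 6 minutes

on_time = [m["_id"] for m in eligible(messages, true_now)]
too_early = [m["_id"] for m in eligible(messages, fast_clock_now)]
```

Passing `now` in as an argument, rather than reading the clock inside the function, at least makes the choice of clock explicit.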

Slides 49-51

Slides 49-51 text

Step 3: Priority

    db.message.insert({
        "_id" : NumberLong("3784707300388732067"),
        "data" : BinData(...),
        "s" : {
            "status" : "ready",
            "pri": 30128,   // Add Priority
            "ts_enqueue" : ISODate("2015-03-02T15:27:29.228Z")
        }
    });

    db.message.ensureIndex({'s.status': 1, 's.pri': -1, 's.ts_enqueue': 1});

    db.runCommand(
        {
            findAndModify: "message",
            query: { 's.status': 'ready' },
            sort: {'s.pri': -1, 's.ts_enqueue': 1},
            update: { '$set': {'s.status': 'reserved'} },
        }
    );

Slides 52-57

Slides 52-57 text

Step 3: Priority

Good
• Priorities are handled
• Guaranteed FIFO within a priority

Bad
• No handling of worker problems
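The ordering that Step 3 asks MongoDB for (highest `pri` first, then earliest `ts_enqueue`) can be sketched in memory. This is only an illustration of the sort order, with a hypothetical `reserve` helper and made-up messages. Unlike `findAndModify`, it is not atomic across processes.

```python
def reserve(messages):
    """Pick the 'ready' message with the highest priority, breaking ties by
    earliest enqueue time, and mark it reserved (mirrors the Step 3 sort)."""
    ready = [m for m in messages if m["status"] == "ready"]
    if not ready:
        return None
    # Negate pri so that min() prefers the largest priority.
    msg = min(ready, key=lambda m: (-m["pri"], m["ts_enqueue"]))
    msg["status"] = "reserved"
    return msg

messages = [
    {"_id": "a", "status": "ready", "pri": 10, "ts_enqueue": 1},
    {"_id": "b", "status": "ready", "pri": 30, "ts_enqueue": 3},
    {"_id": "c", "status": "ready", "pri": 30, "ts_enqueue": 2},
]

order = []
while True:
    msg = reserve(messages)
    if msg is None:
        break
    order.append(msg["_id"])
```

Within priority 30 the older message (`c`) is reserved before `b`, which is the "FIFO within a priority" property from the slide.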

Slide 58

Slide 58 text

@rick446 @synappio Handling Unreliable Workers

Slides 59-60

Slides 59-60 text

Approach 1 Timeouts

    db.message.insert({
        "_id" : NumberLong("3784707300388732067"),
        "data" : BinData(...),
        "s" : {
            "status" : "ready",
            "pri": 30128,
            "ts_enqueue" : ISODate("2015-03-02T15:27:29.228Z"),
            "ts_timeout" : ISODate("2025-01-01T00:00:00.000Z")   // Far-future placeholder
        }
    });

    db.message.ensureIndex({"s.status": 1, "s.ts_timeout": 1})

Slides 61-62

Slides 61-62 text

Approach 1 Timeouts

    // Reserve message
    db.runCommand(
        {
            findAndModify: "message",
            query: { 's.status': 'ready' },
            sort: {'s.pri': -1, 's.ts_enqueue': 1},
            update: { '$set': {
                's.status': 'reserved',
                's.ts_timeout': now + processing_time } }   // Client sets timeout
        }
    );

    // Timeout message ("unlock")
    db.message.update(
        {'s.status': 'reserved', 's.ts_timeout': {'$lt': now}},
        {'$set': {'s.status': 'ready'}},
        {'multi': true});

Slides 63-68

Slides 63-68 text

Approach 1 Timeouts

Good
• Worker failure handled via timeout

Bad
• Requires periodic “unlock” task
• Slow (but “live”) workers can cause spurious timeouts
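The spurious-timeout problem can be reproduced with a small in-memory model. This is a sketch under assumptions: time is a plain number of seconds, and `reserve` and `unlock_timed_out` are hypothetical stand-ins for the `findAndModify` and multi-update shown on the slides.

```python
FAR_FUTURE = float("inf")  # plays the role of the far-future placeholder

def reserve(messages, now, processing_time):
    """Reserve the first ready message and stamp its timeout."""
    for m in messages:
        if m["status"] == "ready":
            m["status"] = "reserved"
            m["ts_timeout"] = now + processing_time
            return m
    return None

def unlock_timed_out(messages, now):
    """Periodic 'unlock' task: return expired reservations to 'ready'."""
    unlocked = 0
    for m in messages:
        if m["status"] == "reserved" and m["ts_timeout"] < now:
            m["status"] = "ready"
            unlocked += 1
    return unlocked

messages = [{"_id": 1, "status": "ready", "ts_timeout": FAR_FUTURE}]

first = reserve(messages, now=0, processing_time=30)   # worker A starts
early = unlock_timed_out(messages, now=20)             # nothing expired yet
late = unlock_timed_out(messages, now=31)              # A is slow but alive
second = reserve(messages, now=31, processing_time=30) # worker B takes it too
```

After the second `reserve`, both workers believe they own message 1, so the handler for a message needs to tolerate being run twice.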

Slides 69-70

Slides 69-70 text

Approach 2 Worker Identity

    db.message.insert({
        "_id" : NumberLong("3784707300388732067"),
        "data" : BinData(...),
        "s" : {
            "status" : "ready",
            "pri": 30128,
            "cli": "--------------------------",   // Client / worker placeholder
            "ts_enqueue" : ISODate("2015-03-02T..."),
            "ts_timeout" : ISODate("2025-...")
        }
    });

Slides 71-73

Slides 71-73 text

Approach 2 Worker Identity

    // Reserve message
    db.runCommand(
        {
            findAndModify: "message",
            query: { 's.status': 'ready' },
            sort: {'s.pri': -1, 's.ts_enqueue': 1},
            update: { '$set': {
                's.status': 'reserved',
                's.cli': 'client_name:pid',   // Mark the worker who reserved the message
                's.ts_timeout': now + processing_time } }
        }
    );

    // Unlock “dead” client messages
    // (messages reserved by dead workers are unlocked)
    db.message.update(
        {'s.status': 'reserved',
         's.cli': {'$nin': active_clients} },
        {'$set': {'s.status': 'ready'}},
        {'multi': true});

Slides 74-81

Slides 74-81 text

Approach 2 Worker Identity

Good
• Worker failure handled via out-of-band detection of live workers
• Handles slow workers

Bad
• Requires periodic “unlock” task
• Unlock updates can be slow
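A minimal in-memory sketch of the worker-identity unlock follows. It assumes the set of live workers (`active_clients`) is supplied by some out-of-band mechanism that the slides do not specify. The `unlock_dead` helper and the worker names are hypothetical.

```python
def unlock_dead(messages, active_clients):
    """Return messages reserved by workers not in `active_clients` to 'ready'
    (the in-memory analogue of the $nin update)."""
    unlocked = []
    for m in messages:
        if m["status"] == "reserved" and m["cli"] not in active_clients:
            m["status"] = "ready"
            unlocked.append(m["_id"])
    return unlocked

messages = [
    {"_id": 1, "status": "reserved", "cli": "host-a:101"},  # slow but alive
    {"_id": 2, "status": "reserved", "cli": "host-b:202"},  # worker has died
    {"_id": 3, "status": "ready", "cli": "----------"},
]

unlocked = unlock_dead(messages, active_clients={"host-a:101"})
```

The slow worker keeps its reservation because liveness, not elapsed time, decides which messages are unlocked.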

Slide 82

Slide 82 text

@rick446 @synappio Shared Resources

Slide 83

Slide 83 text

Complex Tasks

[Diagram labels: Group, check_smtp, Analyze Results, Update Reports, Pipeline]

Slides 84-87

Slides 84-87 text

Semaphores

• Some services perform connection-throttling (e.g. Mailchimp)
• Some services just have a hard time with 144 threads hitting them simultaneously
• Need a way to limit our concurrency

Slides 88-91

Slides 88-91 text

Semaphores

Semaphore
    Active: msg1, msg2, msg3, …
    Capacity: 16
    Queued: msg17, msg18, msg19, …

• Keep active and queued messages in arrays
• Releasing the semaphore makes queued messages available for dispatch
• Use $slice (2.6) to keep arrays the right size
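The array-based semaphore can be modelled in memory to see how the capacity limit and the queue interact. This is a sketch, not the talk's implementation: the `Semaphore` class is hypothetical, and list slicing stands in for MongoDB's `$push` with `$slice`.

```python
class Semaphore:
    """In-memory model of a semaphore document with 'active' and 'queued' arrays."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.active = []
        self.queued = []

    def acquire(self, msg_id):
        # Append, then keep only the first `capacity` entries (like $slice).
        self.active = (self.active + [msg_id])[:self.capacity]
        # Pessimistically assume we will have to wait.
        self.queued.append(msg_id)
        if msg_id in self.active:
            self.queued.remove(msg_id)  # compensation: we did get a slot
            return True
        return False

    def release(self, msg_id):
        """Free a slot and return the queued messages that may now be retried."""
        if msg_id in self.active:
            self.active.remove(msg_id)
        if msg_id in self.queued:
            self.queued.remove(msg_id)
        free = self.capacity - len(self.active)
        woken, self.queued = self.queued[:free], self.queued[free:]
        return woken

sem = Semaphore(capacity=2)
got_a = sem.acquire("a")
got_b = sem.acquire("b")
got_c = sem.acquire("c")       # semaphore is full, so "c" is queued
woken = sem.release("a")       # frees one slot and wakes "c"
retry_c = sem.acquire("c")     # the woken message retries and succeeds
```

Woken messages are not moved into `active` directly; they become eligible for dispatch and acquire the semaphore again on their next attempt.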

Slides 92-94

Slides 92-94 text

Semaphores: Acquire

    db.semaphore.insert({
        '_id': 'semaphore-name',
        'value': 16,
        'active': [],
        'queued': []});

    def acquire(sem_id, msg_id, sem_size):
        # Pessimistic update: push onto both arrays; $slice caps 'active'
        sem = db.semaphore.find_and_modify(
            {'_id': sem_id},
            update={'$push': {
                'active': {
                    '$each': [msg_id],
                    '$slice': sem_size},
                'queued': msg_id}},
            new=True)
        if msg_id in sem['active']:
            # Compensation: we hold the semaphore, so leave the queue
            db.semaphore.update(
                {'_id': sem_id},
                {'$pull': {'queued': msg_id}})
            return True
        return False

Slides 95-98

Slides 95-98 text

Semaphores: Release

    def release(sem_id, msg_id, sem_size):
        # Actually release
        sem = db.semaphore.find_and_modify(
            {'_id': sem_id},
            update={'$pull': {
                'active': msg_id,
                'queued': msg_id}},
            new=True)

        # Awaken queued message(s)
        while len(sem['active']) < sem_size and sem['queued']:
            wake_msg_ids = sem['queued'][:sem_size]
            updated = db.semaphore.find_and_modify(
                {'_id': sem_id},
                update={'$pullAll': {'queued': wake_msg_ids}},
                new=True)
            for msgid in wake_msg_ids:
                make_dispatchable(msgid)  # Some magic (covered later)
            sem = updated

Slides 99-103

Slides 99-103 text

Message States

[Diagram states: ready, acquire, queued, busy]

• Reserve the message
• Acquire resources
• Process the message
• Release resources
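The four states can be written down as a small transition table. The transitions below are inferred from the status updates that appear in the deck's code (reserve sets `acquire`, a full semaphore sets `queued`, a release sets `ready`, full acquisition sets `busy`). The `advance` helper is hypothetical, and what happens after `busy` is not shown in this part of the deck, so it is left out.

```python
ALLOWED = {
    ("ready", "acquire"),   # a worker reserves the message
    ("acquire", "busy"),    # every required semaphore was acquired
    ("acquire", "queued"),  # a semaphore was full; wait for a release
    ("queued", "ready"),    # a release made the message dispatchable again
}

def advance(status, new_status):
    """Return new_status if the transition is allowed, else raise ValueError."""
    if (status, new_status) not in ALLOWED:
        raise ValueError(f"illegal transition: {status} -> {new_status}")
    return new_status

# Happy path: reserve, then acquire everything.
status = advance("ready", "acquire")
status = advance(status, "busy")

# Contended path: reserve, wait in the queue, get woken up.
waiting = advance(advance("ready", "acquire"), "queued")
woken = advance(waiting, "ready")
```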

Slides 104-107

Slides 104-107 text

Reserve a Message

    msg = db.message.find_and_modify(
        {'s.status': 'ready'},
        # Prefer partially-acquired messages
        sort=[('s.sub_status', -1), ('s.pri', -1), ('s.ts', 1)],
        update={'$set': {'s.w': worker, 's.status': 'acquire'}},
        new=True)

Before:

    message.s == {
        pri: 10,
        semaphores: ['foo'],   # Required semaphores
        status: 'ready',
        sub_status: 0,         # number of semaphores acquired
        w: '----------',
        ...}

After:

    message.s == {
        pri: 10,
        semaphores: ['foo'],
        status: 'acquire',
        sub_status: 0,
        w: worker,
        ...}

Slides 108-111

Slides 108-111 text

Acquire Resources

    def acquire_resources(msg):
        for i, sem_id in enumerate(msg['s']['semaphores']):
            if i < msg['s']['sub_status']:  # already acquired
                continue
            sem = db.semaphore.find_one({'_id': sem_id})
            if try_acquire_resource(sem_id, msg['_id'], sem['value']):
                # Save forward progress (sub_status counts semaphores held)
                db.message.update(
                    {'_id': msg['_id']}, {'$set': {'s.sub_status': i + 1}})
            else:
                # Failure to acquire (already queued)
                return False
        # Resources acquired, message ready to be processed
        db.message.update(
            {'_id': msg['_id']}, {'$set': {'s.status': 'busy'}})
        return True

Slides 112-113

Slides 112-113 text

Acquire Resources

    def try_acquire_resource(sem_id, msg_id, sem_size):
        '''Version 1 (race condition)'''
        if reserve(sem_id, msg_id, sem_size):
            return True
        else:
            # Here be dragons!
            db.message.update(
                {'_id': msg_id},
                {'$set': {'s.status': 'queued'}})
            return False

Slides 114-117

Slides 114-117 text

Release Resources (v1) “magic”

    def make_dispatchable(msg_id):
        '''Version 1 (race condition)'''
        db.message.update(
            {'_id': msg_id, 's.status': 'queued'},
            {'$set': {'s.status': 'ready'}})

But what if s.status == ‘acquire’? That’s the dragon.

Slides 118-119

Slides 118-119 text

Release Resources (v2)

    def make_dispatchable(msg_id):
        # Hey, something happened!
        res = db.message.update(
            {'_id': msg_id, 's.status': 'acquire'},
            {'$set': {'s.event': True}})
        if not res['updatedExisting']:
            db.message.update(
                {'_id': msg_id, 's.status': 'queued'},
                {'$set': {'s.status': 'ready'}})

Slides 120-122

Slides 120-122 text

Acquire Resources (v2)

    def try_acquire_resource(sem_id, msg_id, sem_size):
        '''Version 2'''
        while True:
            # Nothing’s happened yet!
            db.message.update(
                {'_id': msg_id}, {'$set': {'s.event': False}})
            if reserve(sem_id, msg_id, sem_size):
                return True
            else:
                # Check if something happened
                res = db.message.update(
                    {'_id': msg_id, 's.event': False},
                    {'$set': {'s.status': 'queued'}})
                if not res['updatedExisting']:
                    # Someone released this message; try again
                    continue
                return False
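The race between acquiring and releasing can be replayed deterministically in memory. In this sketch a plain dict stands in for the message's `s` subdocument, and the hypothetical helper `conditional_update` imitates an update that only applies when its query matches. The interleaving is: worker A fails to get the semaphore, worker B releases and tries to wake the message, then A marks it queued.

```python
def conditional_update(doc, query, changes):
    """Apply `changes` only if every field in `query` matches.
    Returns True when the document was updated."""
    if all(doc.get(k) == v for k, v in query.items()):
        doc.update(changes)
        return True
    return False

def make_dispatchable_v1(msg):
    conditional_update(msg, {"status": "queued"}, {"status": "ready"})

def make_dispatchable_v2(msg):
    # If the message is still mid-acquire, leave a note instead of waking it.
    if not conditional_update(msg, {"status": "acquire"}, {"event": True}):
        conditional_update(msg, {"status": "queued"}, {"status": "ready"})

# Version 1: the wake-up arrives before A has marked the message queued.
msg_v1 = {"status": "acquire"}
make_dispatchable_v1(msg_v1)                          # B: no match, a no-op
conditional_update(msg_v1, {}, {"status": "queued"})  # A: queue it anyway
# msg_v1 is now 'queued', but nobody is left to wake it.

# Version 2: A cleared the event flag before trying the semaphore.
msg_v2 = {"status": "acquire", "event": False}
make_dispatchable_v2(msg_v2)                          # B: sets event=True
queued = conditional_update(
    msg_v2, {"event": False}, {"status": "queued"})   # A: flag changed
# queued is False, so A knows a release happened and retries.
```

The flag turns a lost wake-up into a detectable event: the conditional update fails, and the acquiring worker loops instead of parking the message.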

Slides 123-124

Slides 123-124 text

One More Race….

    def release(sem_id, msg_id, sem_size):
        sem = db.semaphore.find_and_modify(
            {'_id': sem_id},
            update={'$pull': {
                'active': msg_id,
                'queued': msg_id}},
            new=True)

        while len(sem['active']) < sem_size and sem['queued']:
            wake_msg_ids = sem['queued'][:sem_size]
            updated = db.semaphore.find_and_modify(
                {'_id': sem_id},
                update={'$pullAll': {'queued': wake_msg_ids}},
                new=True)
            for msgid in wake_msg_ids:
                make_dispatchable(msgid)
            sem = updated

Slide 125

Slide 125 text

Compensate!

    def fixup_queued_messages():
        for msg in db.message.find({'s.status': 'queued'}):
            sem_id = msg['s']['semaphores'][msg['s']['sub_status']]
            sem = db.semaphore.find_one(
                {'_id': sem_id, 'queued': msg['_id']})
            if sem is None:
                db.message.update(
                    {'_id': msg['_id'],
                     's.status': 'queued',
                     's.sub_status': msg['s']['sub_status']},
                    {'$set': {'s.status': 'ready'}})
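The compensation pass can be tried out on in-memory data. In this sketch, dicts stand in for the `message` and `semaphore` collections, the `fixup_queued` helper is hypothetical, and `sub_status` is assumed to index the semaphore the message is currently waiting on.

```python
def fixup_queued(messages, semaphores):
    """Make 'queued' messages ready again when no semaphore is queueing them."""
    fixed = []
    for msg in messages:
        if msg["status"] != "queued":
            continue
        sem_id = msg["semaphores"][msg["sub_status"]]
        if msg["_id"] not in semaphores[sem_id]["queued"]:
            # Orphaned: the wake-up was lost, so put it back in play.
            msg["status"] = "ready"
            fixed.append(msg["_id"])
    return fixed

semaphores = {"foo": {"active": ["m1"], "queued": ["m2"]}}
messages = [
    {"_id": "m2", "status": "queued", "semaphores": ["foo"], "sub_status": 0},
    {"_id": "m3", "status": "queued", "semaphores": ["foo"], "sub_status": 0},
    {"_id": "m4", "status": "busy", "semaphores": ["foo"], "sub_status": 1},
]

fixed = fixup_queued(messages, semaphores)
```

Only `m3` is repaired: `m2` is still legitimately waiting in the semaphore's queue, and `m4` is not queued at all.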

Slide 126

Slide 126 text

@rick446 @synappio Managing Latency

Slide 127

Slide 127 text

Managing Latency

• Reserving messages is expensive
• Use Pub/Sub system instead
• Publish to the channel whenever a message is ready to be handled
• Each worker subscribes to the channel
• Workers only ‘poll’ when they have a chance of getting work
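The "only poll when notified" idea can be sketched with a blocking channel. This is a simplification under assumptions: Python's `queue.Queue` hands each notification to a single consumer rather than broadcasting to every subscriber, and `worker_step` and `fake_reserve` are hypothetical names.

```python
import queue

def worker_step(channel, reserve, timeout=0.01):
    """Wait briefly for a notification; run the expensive reserve only if one arrives."""
    try:
        channel.get(timeout=timeout)
    except queue.Empty:
        return None  # nothing was published, so skip the poll entirely
    return reserve()

polls = []

def fake_reserve():
    """Stand-in for the findAndModify round trip; records each call."""
    polls.append(1)
    return "msg-1"

channel = queue.Queue()
idle = worker_step(channel, fake_reserve)  # no notification: no poll
channel.put("ready")                       # producer announces new work
got = worker_step(channel, fake_reserve)   # notification: poll once
```

The reserve call runs once, after the notification, instead of on every loop iteration.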

Slide 128

Slide 128 text

Capped Collections

Capped Collection
• Fixed size
• Fast inserts
• “Tailable” cursors

[Diagram label: Tailable Cursor]
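A capped collection behaves roughly like a fixed-size ring buffer that readers can follow from where they left off. The model below is an in-memory analogy only: `CappedLog` is hypothetical, a `deque` with `maxlen` supplies the fixed size, and a sequence number plays the role of the cursor position.

```python
from collections import deque
from itertools import count

class CappedLog:
    """Fixed-size, insertion-ordered log that drops its oldest entries."""

    def __init__(self, size):
        self._items = deque(maxlen=size)
        self._seq = count()

    def insert(self, doc):
        self._items.append((next(self._seq), doc))

    def tail(self, last_seen=-1):
        """Return (docs newer than last_seen, new position)."""
        new = [(s, d) for s, d in self._items if s > last_seen]
        if new:
            last_seen = new[-1][0]
        return [d for _, d in new], last_seen

log = CappedLog(size=3)
log.insert("a")
log.insert("b")
first, pos = log.tail()          # reader catches up from the start

for doc in ("c", "d", "e"):
    log.insert(doc)
second, pos = log.tail(pos)      # only the new documents come back

for doc in ("f", "g", "h", "i"):
    log.insert(doc)
third, pos = log.tail(pos)       # "f" was overwritten before it was read
```

The last read shows the trade-off of a fixed size: a reader that falls more than one buffer behind misses messages.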

Slide 139

Slide 139 text

Getting a Tailable Cursor

def get_cursor(collection, topic_re, await_data=True):
    options = { 'tailable': True }       # make cursor tailable
    if await_data:
        options['await_data'] = True     # holds open cursor for a while
    cur = collection.find(
        { 'k': topic_re },
        **options)
    cur = cur.hint([('$natural', 1)]) # ensure we don't use any indexes
    return cur

import re, time
while True:
    cur = get_cursor(
        db.capped_collection,
        re.compile('^foo'),
        await_data=True)
    for msg in cur:
        do_something(msg)
    time.sleep(0.1)  # still some polling when no producer, so don't spin too fast

Slide 141

Slide 141 text

Building in retry...

def get_cursor(collection, topic_re, last_id=-1, await_data=True):
    options = { 'tailable': True }
    spec = {
        'id': { '$gt': last_id }, # only new messages (integer autoincrement "id")
        'k': topic_re }
    if await_data:
        options['await_data'] = True
    cur = collection.find(spec, **options)
    cur = cur.hint([('$natural', 1)]) # ensure we don't use any indexes
    return cur
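The `last_id` parameter pays off in the consuming loop: when a cursor dies, the consumer reopens it just past the last message it handled. The sketch below shows that loop in isolation. The names are assumptions for illustration: `tail_with_retry` and `open_cursor` are hypothetical, `open_cursor(last_id)` plays the role of `get_cursor`, and `ConnectionError` stands in for whatever error the driver raises.

```python
def tail_with_retry(open_cursor, handle, max_opens=3):
    """Consume documents, reopening the cursor after the last id seen.

    open_cursor(last_id) is a hypothetical factory returning an iterable of
    documents whose 'id' is greater than last_id.
    """
    last_id = -1
    for _ in range(max_opens):
        try:
            for doc in open_cursor(last_id):
                handle(doc)
                last_id = doc['id']   # remember progress before moving on
        except ConnectionError:
            continue                  # cursor died: reopen from last_id
    return last_id


docs = [{'id': i, 'k': 'foo'} for i in range(5)]
failures = [True]                     # fail exactly once, on the way to id 2


def open_cursor(last_id):
    for doc in docs:
        if doc['id'] <= last_id:
            continue
        if doc['id'] == 2 and failures:
            failures.pop()
            raise ConnectionError('simulated cursor failure')
        yield doc


seen = []
final_id = tail_with_retry(open_cursor, lambda d: seen.append(d['id']))
print(seen, final_id)
```

Each message is handled once even though the first cursor failed partway through, because the second cursor starts after id 1.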

Slide 144

Slide 144 text

Ludicrous Speed

from pymongo.cursor import _QUERY_OPTIONS

def get_cursor(collection, topic_re, last_id=-1, await_data=True):
    options = { 'tailable': True }
    spec = {
        'ts': { '$gt': last_id }, # only new messages ('id' becomes 'ts')
        'k': topic_re }
    if await_data:
        options['await_data'] = True
    cur = collection.find(spec, **options)
    cur = cur.hint([('$natural', 1)]) # ensure we don't use any indexes
    if await_data:
        # co-opt the oplog_replay option
        cur = cur.add_option(_QUERY_OPTIONS['oplog_replay'])
    return cur

Slide 145

Slide 145 text

The Oplog
• Capped collection that records all operations for replication
• Includes a ‘ts’ field suitable for oplog_replay
• Does not require a separate publish operation (all changes are automatically “published”)
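Because every change lands in the oplog, "subscribing" comes down to filtering oplog entries. A minimal sketch of such a filter follows, assuming the usual oplog entry fields `ns` (the `database.collection` namespace) and `op` (`'i'` for insert, `'u'` for update). The helper name `oplog_spec` and the `chapman` database name are hypothetical.

```python
def oplog_spec(db_name, collection_name, ops=('i', 'u')):
    """Build a query spec matching inserts and updates on one collection."""
    return {
        'ns': '%s.%s' % (db_name, collection_name),  # namespace of the change
        'op': {'$in': list(ops)},                    # which operation types to watch
    }


spec = oplog_spec('chapman', 'message')
print(spec)
```

A spec like this is what a worker would tail instead of a dedicated publish collection.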

Slide 150

Slide 150 text

Using the Oplog

import bson
from pymongo.cursor import _QUERY_OPTIONS

def oplog_await(oplog, spec):
    '''Await the very next message on the oplog satisfying the spec'''
    # most recent oplog entry
    last = oplog.find_one(spec, sort=[('$natural', -1)])
    if last is None:
        return # Can't await unless there is an existing message satisfying spec
    await_spec = dict(spec)
    last_ts = last['ts']
    # finds most recent plus following entries
    await_spec['ts'] = {'$gt': bson.Timestamp(last_ts.time, last_ts.inc - 1)}
    curs = oplog.find(await_spec, tailable=True, await_data=True)
    curs = curs.hint([('$natural', 1)])
    curs = curs.add_option(_QUERY_OPTIONS['oplog_replay'])
    curs.next() # skip most recent (should always find 1 element)
    try:
        return curs.next() # return on anything new
    except StopIteration:
        return None
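The `inc - 1` step is easy to misread, so here is the same comparison with plain tuples. This is a sketch under an assumption: timestamps are modelled as `(time, inc)` tuples, which order the same way as BSON timestamps (by time, then by increment). The helper name `entries_from` and the sample entries are made up. Stepping the increment back by one makes the query match the last known entry itself, so the cursor starts with one guaranteed result, presumably because a tailable cursor whose initial query matches nothing cannot be awaited.

```python
def entries_from(oplog, last_ts):
    """Return entries with ts > (time, inc - 1): last_ts itself and anything newer."""
    time_, inc = last_ts
    floor = (time_, inc - 1)
    return [entry for entry in oplog if entry['ts'] > floor]


oplog = [
    {'ts': (100, 1), 'op': 'i'},
    {'ts': (100, 2), 'op': 'u'},   # the most recent entry when the wait begins
    {'ts': (101, 1), 'op': 'u'},   # arrives while waiting
]
last = oplog[1]

matched = entries_from(oplog, last['ts'])
skipped, following = matched[0], matched[1:]
print(skipped['ts'], [e['ts'] for e in following])
```

The first match is the entry already seen, which is why `oplog_await` discards one result before returning the next.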

Slide 157

Slide 157 text

What We’ve Learned
How to…
Build a task queue in MongoDB
Bring consistency to distributed systems (without transactions)
Build low-latency reactive systems

Slide 163

Slide 163 text

Tips
• findAndModify is ideal for queues
• Atomic update + compensation brings consistency to your distributed system
• Use the oplog to build reactive, low-latency systems
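The first tip rests on one property: finding a ready message and marking it busy happen as a single atomic step, so two workers can never claim the same message. The sketch below imitates that property in memory. It is an analogy only: this `find_and_modify` is a hypothetical stand-in that uses a lock where MongoDB provides per-document atomicity, and the status values are illustrative.

```python
import threading

_lock = threading.Lock()


def find_and_modify(docs, match, update):
    """Atomically update and return the first document matching `match`, else None."""
    with _lock:  # find and update happen as one step
        for doc in docs:
            if all(doc.get(key) == value for key, value in match.items()):
                doc.update(update)
                return doc
    return None


def reserve(docs, worker):
    return find_and_modify(
        docs, {'status': 'ready'}, {'status': 'busy', 'worker': worker})


messages = [{'_id': i, 'status': 'ready'} for i in range(20)]
claimed = {name: [] for name in ('w1', 'w2', 'w3')}


def work(name):
    while True:
        msg = reserve(messages, name)
        if msg is None:
            return
        claimed[name].append(msg['_id'])


threads = [threading.Thread(target=work, args=(name,)) for name in claimed]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

all_ids = sorted(i for ids in claimed.values() for i in ids)
print(len(all_ids))
```

However the threads interleave, every message is claimed exactly once.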

Slide 164

Slide 164 text

Questions? Rick Copeland [email protected] @rick446