Chapman: Building a High-Performance Distributed Task Service with MongoDB

rick446
June 02, 2015

When you're building a web application, you want to respond to every request as quickly as possible. The usual approach is to use an asynchronous job queue such as Sidekiq, Resque, Celery, or RQ to handle slow work outside the request/response cycle in a separate 'worker' process. Unfortunately, many of these frameworks either require you to deploy Redis, RabbitMQ, or some other message broker, or they resort to polling a database for new work. Chapman is a distributed task queue built on MongoDB that avoids gratuitous polling by using tailable cursors on the oplog to provide notifications of incoming work. Inspired by Celery, Chapman also supports task graphs, in which multiple tasks that depend on each other are executed asynchronously by the system. Come learn how Synapp.io is using MongoDB and Chapman to handle its core data processing needs.

Transcript

  3. @rick446 @synappio What You’ll Learn

    How to…
    • Build a task queue in MongoDB
    • Bring consistency to distributed systems (without transactions)
    • Build low-latency reactive systems
  4. @rick446 @synappio Why a Queue?

    • Long-running task (or longer than the web can wait)
    • Farm out chunks of work for performance
  7. @rick446 @synappio Queue Options

    • SQS? No priority
    • Redis? Can’t overflow memory
    • Rabbit-MQ? Lack of visibility
    • ZeroMQ? Lack of persistence
    • What about MongoDB?
  9. @rick446 @synappio Roadmap

    • Building a scheduled priority queue
    • Handling unreliable workers
    • Shared resources
    • Managing Latency
  12. @rick446 @synappio Step 1: Simple Queue

    db.message.insert({
        "_id" : NumberLong("3784707300388732067"),
        "data" : BinData(...),
        "s" : {
            "status" : "ready",
            "ts_enqueue" : ISODate("2015-03-02T15:27:29.228Z")
        }
    });

    db.message.ensureIndex({'s.status': 1, 's.ts_enqueue': 1});

    // Get earliest message for processing
    db.runCommand(
        {
            findAndModify: "message",
            query: { 's.status': 'ready' },
            sort: {'s.ts_enqueue': 1},  // FIFO
            update: { '$set': {'s.status': 'reserved'} },
        }
    );
  13. @rick446 @synappio Step 1: Simple Queue

    Good
    • Guaranteed FIFO

    Bad
    • No priority (other than FIFO)
    • No handling of worker problems
  16. @rick446 @synappio Step 2: Scheduled Messages

    db.message.insert({
        "_id" : NumberLong("3784707300388732067"),
        "data" : BinData(...),
        "s" : {
            "status" : "ready",
            "ts_after" : ISODate(…),  // min valid time
            "ts_enqueue" : ISODate("2015-03-02T15:27:29.228Z")
        }
    });

    db.message.ensureIndex(
        {'s.status': 1, 's.ts_enqueue': 1});

    // Get earliest message for processing
    db.runCommand(
        {
            findAndModify: "message",
            query: { 's.status': 'ready', 's.ts_after': {$lt: now }},
            sort: {'s.ts_enqueue': 1},
            update: { '$set': {'s.status': 'reserved'} },
        }
    );
  17. @rick446 @synappio Step 2: Scheduled Messages

    Good
    • Easy to build periodic tasks

    Bad
    • Be careful with the word “now”
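The scheduled reservation above can be modeled in plain Python without a database. This is a minimal in-memory sketch, not Chapman's code: `reserve_scheduled` is a hypothetical helper that imitates the `findAndModify` query and sort, and it takes `now` as an explicit argument. One reading of the warning about “now” is that every worker computes it from its own clock, so making it a parameter keeps that dependency visible; that interpretation is an assumption.

```python
from datetime import datetime, timedelta, timezone

def reserve_scheduled(messages, now):
    """Reserve the oldest 'ready' message whose ts_after is already in the past."""
    eligible = [
        m for m in messages
        if m["s"]["status"] == "ready" and m["s"]["ts_after"] < now
    ]
    if not eligible:
        return None
    msg = min(eligible, key=lambda m: m["s"]["ts_enqueue"])
    msg["s"]["status"] = "reserved"
    return msg

t0 = datetime(2015, 3, 2, 15, 0, tzinfo=timezone.utc)
queue = [
    {"_id": "later", "s": {"status": "ready", "ts_enqueue": t0,
                           "ts_after": t0 + timedelta(minutes=10)}},
    {"_id": "soon", "s": {"status": "ready",
                          "ts_enqueue": t0 + timedelta(seconds=1),
                          "ts_after": t0 + timedelta(minutes=1)}},
]

early = reserve_scheduled(queue, now=t0 + timedelta(minutes=5))
nothing = reserve_scheduled(queue, now=t0 + timedelta(minutes=5))
late = reserve_scheduled(queue, now=t0 + timedelta(minutes=11))
```

At the five-minute mark only "soon" is eligible, even though "later" was enqueued first; "later" becomes reservable once its `ts_after` has passed.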
  20. @rick446 @synappio Step 3: Priority

    db.message.insert({
        "_id" : NumberLong("3784707300388732067"),
        "data" : BinData(...),
        "s" : {
            "status" : "ready",
            "pri": 30128,  // add priority
            "ts_enqueue" : ISODate("2015-03-02T15:27:29.228Z")
        }
    });

    db.message.ensureIndex({'s.status': 1, 's.pri': -1, 's.ts_enqueue': 1});

    db.runCommand(
        {
            findAndModify: "message",
            query: { 's.status': 'ready' },
            sort: {'s.pri': -1, 's.ts_enqueue': 1},  // add priority
            update: { '$set': {'s.status': 'reserved'} },
        }
    );
  21. @rick446 @synappio Step 3: Priority

    Good
    • Priorities are handled
    • Guaranteed FIFO within a priority

    Bad
    • No handling of worker problems
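The ordering that the compound sort produces (highest `pri` first, oldest `ts_enqueue` first within a priority) can be checked with a small in-memory sketch. `reserve_by_priority` is a hypothetical stand-in for the `findAndModify` call; in MongoDB the find and the update happen atomically, which this single-threaded model does not need to reproduce.

```python
def reserve_by_priority(messages):
    """Reserve the highest-priority 'ready' message, FIFO within a priority."""
    ready = [m for m in messages if m["s"]["status"] == "ready"]
    if not ready:
        return None
    # Mirrors sort {'s.pri': -1, 's.ts_enqueue': 1}
    msg = min(ready, key=lambda m: (-m["s"]["pri"], m["s"]["ts_enqueue"]))
    msg["s"]["status"] = "reserved"
    return msg

queue = [
    {"_id": "a", "s": {"status": "ready", "pri": 10, "ts_enqueue": 1}},
    {"_id": "b", "s": {"status": "ready", "pri": 30, "ts_enqueue": 3}},
    {"_id": "c", "s": {"status": "ready", "pri": 30, "ts_enqueue": 2}},
]

order = [reserve_by_priority(queue)["_id"] for _ in range(3)]
empty = reserve_by_priority(queue)
```

Message "a" was enqueued first but is served last, because both "b" and "c" carry a higher priority.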
  23. @rick446 @synappio Approach 1 Timeouts

    db.message.insert({
        "_id" : NumberLong("3784707300388732067"),
        "data" : BinData(...),
        "s" : {
            "status" : "ready",
            "pri": 30128,
            "ts_enqueue" : ISODate("2015-03-02T15:27:29.228Z"),
            "ts_timeout" : ISODate("2025-01-01T00:00:00.000Z")  // far-future placeholder
        }
    });

    db.message.ensureIndex({"s.status": 1, "s.ts_timeout": 1})
  25. @rick446 @synappio Approach 1 Timeouts

    // Reserve message
    db.runCommand(
        {
            findAndModify: "message",
            query: { 's.status': 'ready' },
            sort: {'s.pri': -1, 's.ts_enqueue': 1},
            update: { '$set': {
                's.status': 'reserved',
                's.ts_timeout': now + processing_time } }  // client sets timeout
        }
    );

    // Timeout message ("unlock")
    db.message.update(
        {'s.status': 'reserved', 's.ts_timeout': {'$lt': now}},
        {'$set': {'s.status': 'ready'}},
        {'multi': true});
  27. @rick446 @synappio Approach 1 Timeouts

    Good
    • Worker failure handled via timeout

    Bad
    • Requires periodic “unlock” task
    • Slow (but “live”) workers can cause spurious timeouts
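The spurious-timeout problem is easy to reproduce in a small in-memory model. The sketch below is illustrative only: `reserve` and `unlock_timed_out` are hypothetical helpers that mimic the two MongoDB operations, and times are plain numbers rather than dates.

```python
def reserve(messages, now, processing_time):
    """Model of the reserving findAndModify: the worker sets its own deadline."""
    ready = [m for m in messages if m["s"]["status"] == "ready"]
    if not ready:
        return None
    msg = min(ready, key=lambda m: (-m["s"]["pri"], m["s"]["ts_enqueue"]))
    msg["s"].update(status="reserved", ts_timeout=now + processing_time)
    return msg

def unlock_timed_out(messages, now):
    """Model of the periodic multi-update; returns how many messages it unlocked."""
    count = 0
    for m in messages:
        if m["s"]["status"] == "reserved" and m["s"]["ts_timeout"] < now:
            m["s"]["status"] = "ready"
            count += 1
    return count

queue = [{"_id": 1, "s": {"status": "ready", "pri": 0, "ts_enqueue": 0,
                          "ts_timeout": float("inf")}}]  # far-future placeholder

msg = reserve(queue, now=100, processing_time=30)
unlocked_early = unlock_timed_out(queue, now=120)  # deadline not reached yet
unlocked_late = unlock_timed_out(queue, now=131)   # deadline passed

# The first worker may still be running, yet the message is reservable again.
second = reserve(queue, now=131, processing_time=30)
```

Here `second` is the same message that the first worker is still holding, which is exactly the duplicate processing that a slow but live worker can trigger.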
  29. @rick446 @synappio Approach 2 Worker Identity

    db.message.insert({
        "_id" : NumberLong("3784707300388732067"),
        "data" : BinData(...),
        "s" : {
            "status" : "ready",
            "pri": 30128,
            "cli": "--------------------------",  // client / worker placeholder
            "ts_enqueue" : ISODate("2015-03-02T..."),
            "ts_timeout" : ISODate("2025-...")
        }
    });
  32. @rick446 @synappio Approach 2 Worker Identity

    // Reserve message
    db.runCommand(
        {
            findAndModify: "message",
            query: { 's.status': 'ready' },
            sort: {'s.pri': -1, 's.ts_enqueue': 1},
            update: { '$set': {
                's.status': 'reserved',
                's.cli': 'client_name:pid',  // mark the worker who reserved the message
                's.ts_timeout': now + processing_time } }
        }
    );

    // Unlock "dead" client messages
    // (messages reserved by dead workers are unlocked)
    db.message.update(
        {'s.status': 'reserved',
         's.cli': {'$nin': active_clients} },
        {'$set': {'s.status': 'ready'}},
        {'multi': true});
  38. @rick446 @synappio Approach 2 Worker Identity

    Good
    • Worker failure handled via out-of-band detection of live workers
    • Handles slow workers

    Bad
    • Requires periodic “unlock” task
    • Unlock updates can be slow
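The unlock-by-identity rule can be sketched in memory as follows. `unlock_dead_workers` is a hypothetical helper that mirrors the `$nin` update; how the list of live workers is obtained is out of band in the talk, so here it is simply passed in.

```python
def unlock_dead_workers(messages, active_clients):
    """Return reserved messages to 'ready' when their worker is not alive."""
    alive = set(active_clients)
    unlocked = []
    for m in messages:
        s = m["s"]
        if s["status"] == "reserved" and s["cli"] not in alive:
            s["status"] = "ready"
            unlocked.append(m["_id"])
    return unlocked

queue = [
    {"_id": 1, "s": {"status": "reserved", "cli": "web1:100"}},
    {"_id": 2, "s": {"status": "reserved", "cli": "web2:200"}},
    {"_id": 3, "s": {"status": "ready", "cli": "----------"}},
]

unlocked = unlock_dead_workers(queue, active_clients=["web1:100"])
```

Message 1 stays reserved however long "web1:100" takes, which is why this approach tolerates slow workers; only message 2, whose worker is gone, is unlocked.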
  40. @rick446 @synappio Semaphores

    • Some services perform connection throttling (e.g. Mailchimp)
    • Some services just have a hard time with 144 threads hitting them simultaneously
    • Need a way to limit our concurrency
  43. @rick446 @synappio Semaphores

    Semaphore
      Active: msg1, msg2, msg3, …
      Capacity: 16
      Queued: msg17, msg18, msg19, …

    • Keep active and queued messages in arrays
    • Releasing the semaphore makes queued messages available for dispatch
    • Use $slice (2.6) to keep arrays the right size
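The array-based semaphore can be modeled with two Python lists. This is a sketch under stated assumptions rather than the deck's implementation: the `Semaphore` class is hypothetical, list slicing stands in for `$push` with `$each` and `$slice`, and `release` wakes only as many queued messages as there are free slots, which is a simplification chosen for this example.

```python
class Semaphore:
    """In-memory model of a semaphore document with 'active' and 'queued' arrays."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.active = []
        self.queued = []

    def acquire(self, msg_id):
        # Like $push with $each + $slice: append, then keep the first `capacity`.
        self.active = (self.active + [msg_id])[: self.capacity]
        # Pessimistically record the message as queued as well.
        self.queued.append(msg_id)
        if msg_id in self.active:
            # Compensation: we got a slot after all, so leave the queue.
            self.queued.remove(msg_id)
            return True
        return False

    def release(self, msg_id):
        """Free the slot and return the queued ids that may now retry."""
        if msg_id in self.active:
            self.active.remove(msg_id)
        if msg_id in self.queued:
            self.queued.remove(msg_id)
        free = self.capacity - len(self.active)
        woken, self.queued = self.queued[:free], self.queued[free:]
        return woken

sem = Semaphore(capacity=2)
results = [sem.acquire(m) for m in ("m1", "m2", "m3")]
queued_before = list(sem.queued)
woken = sem.release("m1")
retry = sem.acquire("m3")
```

Woken messages are not moved into `active` directly; they become dispatchable again and go through `acquire` like any other message.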
  46. @rick446 @synappio Semaphores: Acquire

    db.semaphore.insert({
        '_id': 'semaphore-name',
        'value': 16,
        'active': [],
        'queued': []});

    def acquire(sem_id, msg_id, sem_size):
        # Pessimistic update
        sem = db.semaphore.find_and_modify(
            {'_id': sem_id},
            update={'$push': {
                'active': {
                    '$each': [msg_id],
                    '$slice': sem_size},
                'queued': msg_id}},
            new=True)
        if msg_id in sem['active']:
            # Compensation
            db.semaphore.update(
                {'_id': sem_id},
                {'$pull': {'queued': msg_id}})
            return True
        return False
  50. @rick446 @synappio Semaphores: Release

    def release(sem_id, msg_id, sem_size):
        # Actually release
        sem = db.semaphore.find_and_modify(
            {'_id': sem_id},
            update={'$pull': {
                'active': msg_id,
                'queued': msg_id}},
            new=True)

        # Awaken queued message(s)
        while len(sem['active']) < sem_size and sem['queued']:
            wake_msg_ids = sem['queued'][:sem_size]
            updated = db.semaphore.find_and_modify(
                {'_id': sem_id},
                update={'$pullAll': {'queued': wake_msg_ids}},
                new=True)
            for msgid in wake_msg_ids:
                make_dispatchable(msgid)  # some magic (covered later)
            sem = updated
  52. @rick446 @synappio Message States

    States: ready, acquire, queued, busy

    • Reserve the message
    • Acquire resources
    • Process the message
    • Release resources
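The four states can be written down as a small transition table. The state names come from the slide; the event names (`reserve`, `all_acquired`, `blocked`, `woken`) are hypothetical labels chosen for this sketch, and the table covers only the transitions that the code on the surrounding slides performs.

```python
# (current status, event) -> next status
TRANSITIONS = {
    ("ready", "reserve"): "acquire",       # a worker reserves the message
    ("acquire", "all_acquired"): "busy",   # every semaphore was acquired
    ("acquire", "blocked"): "queued",      # a semaphore was full
    ("queued", "woken"): "ready",          # a release made it dispatchable
}

def step(status, event):
    """Return the next status, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError("illegal transition: %r while %r" % (event, status))

# Uncontended path: ready -> acquire -> busy
happy = step(step("ready", "reserve"), "all_acquired")

# Contended path: the message is queued once, woken, and then succeeds.
status = "ready"
for event in ("reserve", "blocked", "woken", "reserve", "all_acquired"):
    status = step(status, event)
```

Writing the table out makes it easier to see which status a conditional update has to check before it changes a message.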
  56. @rick446 @synappio Reserve a Message

    # Before reserving
    message.s == {
        pri: 10,
        semaphores: ['foo'],  # required semaphores
        status: 'ready',
        sub_status: 0,        # number of semaphores acquired
        w: '----------',
        ...}

    msg = db.message.find_and_modify(
        {'s.status': 'ready'},
        # Prefer partially-acquired messages
        sort=[('s.sub_status', -1), ('s.pri', -1), ('s.ts', 1)],
        update={'$set': {'s.w': worker, 's.status': 'acquire'}},
        new=True)

    # After reserving
    message.s == {
        pri: 10,
        semaphores: ['foo'],
        status: 'acquire',
        sub_status: 0,
        w: worker,
        ...}
  60. @rick446 @synappio Acquire Resources

    def acquire_resources(msg):
        for i, sem_id in enumerate(msg['s']['semaphores']):
            if i < msg['s']['sub_status']:  # already acquired
                continue
            sem = db.semaphore.find_one({'_id': sem_id})
            if try_acquire_resource(sem_id, msg['_id'], sem['value']):
                # Save forward progress
                db.message.update(
                    {'_id': msg['_id']}, {'$set': {'s.sub_status': i}})
            else:
                # Failure to acquire (already queued)
                return False
        # Resources acquired, message ready to be processed
        db.message.update(
            {'_id': msg['_id']}, {'$set': {'s.status': 'busy'}})
        return True
  62. @rick446 @synappio Acquire Resources

    def try_acquire_resource(sem_id, msg_id, sem_size):
        '''Version 1 (race condition)'''
        if reserve(sem_id, msg_id, sem_size):
            return True
        else:
            # Here be dragons!
            db.message.update(
                {'_id': msg_id},
                {'$set': {'s.status': 'queued'}})
            return False
  66. @rick446 @synappio Release Resources (v1) “magic”

    def make_dispatchable(msg_id):
        '''Version 1 (race condition)'''
        db.message.update(
            {'_id': msg_id, 's.status': 'queued'},
            {'$set': {'s.status': 'ready'}})

    But what if s.status == ‘acquire’? That’s the dragon.
  68. @rick446 @synappio Release Resources (v2)

    def make_dispatchable(msg_id):
        # Hey, something happened!
        res = db.message.update(
            {'_id': msg_id, 's.status': 'acquire'},
            {'$set': {'s.event': True}})
        if not res['updatedExisting']:
            db.message.update(
                {'_id': msg_id, 's.status': 'queued'},
                {'$set': {'s.status': 'ready'}})
  71. @rick446 @synappio Acquire Resources (v2)

    def try_acquire_resource(sem_id, msg_id, sem_size):
        '''Version 2'''
        while True:
            # Nothing's happened yet!
            db.message.update(
                {'_id': msg_id}, {'$set': {'s.event': False}})
            if reserve(sem_id, msg_id, sem_size):
                return True
            else:
                # Check if something happened
                res = db.message.update(
                    {'_id': msg_id, 's.event': False},
                    {'$set': {'s.status': 'queued'}})
                if not res['updatedExisting']:
                    # Someone released this message; try again
                    continue
                return False
  73. @rick446 @synappio One More Race….

    def release(sem_id, msg_id, sem_size):
        sem = db.semaphore.find_and_modify(
            {'_id': sem_id},
            update={'$pull': {
                'active': msg_id,
                'queued': msg_id}},
            new=True)

        while len(sem['active']) < sem_size and sem['queued']:
            wake_msg_ids = sem['queued'][:sem_size]
            updated = db.semaphore.find_and_modify(
                {'_id': sem_id},
                update={'$pullAll': {'queued': wake_msg_ids}},
                new=True)
            for msgid in wake_msg_ids:
                make_dispatchable(msgid)
            sem = updated
  74. @rick446 @synappio Compensate!

    def fixup_queued_messages():
        for msg in db.message.find({'s.status': 'queued'}):
            sem_id = msg['s']['semaphores'][msg['s']['sub_status']]
            sem = db.semaphore.find_one(
                {'_id': sem_id, 'queued': msg['_id']})
            if sem is None:
                # Queued in the message, but no semaphore knows about it
                db.message.update(
                    {'_id': msg['_id'],
                     's.status': 'queued',
                     's.sub_status': msg['s']['sub_status']},
                    {'$set': {'s.status': 'ready'}})
  75. @rick446 @synappio Managing Latency

    • Reserving messages is expensive
    • Use Pub/Sub system instead
    • Publish to the channel whenever a message is ready to be handled
    • Each worker subscribes to the channel
    • Workers only ‘poll’ when they have a chance of getting work
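The idea of reserving only after a notification can be illustrated with the standard library alone. In this sketch a `queue.Queue` stands in for the channel and a plain list stands in for the message collection; both are assumptions made for the example, and with a single worker the difference between a queue and a true broadcast channel does not matter.

```python
import queue
import threading

channel = queue.Queue()   # stands in for the pub/sub channel
work = []                 # stands in for the message collection
handled = []
lock = threading.Lock()

def publish(msg):
    """Store the message, then announce that work is available."""
    with lock:
        work.append(msg)
    channel.put("ready")

def worker():
    while True:
        note = channel.get()      # blocks until notified; no busy polling
        if note is None:          # shutdown signal used by this example
            break
        with lock:                # only now try the expensive reservation
            if work:
                handled.append(work.pop(0))

t = threading.Thread(target=worker)
t.start()
for m in ("a", "b", "c"):
    publish(m)
channel.put(None)
t.join()
```

The worker touches the shared list only when a notification tells it there is a chance of finding work, which is the property the bullets describe.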
  81. @rick446 @synappio Getting a Tailable Cursor

    def get_cursor(collection, topic_re, await_data=True):
        options = { 'tailable': True }  # make cursor tailable
        if await_data:
            options['await_data'] = True  # holds open cursor for a while
        cur = collection.find(
            { 'k': topic_re },
            **options)
        cur = cur.hint([('$natural', 1)]) # ensure we don't use any indexes
        return cur

    import re, time
    while True:
        cur = get_cursor(
            db.capped_collection,
            re.compile('^foo'),
            await_data=True)
        for msg in cur:
            do_something(msg)
        # Still some polling when no producer, so don't spin too fast
        time.sleep(0.1)
  83. @rick446 @synappio Building in retry...

    def get_cursor(collection, topic_re, last_id=-1, await_data=True):
        options = { 'tailable': True }
        spec = {
            'id': { '$gt': last_id }, # only new messages (integer autoincrement "id")
            'k': topic_re }
        if await_data:
            options['await_data'] = True
        cur = collection.find(spec, **options)
        cur = cur.hint([('$natural', 1)]) # ensure we don't use any indexes
        return cur
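The effect of resuming from `last_id` can be seen without MongoDB. In this in-memory sketch a list plays the capped collection and `read_new` is a hypothetical stand-in for the cursor, using a prefix match in place of the topic regex.

```python
def read_new(log, topic_prefix, last_id=-1):
    """Yield entries newer than last_id whose key starts with topic_prefix."""
    for doc in log:
        if doc["id"] > last_id and doc["k"].startswith(topic_prefix):
            yield doc

log = [
    {"id": 0, "k": "foo.a"},
    {"id": 1, "k": "bar.a"},
    {"id": 2, "k": "foo.b"},
]

last_id = -1
first_pass = []
for doc in read_new(log, "foo", last_id):
    first_pass.append(doc["id"])
    last_id = doc["id"]          # remember how far we got

# The cursor "dies"; a new entry arrives before we reconnect.
log.append({"id": 3, "k": "foo.c"})
second_pass = [doc["id"] for doc in read_new(log, "foo", last_id)]
```

Because the consumer remembers the last `id` it saw, reconnecting neither replays old entries nor skips the one that arrived in between.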
  86. @rick446 @synappio Ludicrous Speed

    from pymongo.cursor import _QUERY_OPTIONS

    def get_cursor(collection, topic_re, last_id=-1, await_data=True):
        options = { 'tailable': True }
        spec = {
            'ts': { '$gt': last_id }, # only new messages (id ==> ts)
            'k': topic_re }
        if await_data:
            options['await_data'] = True
        cur = collection.find(spec, **options)
        cur = cur.hint([('$natural', 1)]) # ensure we don't use any indexes
        if await_data:
            # Co-opt the oplog_replay option
            cur = cur.add_option(_QUERY_OPTIONS['oplog_replay'])
        return cur
  87. The Oplog

    • Capped collection that records all operations for replication
    • Includes a ‘ts’ field suitable for oplog_replay
    • Does not require a separate publish operation (all changes are automatically “published”)
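
To make these properties concrete, here is a toy in-memory model of a capped, timestamp-ordered log. It is only an illustration, not MongoDB: `ToyOplog`, `record`, and `since` are hypothetical names, the namespace `chapman.message` is made up, and `ts` is modelled as a plain `(time, inc)` tuple with a counter that never resets.

```python
from collections import deque


class ToyOplog:
    """In-memory stand-in for a capped collection (not MongoDB itself)."""

    def __init__(self, size):
        self.entries = deque(maxlen=size)  # oldest entries fall off the front
        self._inc = 0

    def record(self, time, ns, op):
        # Every write lands here, so there is no separate "publish" step
        self._inc += 1
        self.entries.append({'ts': (time, self._inc), 'ns': ns, 'op': op})

    def since(self, last_ts):
        # Scan in insertion order and keep only entries newer than last_ts
        return [e for e in self.entries if e['ts'] > last_ts]


log = ToyOplog(size=3)
for i in range(5):
    log.record(time=100 + i // 2, ns='chapman.message', op='i')

kept = [e['ts'] for e in log.entries]
newer = [e['ts'] for e in log.since((101, 3))]
```

Only the three newest writes survive in the capped log, and a reader that remembers the last `ts` it saw picks up exactly the entries that follow it.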
  92. Using the Oplog

    import bson
    from pymongo.cursor import _QUERY_OPTIONS

    def oplog_await(oplog, spec):
        '''Await the very next message on the oplog satisfying the spec'''
        # most recent oplog entry
        last = oplog.find_one(spec, sort=[('$natural', -1)])
        if last is None:
            return  # Can't await unless there is an existing message satisfying spec
        await_spec = dict(spec)
        last_ts = last['ts']
        await_spec['ts'] = {'$gt': bson.Timestamp(last_ts.time, last_ts.inc - 1)}
        # finds most recent plus following entries
        curs = oplog.find(await_spec, tailable=True, await_data=True)
        curs = curs.hint([('$natural', 1)])
        curs = curs.add_option(_QUERY_OPTIONS['oplog_replay'])
        # skip most recent
        curs.next()  # should always find 1 element
        # return on anything new
        try:
            return curs.next()
        except StopIteration:
            return None
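
The `inc - 1` step is easy to miss, so here is a small illustration of why the `$gt` filter still matches the most recent entry. It uses plain `(time, inc)` tuples, which sort the same way as `bson.Timestamp` values (by time, then by increment). The helper `just_before` and the sample data are hypothetical.

```python
def just_before(ts):
    """Return the (time, inc) pair one increment earlier, as on the slide."""
    time, inc = ts
    return (time, inc - 1)


oplog = [(100, 1), (100, 2), (101, 1)]
last = oplog[-1]            # most recent matching entry at lookup time
lower = just_before(last)   # (101, 0): strictly below `last`

oplog.append((101, 2))      # a new entry arrives while we are waiting

# '$gt': lower matches `last` itself plus everything after it
matched = [ts for ts in oplog if ts > lower]
skipped, fresh = matched[0], matched[1:]
```

Starting the cursor on an entry that is known to exist gives it a position in the collection to tail from; the first `next()` discards that entry and the second returns whatever is new. One caveat, stated as an assumption: if `inc` were ever 0, the subtraction would produce -1, which `bson.Timestamp` should reject, so a `$gte` filter on the unmodified timestamp may be a safer equivalent.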
  95. What We’ve Learned

    How to…
    • Build a task queue in MongoDB
    • Bring consistency to distributed systems (without transactions)
    • Build low-latency reactive systems
  98. Tips

    • findAndModify is ideal for queues
    • Atomic update + compensation brings consistency to your distributed system
    • Use the oplog to build reactive, low-latency systems
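
The first two tips can be illustrated with a toy in-memory queue. This is a sketch under stated assumptions, not Chapman's schema: `ToyQueue`, the `state` and `owner` fields, and the `ready`/`busy` values are hypothetical, and the lock plays the role of the server-side atomicity that findAndModify provides.

```python
import threading


class ToyQueue:
    """In-memory model of a task collection (not MongoDB itself)."""

    def __init__(self, payloads):
        self._lock = threading.Lock()
        self.tasks = [{'_id': i, 'state': 'ready', 'owner': None, 'payload': p}
                      for i, p in enumerate(payloads)]

    def claim(self, worker):
        # Find one 'ready' task and mark it 'busy' in a single atomic step
        with self._lock:
            for task in self.tasks:
                if task['state'] == 'ready':
                    task.update(state='busy', owner=worker)
                    return task
        return None

    def release(self, task_id):
        # Compensation: hand a claimed task back if its worker gave up
        with self._lock:
            for task in self.tasks:
                if task['_id'] == task_id and task['state'] == 'busy':
                    task.update(state='ready', owner=None)


queue = ToyQueue(range(20))
claimed = []

def worker(name):
    while True:
        task = queue.claim(name)
        if task is None:
            return
        claimed.append(task['_id'])

threads = [threading.Thread(target=worker, args=('w%d' % i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

queue.release(0)                 # pretend the worker holding task 0 failed
retried = queue.claim('late')    # the task becomes claimable again
```

Because finding and marking happen in one step, no two workers can claim the same task, and the compensating `release` restores a consistent state without a transaction.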