Slide 1

Slide 1 text

Chapman: Building a Distributed Job Queue in MongoDB Rick Copeland @rick446 [email protected] Friday, June 21, 13

Slide 5

Slide 5 text

Getting to Know One Another Rick

Slide 10

Slide 10 text

Roadmap
• Define the problem
• Schema design & operations
• Types of tasks
• Reducing polling

Slide 11

Slide 11 text

Requirements
[Diagram: an App Server and six SMTP Servers spread across Digital Ocean, Rackspace US, and Rackspace UK]

Slide 12

Slide 12 text

Requirements
[Diagram labels: Pipeline, Group, check_smtp, Analyze Results, Update Reports]

Slide 15

Slide 15 text

Requirements
MongoDB (of course)

Slide 16

Slide 16 text

Basic Ideas
[Diagram labels: msg, Chapman Insecure, Task, Worker Process]

Slide 21

Slide 21 text

Job Queue Schema: Message
{
  _id: ObjectId(...),
  task_id: ObjectId(...),      // Destination task
  slot: 'run',                 // Task method to be run
  s: {                         // Scheduling / synchronization
    status: 'ready',
    ts: ISODateTime(...),
    q: 'chapman',
    pri: 10,
    w: '----------',
  },
  args: Binary(...),           // Message arguments
  kwargs: Binary(...),
  send_args: Binary(...),
  send_kwargs: Binary(...)
}

Slide 26

Slide 26 text

Job Queue Schema: TaskState
{
  _id: ObjectId(...),
  type: 'Group',                 // Python class registered for task
  parent_id: ObjectId(...),      // Parent task (if any)
  on_complete: ObjectId(...),    // Message to be sent on completion
  mq: [ObjectId(...), ...],      // Messages enqueued on the task
  status: 'pending',
  options: {
    queue: 'chapman',
    priority: 10,
    immutable: false,
    ignore_result: true,
  },
  result: Binary(...),
  data: {...}
}

Slide 36

Slide 36 text

Message State: Naive Approach
[Diagram: the cycle Reserve Message -> Try to lock task -> Un-reserve Message, repeated many times]
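The cost of the naive cycle can be illustrated with a small in-memory sketch. It is not Chapman's code: the list of dicts and the `locked_tasks` set stand in for MongoDB collections, and the name `naive_reserve` and the 'reserved' status are assumptions made for illustration. It counts how many reservations a worker throws away before finding a message whose task is free.

```python
def naive_reserve(messages, locked_tasks, worker_id):
    """Try each ready message in turn; return (message, wasted_attempts).

    `messages` is a list of dicts with 'task_id', 'status' and 'w' keys and
    `locked_tasks` is a set of task ids currently being processed. Both are
    in-memory stand-ins for MongoDB collections (an assumption of this sketch).
    """
    wasted = 0
    for msg in messages:
        if msg['status'] != 'ready':
            continue
        msg['status'] = 'reserved'            # reserve message
        msg['w'] = worker_id
        if msg['task_id'] not in locked_tasks:
            locked_tasks.add(msg['task_id'])  # lock acquired, keep the message
            return msg, wasted
        msg['status'] = 'ready'               # un-reserve message
        msg['w'] = None
        wasted += 1
    return None, wasted
```

With three messages queued for a task that is already locked, the worker performs three reserve/un-reserve round trips before it reaches useful work, which is the pattern the diagram repeats.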

Slide 37

Slide 37 text

Message States: Reserve Message
• findAndModify (‘ready’)
  • s.state => ‘q1’
  • s.w => worker_id
• $push _id onto task’s mq field
• If msg is first in mq, s.state => ‘busy’
  • Start processing
• Otherwise, s.state => ‘q2’
[State diagram: ready, q1, busy, q2, next, pending]

Slide 38

Slide 38 text

Message States: Reserve Message
• findAndModify (‘next’)
  • s.state => ‘busy’
  • s.w => worker_id
• Start processing
[State diagram: ready, q1, busy, q2, next, pending]

Slide 39

Slide 39 text

Message States: Retire Message
• findAndModify TaskState
  • $pull message _id from ‘mq’
• findAndModify new first message in ‘mq’ if its s.state is in [‘q1’, ‘q2’]
  • s.state => ‘next’
[State diagram: ready, q1, busy, q2, next, pending]
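The reserve and retire transitions can be traced with a minimal in-memory sketch. This is an illustration under stated assumptions, not the real implementation: plain dicts replace the Message and TaskState documents, a flat `state` key replaces the `s` sub-document, and the function names `reserve` and `retire` are hypothetical. In MongoDB each step would be an atomic findAndModify rather than a local mutation.

```python
def reserve(msg, task, worker_id):
    """Reserve a 'ready' or 'next' message; return True if it may start."""
    if msg['state'] == 'next':
        msg['state'], msg['w'] = 'busy', worker_id
        return True
    if msg['state'] != 'ready':
        return False
    msg['state'], msg['w'] = 'q1', worker_id
    task['mq'].append(msg['_id'])        # $push _id onto the task's mq
    if task['mq'][0] == msg['_id']:
        msg['state'] = 'busy'            # first in line: start processing
        return True
    msg['state'] = 'q2'                  # wait behind earlier messages
    return False


def retire(msg, task, messages):
    """Drop a finished message from mq and promote the new head to 'next'."""
    task['mq'].remove(msg['_id'])        # $pull message _id from mq
    if task['mq']:
        head = messages[task['mq'][0]]   # `messages` maps _id -> message dict
        if head['state'] in ('q1', 'q2'):
            head['state'] = 'next'
```

A second message for the same task parks in 'q2' and costs no further reservations; it becomes 'next' only when the message ahead of it retires.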

Slide 40

Slide 40 text

Task States
• States are mainly advisory
• success and failure transitions trigger the on_complete message
• ‘chained’ is a tail-call optimization
[State diagram: pending, active, chained, failure, success]

Slide 41

Slide 41 text

Handling Problems
• Determine “live” worker ids
• Find all messages in busy or q1 for those workers and make them “ready”
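One possible reading of this recovery step, sketched in memory: messages stuck in 'busy' or 'q1' whose worker id is not in the live set are returned to 'ready'. Interpreting "those workers" as the workers that are no longer live is an assumption, as are the function name `requeue_orphans` and the flat dict layout.

```python
def requeue_orphans(messages, live_workers):
    """Reset messages held by workers that are not in `live_workers`.

    Returns the _ids of the messages that were made 'ready' again.
    `messages` is a list of dicts standing in for the message collection.
    """
    recovered = []
    for msg in messages:
        if msg['state'] in ('busy', 'q1') and msg['w'] not in live_workers:
            msg['state'] = 'ready'
            msg['w'] = None
            recovered.append(msg['_id'])
    return recovered
```

Messages in 'q2' are left alone here because the slide names only 'busy' and 'q1'.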

Slide 42

Slide 42 text

Basic Tasks: FunctionTask
• Simplest task: run a function to completion, set the result to the return value
• If a ‘Suspend’ exception is raised, move the task to ‘chained’ status
• Other exceptions set task to ‘failure’, save traceback & exception

@task(ignore_result=True, priority=50)
def track(event, user_id=None, **kwargs):
    log.info('track(%s, %s...)', event, user_id)
    # ...
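The three outcomes listed for a FunctionTask can be sketched as a small wrapper. The `Suspend` class below is a hypothetical stand-in for the exception named on the slide, and `run_function_task` and the `state` dict are assumptions of this sketch rather than Chapman's API.

```python
import traceback


class Suspend(Exception):
    """Hypothetical stand-in for the 'Suspend' exception named on the slide."""


def run_function_task(state, func, *args, **kwargs):
    """Run `func` and record the outcome in the `state` dict."""
    try:
        state['result'] = func(*args, **kwargs)
        state['status'] = 'success'          # result is the return value
    except Suspend:
        state['status'] = 'chained'          # another task will finish the work
    except Exception as exc:
        state['status'] = 'failure'          # save exception and traceback
        state['result'] = {'exception': repr(exc),
                           'traceback': traceback.format_exc()}
    return state
```

Catching `Suspend` before the generic `Exception` handler is what keeps a suspension from being recorded as a failure.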

Slide 43

Slide 43 text

Digression: Task Chaining
• Task state set to ‘chained’
• New “Chain” task is created that will
  • Call the “next” task
  • When the “next” task completes, also complete the “current” task

@task(ignore_result=True, priority=50)
def function_task(*args, **kwargs):
    # ...
    Chain.call(some_other_task)

Slide 44

Slide 44 text

Composite Tasks
• on_complete message for each subtask with slot=retire_subtask, specifying subtask position & the result of the subtask
• Different composite tasks implement ‘run’ and ‘retire_subtask’ differently

task_state.update(
    { '_id': subtask_id },
    { '$set': {
        'parent_id': composite_id,
        'data.composite_position': position,
        'options.ignore_result': False }}
)

Slide 45

Slide 45 text

Composite Task: Pipeline
• Run
  • Send a ‘run’ message to the subtask with position=0
• Retire_subtask(position, result)
  • Send a ‘run’ message with the previous result to the subtask with position = (position+1), OR retire the Pipeline if no more tasks
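A minimal in-memory sketch of the Pipeline behaviour described above. The `outbox` list stands in for the message queue, subtasks are plain one-argument callables, and the `initial` argument passed to the first subtask is an assumption; the class and the `drive` helper are illustrative names, not Chapman's.

```python
class Pipeline:
    """Each subtask receives the previous subtask's result."""

    def __init__(self, subtasks):
        self.subtasks = subtasks      # list of one-argument callables
        self.outbox = []              # queued ('run', position, argument) messages
        self.result = None
        self.done = False

    def run(self, initial=None):
        # Send a 'run' message to the subtask with position=0.
        self.outbox.append(('run', 0, initial))

    def retire_subtask(self, position, result):
        nxt = position + 1
        if nxt < len(self.subtasks):
            self.outbox.append(('run', nxt, result))   # pass the result along
        else:
            self.result, self.done = result, True      # no more tasks: retire


def drive(pipeline, initial=None):
    """Deliver queued messages one at a time, as a single worker would."""
    pipeline.run(initial)
    while pipeline.outbox:
        _, position, arg = pipeline.outbox.pop(0)
        pipeline.retire_subtask(position, pipeline.subtasks[position](arg))
    return pipeline.result
```

For example, `drive(Pipeline([lambda x: x + 1, lambda x: x * 10]), 1)` runs the two stages in order and leaves the final value in `pipeline.result`.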

Slide 46

Slide 46 text

Composite Task: Group/Barrier
• Run
  • Send a ‘run’ message to all subtasks
• Retire_subtask(position, result)
  • Decrement the num_waiting counter
  • If num_waiting is 0, retire the group
  • Collect subtask results (Group), complete group, delete subtasks
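The Group counterpart can be sketched the same way. Again this is an in-memory illustration with assumed names: subtasks are zero-argument callables, `outbox` stands in for the queue, and the local decrement would presumably be an atomic update in MongoDB.

```python
class Group:
    """Run all subtasks, then complete once every one has retired."""

    def __init__(self, subtasks):
        self.subtasks = subtasks                  # list of zero-argument callables
        self.num_waiting = len(subtasks)
        self.results = [None] * len(subtasks)     # indexed by subtask position
        self.outbox = []
        self.done = False

    def run(self):
        # Send a 'run' message to all subtasks.
        for position in range(len(self.subtasks)):
            self.outbox.append(('run', position))

    def retire_subtask(self, position, result):
        self.results[position] = result           # collect the subtask result
        self.num_waiting -= 1                     # decrement the counter
        if self.num_waiting == 0:
            self.done = True                      # complete the group
            self.subtasks = []                    # delete subtasks
```

Because results are stored by position, subtasks can retire in any order and the collected list still matches the order in which they were declared.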

Slide 47

Slide 47 text

Reducing Polling
• Reserving messages is expensive
• Use Pub/Sub system instead
• Publish to the channel whenever a message is ready to be handled
• Each worker subscribes to the channel
• Workers only ‘poll’ when they have a chance of getting work
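The idea can be sketched in-process before bringing MongoDB into it. Here a `queue.Queue` per subscriber stands in for the channel, and `Channel`, `worker_step`, and the `reserve` callable are hypothetical names; the point is only that the expensive reserve runs when a notification has arrived and is skipped otherwise.

```python
import queue


class Channel:
    """In-process stand-in for a pub/sub channel."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        inbox = queue.Queue()
        self._subscribers.append(inbox)
        return inbox

    def publish(self, event):
        # Every subscribed worker is told that a message is ready.
        for inbox in self._subscribers:
            inbox.put(event)


def worker_step(inbox, reserve, timeout=0.01):
    """Attempt the expensive reserve only after a notification arrives."""
    try:
        inbox.get(timeout=timeout)
    except queue.Empty:
        return None          # nothing published: skip the reserve entirely
    return reserve()
```

A worker that wakes on a notification may still lose the race to another worker, so `reserve` has to tolerate returning nothing.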

Slide 53

Slide 53 text

Pub/Sub for MongoDB
Capped Collection
• Fixed size
• Fast inserts
• “Tailable” cursors
[Diagram label: Tailable Cursor]

Slide 59

Slide 59 text

Getting a Tailable Cursor

def get_cursor(collection, topic_re, await_data=True):
    options = { 'tailable': True }          # make cursor tailable
    if await_data:
        options['await_data'] = True        # holds open cursor for a while
    cur = collection.find(
        { 'k': topic_re },
        **options)
    cur = cur.hint([('$natural', 1)])       # ensure we don't use any indexes
    return cur

import re, time
while True:
    cur = get_cursor(
        db.capped_collection,
        re.compile('^foo'),
        await_data=True)
    for msg in cur:
        do_something(msg)
    time.sleep(0.1)   # still some polling when no producer, so don't spin too fast

Slide 61

Slide 61 text

Building in retry...

def get_cursor(collection, topic_re, last_id=-1, await_data=True):
    options = { 'tailable': True }
    spec = {
        'id': { '$gt': last_id },           # only new messages (integer autoincrement "id")
        'k': topic_re }
    if await_data:
        options['await_data'] = True
    cur = collection.find(spec, **options)
    cur = cur.hint([('$natural', 1)])       # ensure we don't use any indexes
    return cur
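The slide shows how the cursor is opened with `last_id` but not the loop that maintains it. A minimal sketch of that loop follows, assuming a caller-supplied `open_cursor(last_id)` callable (hypothetical; with pymongo it might wrap `get_cursor`) that yields documents carrying the integer `id` field. The loop is bounded by `max_reopens` only so the sketch terminates; a real consumer would loop indefinitely and sleep between attempts.

```python
def consume(open_cursor, handle, max_reopens):
    """Re-open the cursor when it is exhausted, resuming after the last seen id.

    `open_cursor(last_id)` must return an iterable of documents whose 'id'
    is greater than `last_id`; `handle` is called once per document.
    Returns the last id that was handled (-1 if none).
    """
    last_id = -1
    for _ in range(max_reopens):
        for msg in open_cursor(last_id):
            handle(msg)
            last_id = msg['id']     # remember progress for the next re-open
    return last_id
```

Because `last_id` only advances after `handle` returns, a message is not skipped if the cursor dies partway through a batch.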

Slide 63

Slide 63 text

Building auto-increment

class Sequence(object):
    ...
    def next(self, sname, inc=1):
        # atomically $inc the dedicated document
        doc = self._db[self._name].find_and_modify(
            query={'_id': sname},
            update={'$inc': { 'value': inc } },
            upsert=True,
            new=True)
        return doc['value']

Slide 66

Slide 66 text

Ludicrous Speed

from pymongo.cursor import _QUERY_OPTIONS

def get_cursor(collection, topic_re, last_id=-1, await_data=True):
    options = { 'tailable': True }
    spec = {
        'ts': { '$gt': last_id },           # only new messages (id ==> ts)
        'k': topic_re }
    if await_data:
        options['await_data'] = True
    cur = collection.find(spec, **options)
    cur = cur.hint([('$natural', 1)])       # ensure we don't use any indexes
    if await_data:
        # co-opt the oplog_replay option
        cur = cur.add_option(_QUERY_OPTIONS['oplog_replay'])
    return cur

Slide 68

Slide 68 text

Performance

Slide 69

Slide 69 text

Questions? Rick Copeland [email protected] @rick446