Slide 1

Celery - A Distributed Task Queue
Duy Do (@duydo)

Slide 2

Outline
1. About
2. What is Celery?
3. Celery Architecture
4. Broker, Task, Worker
5. Monitoring
6. Coding
7. Q & A

Slide 3

About
• A father, a husband and a software engineer
• Passionate about distributed systems, real-time data processing and search engines
• Works @sentifi as a backend engineer
• Follow me @duydo

Slide 4

What is Celery?
• Distributed Task Queue
• Simple, fast, flexible, highly available, scalable
• Mature, feature rich
• Open source, BSD License
• Large community

Slide 5

What is a Task Queue?
A task queue is a system for parallel execution of tasks.
[Diagram: the Client sends tasks to the Broker; the Broker distributes tasks to the Workers]
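The client, broker and worker roles can be sketched with the standard library alone. This is only an in-process illustration, not how Celery is implemented: `run_task_queue` is a hypothetical helper, a `queue.Queue` stands in for the broker, and threads stand in for workers.

```python
import queue
import threading


def run_task_queue(tasks, num_workers=2):
    """Send (func, args) tasks through a shared queue to worker threads."""
    broker = queue.Queue()      # stands in for the broker: holds pending tasks
    results = []                # collected (args, return value) pairs
    lock = threading.Lock()

    def worker():
        while True:
            item = broker.get()
            if item is None:    # sentinel: no more work for this worker
                break
            func, args = item
            value = func(*args)
            with lock:
                results.append((args, value))

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for task in tasks:          # the client sends tasks
        broker.put(task)
    for _ in workers:           # one sentinel per worker
        broker.put(None)
    for w in workers:
        w.join()
    return sorted(results)      # sort: completion order is not deterministic


def add(x, y):
    return x + y


done = run_task_queue([(add, (i, i)) for i in range(4)])
print(done)
```

Whichever worker is free picks up the next task, which is why the results are sorted before being returned.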

Slide 6

Celery Architecture
[Diagram: Client 1 and Client 2 send tasks to the Broker, which holds Task Queue 1, Task Queue 2, … Task Queue N and distributes tasks to Worker 1 and Worker 2. The workers store task results in the Task Result Storage, and the clients get task results from it.]
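The flow in the diagram (named queues, workers, a result store keyed by task id) might be modelled as follows. This is a single-process sketch under stated assumptions: `MiniQueueSystem` and its methods are hypothetical names, not Celery APIs.

```python
import uuid
from collections import deque


class MiniQueueSystem:
    """Toy model: named task queues plus a result store keyed by task id."""

    def __init__(self):
        self.queues = {}    # queue name -> deque of (task_id, func, args)
        self.results = {}   # task id -> return value (the result storage)

    def send_task(self, func, args, queue="default"):
        """Client side: enqueue a task and return its id."""
        task_id = str(uuid.uuid4())
        self.queues.setdefault(queue, deque()).append((task_id, func, args))
        return task_id

    def work(self, queue="default"):
        """Worker side: drain one queue and store each result."""
        pending = self.queues.get(queue, deque())
        while pending:
            task_id, func, args = pending.popleft()
            self.results[task_id] = func(*args)

    def get_result(self, task_id):
        """Client side: fetch a stored result, or None if not ready."""
        return self.results.get(task_id)


system = MiniQueueSystem()
task_id = system.send_task(pow, (2, 5), queue="math")
before = system.get_result(task_id)   # nothing stored yet
system.work("math")
after = system.get_result(task_id)
print(before, after)
```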

Slide 7

Broker
The middleman that holds the tasks (messages)
Celery supports:
• RabbitMQ, Redis
• MongoDB, CouchDB
• ZeroMQ, Amazon SQS, IronMQ

Slide 8

Task
• A task is a unit of work, the building block of Celery apps
• A task message exists until it has been acknowledged
• The result of a task can be stored or ignored
• States: PENDING, STARTED, SUCCESS, FAILURE, RETRY, REVOKED
• Periodic tasks (cron jobs)
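To make the state list concrete, here is a simplified sketch of how a task might move through those states when it is retried. The `State` enum and `run_with_retries` are hypothetical, and the exact transitions Celery records may differ.

```python
from enum import Enum


class State(Enum):
    PENDING = "PENDING"
    STARTED = "STARTED"
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RETRY = "RETRY"
    REVOKED = "REVOKED"


def run_with_retries(func, args, max_retries=2):
    """Run func and return (state history, result or final exception)."""
    history = [State.PENDING]
    attempts = 0
    while True:
        history.append(State.STARTED)
        try:
            result = func(*args)
        except Exception as exc:
            if attempts < max_retries:
                attempts += 1
                history.append(State.RETRY)
                continue
            history.append(State.FAILURE)
            return history, exc
        history.append(State.SUCCESS)
        return history, result


calls = {"n": 0}


def flaky(x):
    """Fails on the first call only, to trigger one retry."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise ValueError("transient error")
    return x * 2


history, value = run_with_retries(flaky, (21,))
print([s.name for s in history], value)
```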

Slide 9

Define Tasks
# function style
@app.task
def add(x, y):
    return x + y

# class style
class AddTask(app.Task):
    def run(self, x, y):
        return x + y
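One way to picture what a task decorator does is a registry that maps task names to callables, so a worker can look a task up by the name carried in a message. The sketch below assumes a hypothetical `MiniApp` class; it is not Celery's `app.task`.

```python
class MiniApp:
    """Toy app object that registers tasks by name."""

    def __init__(self, name):
        self.name = name
        self.tasks = {}     # task name -> callable

    def task(self, func):
        """Register func as '<app name>.<function name>' and return it."""
        self.tasks[f"{self.name}.{func.__name__}"] = func
        return func


app = MiniApp("ws.tasks")


@app.task
def add(x, y):
    return x + y


# A worker that receives the name 'ws.tasks.add' can find the function.
looked_up = app.tasks["ws.tasks.add"]
print(looked_up(1, 2))
```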

Slide 10

Calling Tasks
• apply_async(args[, kwargs[, …]])
• delay(*args, **kwargs)
• calling (__call__)
e.g.:
• result = add.delay(1, 2)
• result = add.apply_async((1, 2), countdown=10)

Slide 11

Calling Task Options
• eta: a specific date and time that is the earliest time at which the task will be executed
• countdown: sets eta as a number of seconds into the future
• expires: sets the task's expiry time
• serializer: pickle (the default in Celery 3.x; JSON since 4.0), json, yaml or msgpack
• compression: compresses the messages using gzip or bzip2
• queue: routes the tasks to different queues
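The relationship between eta, countdown and expires can be illustrated with plain datetime arithmetic. `resolve_eta` and `should_run` are hypothetical helpers written for this sketch, not part of Celery.

```python
from datetime import datetime, timedelta, timezone


def resolve_eta(now, eta=None, countdown=None):
    """Return the earliest execution time; countdown is seconds from now."""
    if countdown is not None:
        return now + timedelta(seconds=countdown)
    return eta if eta is not None else now


def should_run(now, eta, expires=None):
    """Return 'expired' past the expiry time, else 'run' or 'wait'."""
    if expires is not None and now >= expires:
        return "expired"
    return "run" if now >= eta else "wait"


now = datetime(2015, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
eta = resolve_eta(now, countdown=10)
later = now + timedelta(seconds=15)

print(should_run(now, eta))
print(should_run(later, eta))
print(should_run(later, eta, expires=now + timedelta(seconds=12)))
```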

Slide 12

Task Result
• result.ready(): True if the task has been executed
• result.successful(): True if the task executed successfully
• result.result: the return value of the task, or the exception it raised
• result.get(): blocks until the task is complete, then returns the result or the exception
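A rough standard-library analogy for these result methods is `concurrent.futures.Future`. The mapping in the comments is approximate and is an assumption made for illustration only; a Celery result is fetched from a result backend, not from a local thread.

```python
from concurrent.futures import ThreadPoolExecutor


def add(x, y):
    return x + y


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(add, 1, 2)            # roughly: add.delay(1, 2)
    value = future.result(timeout=5)           # roughly: result.get(timeout=5)
    finished = future.done()                   # roughly: result.ready()
    succeeded = future.exception() is None     # roughly: result.successful()

print(value, finished, succeeded)
```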

Slide 13

Task Workflows
• Signatures: Partials, Immutability, Callbacks
• The Primitives: Chains, Groups, Chords, Map & Starmap, Chunks

Slide 14

Signatures
signature() wraps the args, kwargs and options of a single task invocation so that it can be:
• passed to functions
• serialized and sent across the wire
Signatures are also known as subtasks.
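The idea of a signature as plain data that survives serialization can be sketched with a dataclass and JSON. `Sig` is a hypothetical stand-in, not Celery's `Signature` class, and JSON is just one possible wire format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Sig:
    """A task invocation described as data: name, args, kwargs, options."""
    task: str
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)
    options: dict = field(default_factory=dict)

    def dumps(self):
        return json.dumps(asdict(self))

    @classmethod
    def loads(cls, payload):
        data = json.loads(payload)
        data["args"] = tuple(data["args"])   # JSON turns tuples into lists
        return cls(**data)


s = Sig("ws.tasks.add", args=(1, 2), kwargs={"debug": True},
        options={"countdown": 10})
wire = s.dumps()                 # a string that could be sent to a broker
restored = Sig.loads(wire)
print(restored)
```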

Slide 15

Create Signatures
# ws.tasks.add(1, 2)
s = signature('ws.tasks.add', args=(1, 2), countdown=10)
s = add.subtask((1, 2), countdown=10)
s = add.s(1, 2)
s = add.s(1, 2, debug=True).set(countdown=10)

# inspect fields
s.args     # (1, 2)
s.kwargs   # {'debug': True}
s.options  # {'countdown': 10}

# execute as task
s.delay()
s.apply_async()
s()

Slide 16

Partial Signatures
A partial signature leaves some arguments unset; supply the additional args, kwargs or options when calling apply_async/delay
• partial = add.s(1)
• partial.delay(2)  # 1 + 2
• partial.apply_async((2,))  # 1 + 2
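With a commutative task like add, the order in which the extra arguments are combined is invisible. The Celery documentation states that added positional arguments are prepended to the stored ones and that added keyword arguments take precedence; the sketch below mimics that with a hypothetical `PartialSig` class and a non-commutative `sub` function, calling the function directly instead of sending a message.

```python
class PartialSig:
    """Toy partial: stores some arguments, accepts the rest at call time."""

    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def apply(self, *more_args, **more_kwargs):
        # New positional args go in front of the stored ones;
        # new keyword args override stored ones with the same name.
        merged = {**self.kwargs, **more_kwargs}
        return self.func(*more_args, *self.args, **merged)


def sub(x, y):
    return x - y


partial = PartialSig(sub, 1)
print(partial.apply(10))   # calls sub(10, 1)
```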

Slide 17

Immutable Signatures
• The arguments of an immutable signature cannot be changed; only its options can be set
• Use si() to create an immutable signature
• add.si(1, 2)

Slide 18

Callback Signatures
Use the link argument of apply_async to add callbacks
add.apply_async((1, 2), link=add.s(3))
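The difference between a regular and an immutable callback is whether the parent task's result is passed along. A minimal sketch, assuming a hypothetical `run_with_link` helper that runs everything synchronously:

```python
def add(x, y):
    return x + y


def run_with_link(task, args, link, link_args, immutable=False):
    """Run task, then the callback; pass the parent result unless immutable."""
    parent = task(*args)
    if immutable:
        return link(*link_args)          # like add.si(...): result ignored
    return link(parent, *link_args)      # like add.s(...): result prepended


chained = run_with_link(add, (1, 2), add, (3,))                      # (1 + 2) + 3
independent = run_with_link(add, (1, 2), add, (3, 4), immutable=True)  # 3 + 4
print(chained, independent)
```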

Slide 19

Group
A signature that takes a list of tasks to be applied in parallel
s = group(add.s(i, i) for i in range(5))
s().get()  => [0, 2, 4, 6, 8]
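The behaviour of a group (run independent calls in parallel, collect results in order) resembles submitting work to a thread pool. This is an analogy only; `run_group` is a hypothetical helper and no broker is involved.

```python
from concurrent.futures import ThreadPoolExecutor


def add(x, y):
    return x + y


def run_group(func, arg_tuples, max_workers=4):
    """Run func over each args tuple in parallel, keeping submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, *args) for args in arg_tuples]
        return [f.result() for f in futures]


group_result = run_group(add, [(i, i) for i in range(5)])
print(group_result)
```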

Slide 20

Chain
A chain of callbacks; think of a pipeline
c = chain(add.s(1, 2), add.s(3), add.s(4))
c = add.s(1, 2) | add.s(3) | add.s(4)
c().get()  => ((1 + 2) + 3) + 4
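The pipeline idea, where each step receives the previous result as its first argument, can be expressed with `functools.reduce`. `run_chain` is a hypothetical synchronous helper, not Celery's `chain`.

```python
from functools import reduce


def add(x, y):
    return x + y


def run_chain(first, *rest):
    """first is (func, args); each later step is (func, extra_args)."""
    func, args = first
    start = func(*args)
    # Feed each intermediate result in as the next step's first argument.
    return reduce(lambda acc, step: step[0](acc, *step[1]), rest, start)


total = run_chain((add, (1, 2)), (add, (3,)), (add, (4,)))
print(total)
```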

Slide 21

Chord
Like a group, but with a callback that receives the results
c = chord((add.s(i, i) for i in range(5)), xsum.s())
c().get()  => 20
# or apply the header directly to the callback
chord(add.s(i, i) for i in range(5))(xsum.s()).get()  => 20
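In plain Python terms, a chord collects the results of a set of calls and hands the whole list to one callback. A minimal synchronous sketch, with `run_chord` as a hypothetical helper and `xsum` assumed to sum a list of numbers:

```python
def add(x, y):
    return x + y


def xsum(numbers):
    return sum(numbers)


def run_chord(header, callback):
    """Run every (func, args) in header, then pass all results to callback."""
    results = [func(*args) for func, args in header]
    return callback(results)


chord_result = run_chord([(add, (i, i)) for i in range(5)], xsum)
print(chord_result)
```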

Slide 22

Map
Like the built-in map function
c = task.map([1, 2, 3])
c()  => [task(1), task(2), task(3)]

Slide 23

Starmap
Same as map, except that the args are applied as *args
c = add.starmap([(1, 2), (3, 4)])
c()  => [add(1, 2), add(3, 4)]
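The standard library has direct counterparts that show the difference between the two: `map` passes each item as a single argument, while `itertools.starmap` unpacks each item into positional arguments.

```python
from itertools import starmap
from operator import add

# map: each item is passed as one argument
mapped = list(map(abs, [-1, 2, -3]))

# starmap: each item is unpacked, i.e. add(*item)
starmapped = list(starmap(add, [(1, 2), (3, 4)]))

print(mapped, starmapped)
```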

Slide 24

Chunks
Chunking splits a long list of args into parts
items = zip(range(10), range(10))
c = add.chunks(items, 5)
c().get()  => [[0, 2, 4, 6, 8], [10, 12, 14, 16, 18]]
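The splitting itself is simple list slicing. The sketch below uses hypothetical `chunked` and `run_chunks` helpers and runs each part in a loop, whereas in Celery each part would be handled as one task.

```python
def add(x, y):
    return x + y


def chunked(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_chunks(func, items, size):
    """Apply func(*args) to every item, grouped per chunk."""
    return [[func(*args) for args in part] for part in chunked(items, size)]


chunk_result = run_chunks(add, zip(range(10), range(10)), 5)
print(chunk_result)
```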

Slide 25

Worker
• Autoreloading
• Autoscaling
• Time & Rate Limits
• Resource Leak Protection
• Scheduling
• User Components

Slide 26

Autoreloading
Automatically reloads the worker source code as it changes
celery worker --autoreload

Slide 27

Autoscaling
Dynamically resizes the worker pool depending on load or on custom metrics defined by the user
celery worker --autoscale=8,2
=> min processes: 2, max processes: 8
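As an illustration of the min/max bounds, a scaling policy might simply clamp the desired pool size to the configured range. The policy below (one process per queued task) is an assumption made for this sketch; `target_pool_size` is a hypothetical function, and Celery's own autoscaler may decide differently.

```python
def target_pool_size(queued_tasks, min_procs=2, max_procs=8):
    """Clamp a naive 'one process per queued task' target to [min, max]."""
    return max(min_procs, min(max_procs, queued_tasks))


for load in (0, 5, 100):
    print(load, "->", target_pool_size(load))
```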

Slide 28

Time & Rate Limits
• Rate limit: the number of tasks allowed per second/minute/hour
• Time limit: how long a task is allowed to run
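One common way to enforce a rate limit is a token bucket: tokens refill at a fixed rate, and a task may start only if a token is available. The class below is a generic sketch with explicit timestamps so that it is deterministic; it is not presented as Celery's implementation.

```python
class TokenBucket:
    """Allow `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum tokens held
        self.tokens = capacity
        self.last = now

    def can_consume(self, now, tokens=1):
        """Refill for the elapsed time, then try to take `tokens`."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


bucket = TokenBucket(rate=2, capacity=2)
decisions = [bucket.can_consume(t) for t in (0.0, 0.0, 0.0, 0.5)]
print(decisions)
```

The third call is rejected because the bucket is empty; half a second later one token has been refilled, so the fourth call is allowed.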

Slide 29

Resource Leak Protection
Limits the number of tasks a pool worker process can execute before it is replaced by a new one
celery worker --maxtasksperchild=10
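The recycling rule can be simulated without real processes: count the tasks a process has run and switch to a fresh one when the limit is reached. `run_with_recycling` is a hypothetical helper, and the generation number stands in for a replaced process.

```python
def add(x, y):
    return x + y


def run_with_recycling(tasks, max_tasks_per_child):
    """Return (process generation, result) for each (func, args) task."""
    generation, executed, log = 1, 0, []
    for func, args in tasks:
        if executed == max_tasks_per_child:
            generation += 1     # stand-in for replacing the worker process
            executed = 0
        log.append((generation, func(*args)))
        executed += 1
    return log


recycle_log = run_with_recycling([(add, (i, i)) for i in range(5)], 2)
print(recycle_log)
```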

Slide 30

Scheduling
• Specify when to run a task: in seconds from now, or at a date and time
• Periodic tasks (interval, crontab expressions)

Slide 31

User Components
• Internally, Celery uses a dependency graph of "bootsteps" that enables fine-grained control of the worker
• Customize the worker components, e.g. ConsumerStep
• Add new components
• Bootsteps: http://celery.readthedocs.org/en/latest/userguide/extending.html

Slide 32

Monitoring
Flower - Real-time Celery web monitor
• Task progress and history
• Show task details (arguments, start time, runtime, and more)
• Graphs and statistics
• Shut down and restart worker instances
• Control worker pool size and autoscaling settings
• …

Slide 33

Coding…
Get your hands dirty…

Slide 34

Thank you
Duy Do (@duydo)