
Celery - A Distributed Task Queue

Duy Do
July 22, 2015

Slides from the Celery workshop at Sentifi.


Transcript

  1. Outline
     1. About
     2. What is Celery?
     3. Celery Architecture
     4. Broker, Task, Worker
     5. Monitoring
     6. Coding
     7. Q & A
  2. About
     A father, a husband, and a software engineer.
     Passionate about distributed systems, real-time data processing, and search engines.
     Works @sentifi as a backend engineer.
     Follow me @duydo
  3. What is Celery?
     A distributed task queue.
     Simple, fast, flexible, highly available, scalable.
     Mature and feature-rich.
     Open source (BSD license), with a large community.
  4. What is a Task Queue?
     A task queue is a system for parallel execution of tasks.
     [Diagram: clients send tasks to a broker, which distributes them to workers.]
  5. Celery Architecture
     [Diagram: clients send tasks to task queues (1…N) on the broker; the broker distributes tasks to workers; workers store task results in a result store, from which clients get results.]
  6. Broker
     The middleman that holds the tasks (messages). Celery supports:
     • RabbitMQ, Redis
     • MongoDB, CouchDB
     • ZeroMQ, Amazon SQS, IronMQ
  7. Task
     A task is a unit of work, the building block of Celery apps.
     A task message exists on the queue until it has been acknowledged.
     The result of a task can be stored or ignored.
     States: PENDING, STARTED, SUCCESS, FAILURE, RETRY, REVOKED.
     Periodic tasks (cron jobs) are supported.
  8. Define Tasks
     # function style
     @app.task
     def add(x, y):
         return x + y

     # class style
     class AddTask(app.Task):
         def run(self, x, y):
             return x + y
  9. Calling Tasks
     apply_async(args[, kwargs[, …]])
     delay(*args, **kwargs)
     calling (__call__)

     e.g.:
     • result = add.delay(1, 2)
     • result = add.apply_async((1, 2), countdown=10)
  10. Calling Task Options
      eta: a specific date and time that is the earliest time at which the task will be executed
      countdown: set eta as seconds into the future
      expires: set the task's expiry time
      serializer: pickle (default), json, yaml, or msgpack
      compression: compress the messages using gzip or bzip2
      queue: route the task to a different queue
  11. Task Result
      result.ready(): True if the task has been executed
      result.successful(): True if the task executed successfully
      result.result: the return value of the task, or the exception
      result.get(): blocks until the task is complete; returns the result or raises the exception
  12. Signatures
      signature() wraps the args, kwargs, and options of a single task invocation so that it can be:
      • passed to functions
      • serialized and sent across the wire
      • used as a subtask
  13. Create Signatures
      # ws.tasks.add(1, 2)
      s = signature('ws.tasks.add', args=(1, 2), countdown=10)
      s = add.subtask((1, 2), countdown=10)
      s = add.s(1, 2)
      s = add.s(1, 2, debug=True)

      # inspect fields
      s.args     # (1, 2)
      s.kwargs   # {'debug': True}
      s.options  # {'countdown': 10}

      # execute as task
      s.delay()
      s.apply_async()
      s()
  14. Partial Signatures
      Specifying additional args, kwargs, or options to apply_async/delay creates a partial:
      • partial = add.s(1)
      • partial.delay(2)           # 1 + 2
      • partial.apply_async((2,))  # 1 + 2
  15. Immutable Signatures
      An immutable signature can only have its options set; its args and kwargs cannot be extended by the caller.
      Use si() to create an immutable signature:
      • add.si(1, 2)
  16. Callback Signatures
      Use the link argument of apply_async to add callbacks:
      add.apply_async((1, 2), link=add.s(3))
  17. Group
      A signature that takes a list of tasks that should be applied in parallel:
      s = group(add.s(i, i) for i in xrange(5))
      s().get()  # => [0, 2, 4, 6, 8]
  18. Chain
      A chain of callbacks; think pipeline:
      c = chain(add.s(1, 2), add.s(3), add.s(4))
      c = add.s(1, 2) | add.s(3) | add.s(4)
      c().get()  # => ((1 + 2) + 3) + 4
  19. Chord
      Like a group, but with a callback:
      c = chord((add.s(i, i) for i in xrange(5)), xsum.s())
      c().get()  # => 20

      # equivalent: apply the header, passing the callback
      chord(add.s(i, i) for i in xrange(5))(xsum.s()).get()  # => 20
  20. Map
      Like the built-in map function:
      c = task.map([1, 2, 3])
      c()  # => [task(1), task(2), task(3)]
  21. Starmap
      Same as map, except the args are applied as *args:
      c = add.starmap([(1, 2), (3, 4)])
      c()  # => [add(1, 2), add(3, 4)]
  22. Chunks
      Chunking splits a long list of args into parts:
      items = zip(xrange(10), xrange(10))
      c = add.chunks(items, 5)
      c()  # => [[0, 2, 4, 6, 8], [10, 12, 14, 16, 18]]
  23. Worker
      • Auto-reloading
      • Autoscaling
      • Time & rate limits
      • Resource leak protection
      • Scheduling
      • User components
  24. Autoscaling
      Dynamically resizes the worker pool depending on load or user-defined metrics:
      celery worker --autoscale=8,2
      # => max processes: 8, min processes: 2
  25. Resource Leak Protection
      Limit the number of tasks a pool worker process can execute before it is replaced by a new one:
      celery worker --maxtasksperchild=10
  26. Scheduling
      Specify when to run a task: seconds into the future (countdown), a specific date and time (eta), or as periodic tasks (intervals and crontab expressions).
  27. User Components
      Internally, Celery uses a dependency graph called "bootsteps", enabling fine-grained control of the workers.
      • Customize worker components, e.g. ConsumerStep
      • Add new components
      http://celery.readthedocs.org/en/latest/userguide/extending.html
  28. Monitoring
      Flower: a real-time Celery web monitor.
      • Task progress and history
      • Task details (arguments, start time, runtime, and more)
      • Graphs and statistics
      • Shutdown and restart worker instances
      • Control worker pool size and autoscaling settings
      • …