Asynchronous working with Python/Django

Shows the concept behind background processing using Pub/Sub, AMQP or similar, then introduces Celery.

Martin Alderete

November 10, 2015

Transcript

  1. Introduction to the issue: summary

     - Coupled to the Request-Response cycle
     - Cannot move it to another host
     - Prone to be a “bottleneck”
     - Error handling is not DRY
     - send_registration_email(user) is blocking
     - It does not scale!

     The “heavy” work MUST be outside of the Request-Response cycle.
     Heavy: “everything which could add an unnecessary delay or some
     overhead, or is not needed immediately”. A sketch of the blocking
     view follows.
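     A minimal sketch of the problem, assuming a hypothetical Django
     registration view and helper names:

        # views.py -- hypothetical example: the email is sent inline,
        # so the HTTP response waits for the whole SMTP round trip.
        from django.http import HttpResponse

        def register(request):
            user = create_user(request.POST)   # hypothetical helper
            send_registration_email(user)      # blocking: SMTP happens here
            return HttpResponse("Registered!")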
  2. Intro to message brokers (DS)

     Advantages:
     - Contain “FIFO-like” queues
     - Look like a dict (key-value)
     - Allow decoupling the system
     - Allow distributing the system
     - Allow communication between technologies
     - Allow scaling in a “more” natural way

     Disadvantages:
     - Now System == Distributed System
     - Adds more complexity to the stack
     - Needs more maintenance

     A sketch of the FIFO idea follows.
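     A minimal sketch of the “FIFO-like queue” idea, assuming Redis as
     the broker (the broker the talk sticks to later):

        # Producer pushes to one end of a Redis list, the consumer pops
        # from the other end: a FIFO queue.
        import redis

        r = redis.Redis()                        # assumes Redis on localhost:6379
        r.lpush("email_queue", "user:42")        # producer
        queue, message = r.brpop("email_queue")  # consumer, blocks until ready
        print(message)                           # b'user:42'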
  3. Hardcore way! Summary...

     - Simple and complex at the same time
     - Independent from the producer framework
     - Decouples the system
     - Moves the heavy work somewhere else
     - Absolute control for the devs
     - Everything must be done “from scratch”
     - Stuck to a single broker (Redis)
     - Limited scalability
     - Lots of code must be “re-written”
     - Monitoring? Administration?

     A sketch of this do-it-yourself approach follows.
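     A sketch of the “from scratch” approach: a producer and a worker
     loop hand-rolled on top of Redis. All names are hypothetical:

        import json
        import redis

        r = redis.Redis()

        def enqueue_email(user_id):
            # Producer: serialize a job and push it onto a Redis list.
            r.lpush("jobs", json.dumps({"task": "send_email",
                                        "user_id": user_id}))

        def worker():
            # Consumer: block until a job arrives, then handle it by hand.
            while True:
                _, raw = r.brpop("jobs")
                job = json.loads(raw)
                if job["task"] == "send_email":
                    print("sending email to user", job["user_id"])
                # retries, errors, monitoring: all still to be written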
  4. Proposed solution: Celery

     Celery is an asynchronous task queue/job queue based on distributed
     message passing. It is focused on real-time operation, but supports
     scheduling as well. The execution units, called tasks, are executed
     concurrently on one or more worker servers.
  5. Celery

     pip install celery

     Celery = Python + Broker + Batteries included!
     www.amqp.org
  6. Celery Application

     Entry point of everything related to Celery. Create a single
     instance (aka “app”), as sketched below.
     http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html
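     A minimal sketch along the lines of the linked “first steps with
     Django” guide (Celery 3.x era); “proj” is a placeholder project
     name:

        # proj/celery.py
        import os
        from celery import Celery

        # Point Celery at the Django settings before anything else.
        os.environ.setdefault("DJANGO_SETTINGS_MODULE", "proj.settings")

        from django.conf import settings

        app = Celery("proj")
        # Read the CELERY_* options from the Django settings module.
        app.config_from_object("django.conf:settings")
        # Look for a tasks.py module in every installed Django app.
        app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)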
  7. Tasks

     The base of every Celery application. They have 2 responsibilities:
     - Define what happens when a task is called.
     - Define what to do when a worker receives a task.

     Every task has a name. Tasks are basically callable objects with
     “magic”. By convention they are placed in tasks.py and created
     using a decorator: @shared_task (from celery import shared_task).
     A sketch follows.
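     A minimal sketch of a shared task; the function body is a
     hypothetical example, not from the talk:

        # myapp/tasks.py
        from celery import shared_task
        from django.core.mail import send_mail

        @shared_task
        def send_registration_email(user_id):
            # Runs on a worker, outside the Request-Response cycle.
            send_mail(
                "Welcome!",
                "Thanks for registering.",
                "noreply@example.com",
                ["user-{}@example.com".format(user_id)],  # placeholder
            )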
  8. Tasks

     They have many attributes which allow defining how the task
     behaves, for example:
     - Task.name
     - Task.bind
     - Task.queue
     - Task.max_retries
     - Task.default_retry_delay
     - Task.rate_limit
     - Task.time_limit
     - Task.soft_time_limit
     - Task.ignore_result
     - several more…
     A sketch follows the list.
     http://celery.readthedocs.org/en/latest/reference/celery.app.task.html
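     A sketch of setting some of these options through the decorator;
     the retry logic is a hypothetical example:

        from celery import shared_task

        @shared_task(bind=True, max_retries=3, default_retry_delay=60,
                     rate_limit="10/m", ignore_result=True)
        def sync_profile(self, user_id):
            try:
                pass  # call some flaky external service (placeholder)
            except IOError as exc:
                # bind=True makes `self` the task instance, enabling retry.
                raise self.retry(exc=exc)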
  9. Routing

     Routing is the mechanism by which we decide which queue should
     receive the message for a new task on its way to a worker.
     RECOMMENDED instead of a hardcoded ‘queue’ on the TASK!!!
     A sketch follows.
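     A sketch of routing through configuration instead of a hardcoded
     queue; the task paths are hypothetical (Celery 3.x-era setting
     name):

        # settings.py
        CELERY_ROUTES = {
            "myapp.tasks.send_registration_email": {"queue": "emails"},
            "myapp.tasks.generate_report": {"queue": "heavy"},
        }

     A worker can then consume only one queue:
     celery -A projName worker -Q emails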
  10. Tasks: Calling

      2 ways:
      - Using a shortcut with the options defined at the moment the
        task is created (@shared_task, @task):
        Task.delay(arg1, kwarg1=value1)
      - Using the “long” way, which allows customizing a call by
        overriding the default options:
        Task.apply_async(args=l, kwargs=d, **options)
      Both are sketched below.
      http://docs.celeryproject.org/en/latest/reference/celery.app.task.html#celery.app.task.Task.apply_async
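      A sketch of both calling styles for the task defined earlier
      (module path is hypothetical):

         from myapp.tasks import send_registration_email

         # Shortcut: uses the options baked in when the task was defined.
         send_registration_email.delay(42)

         # Long form: override options for this one call.
         send_registration_email.apply_async(
             args=[42],
             queue="emails",  # route this call explicitly
             countdown=10,    # wait 10 seconds before it may execute
         )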
  11. Workers: Consuming tasks

      celery -A projName worker --loglevel=info
      celery -A projName worker --concurrency=10
      celery -A proj worker -P eventlet -c 1000
      celery -A projName worker --autoscale=10,3
      celery worker --help
      http://docs.celeryproject.org/en/latest/userguide/workers.html
  12. AsyncResult

      Celery provides us (when possible) an AsyncResult (a future) with
      the result of a task. To make this possible, Celery uses a backend
      where it stores the results of the tasks. The backend is
      configured via CELERY_RESULT_BACKEND. There are a few available
      backends: cache (memcached), mongodb, redis, amqp, etc. Each
      backend has its own configuration. A sketch follows.
      http://celery.readthedocs.org/en/latest/configuration.html#celery-result-backend
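      A sketch of working with an AsyncResult, assuming a configured
      backend such as CELERY_RESULT_BACKEND = "redis://localhost:6379/0":

         from myapp.tasks import send_registration_email  # hypothetical

         result = send_registration_email.delay(42)  # an AsyncResult
         print(result.id)        # task id; can be stored and looked up later
         print(result.ready())   # has the task finished yet?
         print(result.get(timeout=10))  # block until the result arrives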
  13. Celery Canvas

      Celery provides mechanisms to group and chain tasks, add
      callbacks, and process chunks. For this purpose Celery uses
      something called PRIMITIVES:
      - group: executes tasks in parallel.
      - chain: links tasks, adds a callback ( f(g(a)) ).
      - chord: a group plus a callback (a barrier).
      - map: similar to Python’s map().
      - chunks: splits a list of elements into small parts.
      A sketch follows the list.
      http://docs.celeryproject.org/en/latest/userguide/canvas.html
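      A sketch of the main primitives with two hypothetical tasks,
      add(x, y) and total(numbers):

         from celery import chain, chord, group
         from myapp.tasks import add, total  # hypothetical tasks

         # group: run the signatures in parallel.
         group(add.s(i, i) for i in range(10))()

         # chain: feed the result of add(2, 2) into add(?, 4) -> f(g(a)).
         chain(add.s(2, 2), add.s(4))()

         # chord: a group whose collected results go to a callback.
         chord(add.s(i, i) for i in range(10))(total.s())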
  14. Monitoring: Celery Flower

      Real-time monitoring:
      - Progress and history.
      - Details about the tasks.
      - Graphs and stats.

      Remote control:
      - Status and stats of the workers.
      - Shut down or restart workers.
      - Control autoscaling and pool size.
      - See task execution status.
      - Queue administration.
      - Etc…

      pip install flower
      celery -A projName flower --port=5555
  15. Thoughts and conclusions

      Message brokers:
      - Came to stay.
      - Allow systems with flexible architectures.
      - Allow communication between technologies.
      - AMQP is a good protocol (www.amqp.org).

      Distributed systems:
      - Are complex but scalable.
      - Add complexity to the stack.
      - Allow distributing workloads.
      - Require maintenance/monitoring.
      - Are harder to debug (even more so with multiple workers).
      - More services, but smaller ones (micro-services).
  16. Thoughts and conclusions

      Celery:
      - Is the framework for distributed systems.
      - Is the framework every Pythonista should try when playing with DS.
      - Is a mature project with good support.
      - Has good documentation.
      - Is simple to configure and run.
      - Is a WORLD to learn and understand in depth.
      - Can be extended “easily” (signals, management commands, remotes).
      - Has LOTS of settings and features.
      - Should be monitored like any normal service.
      - Something that I do not know… =)