Slide 1

“Asynchronous work with Python/Django” Martin Alderete @alderetemartin [email protected]

Slide 2

Introduction to the issue
Remember: Quicker is BETTER!

Slide 3

Introduction to the issue: “Summary”
Calling send_registration_email(user) inline is blocking:
- Coupled to the Request-Response cycle
- Cannot move it to another host
- Prone to become a “bottleneck”
- Error handling is not DRY
- It does not scale!
The “heavy” work MUST be done outside of the Request-Response cycle.
Heavy: “everything which could add an unnecessary delay or some overhead, or is not needed immediately”
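As a concrete sketch of the problem, a hypothetical registration view where the e-mail round-trip blocks the whole response (the view and field names are made up):

from django.core.mail import send_mail
from django.http import HttpResponse

def register(request):
    # ... create the user: fast, a single database insert ...
    # Slow part: the SMTP round-trip blocks the request until it finishes.
    send_mail('Welcome!', 'Thanks for registering.',
              'noreply@example.com', [request.POST['email']])
    return HttpResponse('Welcome!')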

Slide 4

Intro to message Brokers (DS = Distributed System)
[Diagram] Application (producer) → messages → Broker (the abstraction) → messages → Worker (consumer)

Slide 5

Intro to message Brokers (DS) Redis, RabbitMQ, MongoDB and more...

Slide 6

Intro to message Brokers (DS)
Advantages:
- Provide “FIFO-like” queues
- Look like a dict (key-value)
- Allow decoupling the system
- Allow distributing the system
- Allow communication between technologies
- Allow scaling in a “more” natural way
Disadvantages:
- Now System == Distributed System
- Adds more complexity to the stack
- Needs more maintenance

Slide 7

Proposed solution: Hardcore... Hardcore = Python + Broker + handmade worker

Slide 8

Hardcore way! Producer
pip install redis
redis_conn = redis.Redis(host='localhost', port=6379)
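A minimal producer sketch building on that connection; the 'tasks' key name, the JSON message format, and the helper function are illustrative assumptions:

import json
import redis

redis_conn = redis.Redis(host='localhost', port=6379)

def queue_registration_email(user_id):
    # Serialize a task description and push it onto a list used as a
    # FIFO queue; rpush returns immediately, so the view is not blocked.
    message = json.dumps({'task': 'send_registration_email',
                          'user_id': user_id})
    redis_conn.rpush('tasks', message)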

Slide 9

Hardcore way! Consumer
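A matching consumer sketch, assuming the same 'tasks' list and message format as the producer above:

import json
import redis

redis_conn = redis.Redis(host='localhost', port=6379)

while True:
    # blpop blocks until a message is pushed onto the "tasks" list
    _key, raw = redis_conn.blpop('tasks')
    message = json.loads(raw)
    if message['task'] == 'send_registration_email':
        # the heavy work runs here, outside the Request-Response cycle
        print('sending email to user %s' % message['user_id'])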

Slide 10

Hardcore way! Summary...
The good:
- Simple (and complex at the same time)
- Independent from the producer framework
- Decouples the system
- Moves the heavy work somewhere else
- Absolute control for the devs
The bad:
- Everything has to be done “from scratch”
- Tied to a single broker (Redis)
- Limited scalability
- Lots of code would have to be “re-written”
- Monitoring? Administration?

Slide 11

Proposed solutions: Celery
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well. The execution units, called tasks, are executed concurrently on one or more worker servers.

Slide 12

Celery
pip install celery
Celery = Python + Broker + Batteries included!
www.amqp.org

Slide 13

Celery Application
Entry point of everything related to Celery. Create a single instance (aka “app”).
http://docs.celeryproject.org/en/latest/django/first-steps-with-django.html
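A sketch of that single app instance, following the linked guide ("projName" stands for your actual project package):

# projName/celery.py
import os
from celery import Celery
from django.conf import settings

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'projName.settings')

app = Celery('projName')
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)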

Slide 14

Tasks
The base of every Celery application. They have 2 responsibilities:
- Define what happens when a task is called.
- Define what to do when a worker receives a task.
Every task has a name. Tasks are basically callable objects with some “magic”. By convention they are placed in tasks.py and created using a decorator: @shared_task (from celery import shared_task)
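For instance, a minimal task sketch (the add example is a common toy case, not from the slides):

# tasks.py
from celery import shared_task

@shared_task
def add(x, y):
    # named "tasks.add" by default (module path + function name)
    return x + y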

Slide 15

Tasks
They have many attributes that define how the task behaves, for example:
Task.name, Task.bind, Task.queue, Task.max_retries, Task.default_retry_delay, Task.rate_limit, Task.time_limit, Task.soft_time_limit, Task.ignore_result, and several more…
http://celery.readthedocs.org/en/latest/reference/celery.app.task.html
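A sketch with a few of these attributes passed as decorator options; the task name and the values are arbitrary examples:

from celery import shared_task

@shared_task(name='reports.generate', max_retries=5,
             default_retry_delay=30, rate_limit='10/m',
             time_limit=300, ignore_result=True)
def generate_report(report_id):
    pass  # the heavy report generation would go here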

Slide 16

Real world task (tasks.py)
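A sketch of what such a task might look like; the e-mail specifics and option values are assumptions:

# tasks.py
from celery import shared_task
from django.core.mail import send_mail

@shared_task(bind=True, max_retries=3, default_retry_delay=60)
def send_registration_email(self, email):
    try:
        send_mail('Welcome!', 'Thanks for registering.',
                  'noreply@example.com', [email])
    except Exception as exc:
        raise self.retry(exc=exc)  # re-queued, up to max_retries times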

Slide 17

Routing
Routing: the mechanism by which we decide which queue should receive the message for a new task on its way to a worker. RECOMMENDED instead of a hardcoded ‘queue’ on the TASK!!!
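For example, a settings sketch in Celery 3.x style (the task and queue names are made up):

# settings.py
CELERY_ROUTES = {
    'tasks.send_registration_email': {'queue': 'emails'},
    'reports.generate': {'queue': 'reports'},
}

A worker can then be attached to a single queue: celery -A projName worker -Q emails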

Slide 18

Tasks: Calling
2 ways:
- Using a shortcut and the options defined at the moment the task was created (@shared_task, @task): Task.delay(arg1, kwarg1=value1)
- Using the “long” way, which allows customizing a call by overriding the default options: Task.apply_async(args=l, kwargs=d, **options)
http://docs.celeryproject.org/en/latest/reference/celery.app.task.html#celery.app.task.Task.apply_async
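A usage sketch of both forms, reusing the hypothetical task from above:

from tasks import send_registration_email

# Shortcut: enqueue with the options defined on the task
send_registration_email.delay('user@example.com')

# "Long" way: same task, overriding options for this call only
send_registration_email.apply_async(
    args=['user@example.com'],
    queue='emails',   # bypass the configured routing
    countdown=10,     # do not execute before 10 seconds have passed
)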

Slide 19

Workers: Consuming tasks
celery -A projName worker --loglevel=info
celery -A projName worker --concurrency=10
celery -A projName worker -P eventlet -c 1000
celery -A projName worker --autoscale=10,3
celery worker --help
http://docs.celeryproject.org/en/latest/userguide/workers.html

Slide 20

AsyncResult
Celery provides us (when possible) an AsyncResult (a future) with the result of a task. To make this possible Celery uses a backend where it stores task results. The backend is configured via CELERY_RESULT_BACKEND. There are several available backends: cache (memcached), mongodb, redis, amqp, etc. Each backend has its own configuration.
http://celery.readthedocs.org/en/latest/configuration.html#celery-result-backend
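For example (assuming a local Redis server holds the results):

# settings.py
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'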

Slide 21

AsyncResult: API
A few methods and attributes…
http://celery.readthedocs.org/en/latest/reference/celery.result.html
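A small usage sketch, reusing the hypothetical add task from above:

from tasks import add

result = add.delay(2, 2)
result.id               # the task_id, a string you can store
result.ready()          # True once the task has finished
result.status           # PENDING / STARTED / SUCCESS / FAILURE ...
result.get(timeout=5)   # block for the return value: 4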

Slide 22

AsyncResult: API (from a task_id)
http://celery.readthedocs.org/en/latest/reference/celery.result.html
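A sketch, assuming the task_id string was stored when the task was enqueued:

from celery.result import AsyncResult

def task_status(task_id):
    # Rebuild the future later (e.g. in another request) from the id
    result = AsyncResult(task_id)
    if result.ready():
        return result.get()
    return result.status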

Slide 23

CeleryBeat
A periodic task scheduler (bye, cron?).
celery -A projName beat
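A schedule sketch in Celery 3.x style (the task name is hypothetical):

# settings.py
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    'cleanup-every-5-minutes': {
        'task': 'tasks.cleanup_expired_sessions',
        'schedule': timedelta(minutes=5),  # run every 5 minutes
    },
}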

Slide 24

Celery Canvas
Celery provides mechanisms to group and chain tasks, add callbacks, and process chunks. For this purpose Celery uses something called PRIMITIVES:
- group: executes tasks in parallel.
- chain: links tasks, adding callbacks ( f(g(a)) ).
- chord: a group plus a callback (a barrier).
- map: similar to Python's map().
- chunks: splits a list of elements into small parts.
http://docs.celeryproject.org/en/latest/userguide/canvas.html
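A sketch of the first three primitives; tsum is a helper task defined here just for the example, and add is the hypothetical task from above:

from celery import chain, chord, group, shared_task
from tasks import add

@shared_task
def tsum(numbers):
    return sum(numbers)

# group: ten add() calls executed in parallel
group(add.s(i, i) for i in range(10))()

# chain: each result feeds the next call: (2 + 2) + 4
chain(add.s(2, 2), add.s(4))()

# chord: a group plus a callback run when ALL members have finished
chord(add.s(i, i) for i in range(10))(tsum.s())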

Slide 25

Monitoring: Celery Flower

Slide 26

Monitoring: Celery Flower
Real-time monitoring: progress and history, details about the tasks, graphs and stats.
Remote control: status and stats of the workers, shutdown or restart workers, control autoscaling and pool size, see task execution status, queue administration, etc…
pip install flower
celery -A projName flower --port=5555

Slide 27

Thoughts and conclusions
Message Brokers:
- Came to stay
- Allow systems with flexible architectures
- Allow communication between technologies
- AMQP is a good protocol (www.amqp.org)
Distributed Systems:
- Are complex but scalable
- Add complexity to the stack
- Allow distributing work loads
- Require maintenance/monitoring
- Harder to debug (more so when multi-worker)
- More but smaller services (micro-services)

Slide 28

Thoughts and conclusions
Celery:
- Is the framework for distributed systems
- Is the framework every Pythonista should try when playing with DS
- Is a mature project with good support
- Has good documentation
- Is simple to configure and run
- Is a WORLD to learn and understand in depth
- Could be extended “easily” (signals, management commands, remotes)
- Has LOTS of settings and features
- Should be monitored like any normal service
- Something that I do not know… =)

Slide 29

Extra: Distributed Locking (Redis)
https://gist.github.com/malderete/449cf92a16983c0bb412
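The linked gist has the full version; below is only a minimal sketch of the underlying idea (not the gist itself), assuming a local Redis:

import redis

redis_conn = redis.Redis(host='localhost', port=6379)

def acquire_lock(name, timeout=60):
    # SET with nx=True creates the key only if it does not exist yet;
    # the expiry ensures a crashed worker cannot hold the lock forever.
    return redis_conn.set(name, 'locked', nx=True, ex=timeout)

def release_lock(name):
    redis_conn.delete(name)

if acquire_lock('lock:nightly-report'):
    try:
        pass  # only one worker at a time reaches this block
    finally:
        release_lock('lock:nightly-report')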

Slide 30

Extra: Distributed Locking (Django’s cache)
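A sketch of the classic cache.add() pattern (the helper name is made up):

from django.core.cache import cache

LOCK_EXPIRE = 60 * 5  # seconds; should outlive the longest expected run

def run_exclusively(lock_id, work):
    # cache.add() only sets the key if it is absent, and is atomic on
    # backends such as memcached -- that is what makes it usable as a lock.
    if cache.add(lock_id, 'locked', LOCK_EXPIRE):
        try:
            work()
        finally:
            cache.delete(lock_id)
    # else: another process holds the lock; skip this run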

Slide 31

Thank you very much! Questions? Martin Alderete @alderetemartin [email protected]

Slide 32

Thank you very much! Questions?
while not manos.adormecida: aplaudir()  # “while your hands aren't asleep: applaud()”
Martin Alderete @alderetemartin [email protected]

Slide 33

Links....
http://www.celeryproject.org/
http://redis.io/
http://www.rabbitmq.com/
http://www.amqp.org/
http://zookeeper.apache.org/
https://github.com/brolewis/celery_mutex
https://gist.github.com/malderete