Slide 1

Slide 1 text

Asynchronous Tasks With Celery, Flask & RabbitMQ
Héctor Benítez - Nearsoft

Slide 2

Slide 2 text

Who am I?
Héctor Benítez
@hectorbenitez
hbenitez[at]nearsoft.com
Software Developer at Nearsoft
Lead Developer at PlanningPoker for Hangouts
http://planningwithcards.com

Slide 3

Slide 3 text

Regular Web Flow
● Browser
● Request
● Server
  ○ Blocking tasks
● Response

Slide 4

Slide 4 text

Sending a mail with Flask

Slide 5

Slide 5 text

Sending a mail with Flask
● Coupled with the request-response cycle
● Can't use a different machine
● Bottleneck
● Manual error handling
● send_mail(email) is blocking
● Doesn't scale!
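To make the "send_mail(email) is blocking" point concrete, here is a framework-free sketch. The names `send_mail` and `handle_signup` and the 0.2-second delay are illustrative stand-ins for a real SMTP call and a Flask view; they are assumptions, not code from the talk.

```python
import time


def send_mail(email, delay=0.2):
    """Stand-in for a slow SMTP round trip (hypothetical helper)."""
    time.sleep(delay)
    return "sent to " + email


def handle_signup(email):
    """Stand-in for a view function: the response waits for the mail."""
    started = time.perf_counter()
    send_mail(email)  # blocks the request until the mail has been sent
    elapsed = time.perf_counter() - started
    return {"status": "ok", "seconds_blocked": elapsed}


response = handle_signup("user@example.com")
print(response["status"], round(response["seconds_blocked"], 2))
```

Every request pays the full cost of the mail delivery before the browser gets a response, which is the coupling the slide complains about.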

Slide 6

Slide 6 text

Brokers

Slide 7

Slide 7 text

Brokers
● Queue (FIFO)
● Allows decoupling of the system
● Allows a distributed system
● Technology crossover
● Scales easily
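The decoupling idea can be sketched with nothing but the standard library. This is an in-process analogy, not a real broker: `queue.Queue` plays the FIFO queue, one thread plays the consumer, and the addresses are made up.

```python
import queue
import threading

broker = queue.Queue()  # FIFO, like a broker queue
delivered = []


def worker():
    """Consume messages until a None sentinel arrives."""
    while True:
        email = broker.get()
        if email is None:
            break
        delivered.append(email)  # a real worker would send the mail here


consumer = threading.Thread(target=worker)
consumer.start()

# The producer only enqueues; it never waits for the mail to be sent.
for address in ["a@example.com", "b@example.com", "c@example.com"]:
    broker.put(address)

broker.put(None)  # tell the worker to stop
consumer.join()
print(delivered)
```

With a real broker the queue lives in a separate process, so the consumer can run on another machine or be written in another language.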

Slide 8

Slide 8 text

Brokers - The bad parts
● System becomes distributed
● Complexity
● Needs maintenance/monitoring

Slide 9

Slide 9 text

Sending a mail using RabbitMQ

Slide 10

Slide 10 text

Sending a mail using RabbitMQ
● No framework
● Decouples the system
● Async tasks
● Full control
● From scratch (?)
● Works only with RabbitMQ
● Monitoring?
● Maintenance?
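A rough sketch of the producer side without a framework, assuming the pika client. In real use `channel` would come from `pika.BlockingConnection(...).channel()`; the `RecordingChannel` class below is only a stand-in that imitates the two channel methods used, so the example runs without a RabbitMQ server. The queue name `mail` and the JSON payload shape are assumptions.

```python
import json


class RecordingChannel:
    """Stand-in imitating the two pika channel methods used below."""

    def __init__(self):
        self.declared = []
        self.published = []

    def queue_declare(self, queue, durable=False):
        self.declared.append((queue, durable))

    def basic_publish(self, exchange, routing_key, body):
        self.published.append((exchange, routing_key, body))


def enqueue_mail(channel, email, queue_name="mail"):
    """Publish a 'send this mail' message instead of sending it inline."""
    channel.queue_declare(queue=queue_name, durable=True)
    body = json.dumps({"email": email})
    # The default exchange ("") routes a message by queue name.
    channel.basic_publish(exchange="", routing_key=queue_name, body=body)
    return body


channel = RecordingChannel()
enqueue_mail(channel, "user@example.com")
print(channel.published)
```

Everything else on the slide's list (a consumer loop, retries, monitoring) would still have to be written by hand, which is the "from scratch" cost.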

Slide 11

Slide 11 text

Celery

Slide 12

Slide 12 text

Celery
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, and it also supports scheduling. The execution units, called tasks, are executed concurrently on one or more worker servers.

Slide 13

Slide 13 text

Use cases
● Work outside the request/response cycle. Example: sending emails asynchronously.
● Tasks in the background. Example: computationally heavy jobs and interacting with external APIs.
● Periodic jobs.

Slide 14

Slide 14 text

ARCHITECTURE

Slide 15

Slide 15 text

ARCHITECTURE: PRODUCER
Produces a task for the queue.

Slide 16

Slide 16 text

ARCHITECTURE: BROKER
Stores the task backlog.
Answers the question: what work remains to be done?
RabbitMQ, Redis, SQLAlchemy, Django's ORM, MongoDB...

Slide 17

Slide 17 text

ARCHITECTURE: WORKER
Consumes and executes tasks.
Distributed.

Slide 18

Slide 18 text

ARCHITECTURE: RESULTS BACKEND
Stores the results from our tasks.
Redis, SQLAlchemy, Django's ORM, MongoDB...
Optional!
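The four roles can be modeled in a few lines of standard-library Python. This is a toy, single-process sketch and not how Celery is implemented: the names `produce`, `worker`, `broker`, and `results_backend` are chosen here only to mirror the slide titles.

```python
import queue
import threading
import uuid

broker = queue.Queue()   # broker: holds the task backlog
results_backend = {}     # results backend: task id -> return value


def add(x, y):
    return x + y


def produce(func, *args):
    """Producer: put a task on the queue and return its id."""
    task_id = str(uuid.uuid4())
    broker.put((task_id, func, args))
    return task_id


def worker():
    """Worker: consume tasks, run them, store the results."""
    while True:
        message = broker.get()
        if message is None:
            break
        task_id, func, args = message
        results_backend[task_id] = func(*args)


consumer = threading.Thread(target=worker)
consumer.start()
task_id = produce(add, 2, 4)
broker.put(None)  # stop the worker once the backlog is drained
consumer.join()
print(results_backend[task_id])
```

The producer gets back only an id; the value is looked up later in the results store, which is why that component can be left out when nobody needs the return value.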

Slide 19

Slide 19 text

Task
● Celery's main objects
● 2 responsibilities:
  ○ Send the task
  ○ Receive the task in a worker
● Unique name
● Python callables with a decorator
● @task (from celery.task import task)

Slide 20

Slide 20 text

Task
Pick your favorite...

    # Function-based task
    @app.task
    def add(x, y):
        return x + y

    add(2, 4)  # runs synchronously in the current process

    # Class-based task
    class AddTask(app.Task):
        def run(self, x, y):
            return x + y

    AddTask().run(2, 4)  # also synchronous

    # Async
    add.delay(2, 4)
    add.apply_async(args=(2, 4), expires=30)

Slide 21

Slide 21 text

Task
Common attributes of Tasks
● Task.name
● Task.bind
● Task.queue
● Task.max_retries
● Task.default_retry_delay
● Task.rate_limit
● Task.time_limit
● Task.soft_time_limit
● Task.ignore_result

Slide 22

Slide 22 text

Sending a mail with Celery

Slide 23

Slide 23 text

AsyncResult
When we delay a task, Celery returns an object called AsyncResult.
Celery needs a result backend to store these results: CELERY_RESULT_BACKEND

Slide 24

Slide 24 text

Sending mail with Celery with Backend

Slide 25

Slide 25 text

Workers

    # loglevel=INFO
    celery -A projName worker --loglevel=info

    # concurrency
    celery -A projName worker --concurrency=10

    # eventlet (?)
    celery -A proj worker -P eventlet -c 1000

    # autoscale
    celery -A projName worker --autoscale=10,3

Warning: Increased concurrency can quickly exhaust database connections. Use a connection pooler (like pgbouncer).

Slide 26

Slide 26 text

Periodic Tasks

    from datetime import timedelta

    # Decorator style
    @app.periodic_task(run_every=timedelta(minutes=5))
    def run_every_five():
        pass

    # Class style
    class RunEveryFive(app.PeriodicTask):
        run_every = timedelta(minutes=5)

        def run(self):
            pass

Slide 27

Slide 27 text

Periodic Tasks

    from datetime import timedelta

    @app.task()
    def run_every_five():
        pass

    CELERYBEAT_SCHEDULE = {
        'run-every-five': {
            'task': 'tasks.run_every_five',
            'schedule': timedelta(minutes=5)
        },
    }

Slide 28

Slide 28 text

Periodic Tasks - CRON Style

    from celery.schedules import crontab

    crontab(minute=0, hour='*/3')       # Every 3 hours.
    crontab(day_of_week='sunday')       # Every minute on Sundays.
    crontab(0, 0, month_of_year='*/3')  # Midnight, every day in the first month of every quarter.

    @app.periodic_task(run_every=crontab(minute=0, hour=1))
    def schedule_emails():
        user_ids = User.objects.values_list('id', flat=True)
        for user_id in user_ids:
            send_daily_email.delay(user_id)

Slide 29

Slide 29 text

Periodic Tasks - CRON Style

    @app.task()
    def send_daily_email(user_id):
        user = User.objects.get(id=user_id)
        email = Email(user=user, body="Hi")
        email.send()

Slide 30

Slide 30 text

Periodic Tasks
CELERY BEAT, A.K.A. THE SCHEDULER.

    celery -A project beat

Slide 31

Slide 31 text

Periodic Tasks
NEVER RUN A BEAT + WORKER ON A SINGLE CELERY PROCESS.

    # Really bad idea....
    celery -A project worker -B

Slide 32

Slide 32 text

Periodic Tasks
FREQUENTLY RUNNING PERIODIC TASKS.
BEWARE OF "TASK STACKING"
The scheduled task runs every 5 minutes.
Each run takes 30 minutes.
Runs pile up faster than the workers can finish them.
Bad stuff.
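The stacking effect is easy to quantify with a small simulation. The sketch below assumes a single worker that runs one task at a time and a task enqueued at minute 0, 5, 10, and so on; `pending_after` is a hypothetical helper written for this illustration.

```python
def pending_after(elapsed_min, every_min=5, duration_min=30):
    """Count tasks enqueued but not yet finished after elapsed_min minutes.

    Assumes one worker processing tasks strictly one after another.
    """
    enqueue_times = range(0, elapsed_min, every_min)
    worker_free_at = 0
    finished = 0
    for enqueued_at in enqueue_times:
        start = max(enqueued_at, worker_free_at)
        end = start + duration_min
        if end <= elapsed_min:
            finished += 1
        worker_free_at = end
    return len(enqueue_times) - finished


for minutes in (30, 60, 120):
    print(minutes, pending_after(minutes))
```

With the slide's numbers, six tasks are enqueued for every one that finishes, so the backlog grows by about five tasks every half hour and never recovers.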

Slide 33

Slide 33 text

Periodic Tasks
EXPIRES!

    from datetime import timedelta
    from time import sleep

    @app.periodic_task(expires=300, run_every=timedelta(minutes=5))
    def schedule_task():
        for _ in range(30):
            one_minute_task.delay()

    @app.task(expires=300)
    def one_minute_task():
        sleep(60)

Slide 34

Slide 34 text

Celery Canvas
● group
● chain
● chord
● map
● starmap
● chunks
http://docs.celeryproject.org/en/latest/userguide/canvas.html

Slide 35

Slide 35 text

Best Practices

Slide 36

Slide 36 text

Best Practices
Never pass objects as arguments

    # Bad
    @app.task()
    def send_reminder(reminder):
        reminder.send_email()

Slide 37

Slide 37 text

Best Practices

    # Good
    @app.task()
    def send_reminder(pk):
        try:
            reminder = Reminder.objects.get(pk=pk)
        except Reminder.DoesNotExist:
            return
        reminder.send_email()
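One reason to pass a primary key is that the task then reads the record when it runs, so it sees current data rather than a snapshot taken when the message was queued. The sketch below illustrates this with a dictionary standing in for the database; `REMINDERS` and the message layout are assumptions made for the example, not Celery's wire format.

```python
import json

# Hypothetical in-memory "database" keyed by primary key.
REMINDERS = {1: {"email": "old@example.com"}}


def send_reminder(pk):
    """Task body: look the record up at execution time."""
    reminder = REMINDERS.get(pk)
    if reminder is None:
        return None  # the record was deleted after the task was queued
    return "reminder sent to " + reminder["email"]


# The message only carries the key, which serializes trivially.
message = json.dumps({"task": "send_reminder", "args": [1]})

# The record changes while the message waits in the queue.
REMINDERS[1]["email"] = "new@example.com"

args = json.loads(message)["args"]
outcome = send_reminder(*args)
print(outcome)
```

Had the whole record been serialized into the message, the task would have mailed the stale address.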

Slide 38

Slide 38 text

Best Practices
Keep tasks granular.
Small tasks can be processed in parallel.

Slide 39

Slide 39 text

Best Practices
Things go wrong in tasks...

    @app.task(bind=True, max_retries=10)
    def gather_data(self):
        try:
            data = api.get_data()
            # etc, etc, ...
        except api.RateLimited as e:
            # self.retry() re-queues the task and raises Retry
            raise self.retry(exc=e, countdown=e.cooldown)
        except api.IsDown:
            return

Slide 40

Slide 40 text

Best Practices
Ensure a task is executed one at a time

    from hashlib import md5
    from celery.utils.log import get_task_logger

    # `cache` is assumed to be a shared cache client,
    # for example django.core.cache.cache
    logger = get_task_logger(__name__)
    LOCK_EXPIRE = 60 * 5  # Lock expires in 5 minutes

    @task(bind=True)
    def import_feed(self, feed_url):
        feed_url_hexdigest = md5(feed_url.encode('utf-8')).hexdigest()
        lock_id = '{0}-lock-{1}'.format(self.name, feed_url_hexdigest)

        # cache.add fails if the key already exists
        acquire_lock = lambda: cache.add(lock_id, 'true', LOCK_EXPIRE)
        release_lock = lambda: cache.delete(lock_id)

        logger.debug('Importing feed: %s', feed_url)
        if acquire_lock():
            try:
                feed = Feed.objects.import_feed(feed_url)
            finally:
                # release the lock even if the import fails
                release_lock()
            return feed.url

        logger.debug(
            'Feed %s is already being imported by another worker', feed_url)

Slide 41

Slide 41 text

Best Practices
Important Settings

    # settings.py
    CELERY_IGNORE_RESULT = True
    CELERYD_TASK_SOFT_TIME_LIMIT = 500
    CELERYD_TASK_TIME_LIMIT = 1000

    # tasks.py
    @app.task(ignore_result=True, soft_time_limit=60, time_limit=120)
    def add(x, y):
        pass

    # settings.py
    CELERYD_MAX_TASKS_PER_CHILD = 500
    CELERYD_PREFETCH_MULTIPLIER = 4

Slide 42

Slide 42 text

Routing

    CELERY_ROUTES = {
        'email.tasks.send_mail': {
            'queue': 'priority',
        },
    }

    # Or...
    send_mail.apply_async(queue="priority")

    # Start a worker that consumes only the priority queue
    celery -A project worker -Q priority

Slide 43

Slide 43 text

Error Insight and Monitoring
● Sentry
● NewRelic
● RabbitMQ Management Plugin

Slide 44

Slide 44 text

Error Insight and Monitoring
Celery Flower
● Real-time monitoring:
  ○ Progress and job history.
  ○ Task information.
  ○ Graphs and statistics.
  ○ Status and statistics of the workers.
  ○ View tasks that are running.
● Remote control:
  ○ Shutdown and restart workers.
  ○ Check autoscaling and pool size.
  ○ Management of queues.

    pip install flower
    celery flower -A projectName --port=5555

Slide 45

Slide 45 text

Final Thoughts
● Flexible architectures
● Technology crossover
● AMQP
● Complex but scalable
● Distributed systems help distribute work
● Needs maintenance and monitoring
● Hard to debug
● Celery has a lot of cool features

Slide 46

Slide 46 text

Thanks
