Asynchronous Tasks With Celery, Flask & RabbitMQ

Hector Benitez
December 11, 2015

Transcript

  1. Who am I?
     Héctor Benítez
     @hectorbenitez
     hbenitez[at]nearsoft.com
     Software Developer at Nearsoft
     Lead Developer at PlanningPoker for Hangouts
     http://planningwithcards.com
  2. Sending a mail with Flask
     • Coupled with Request-Response
     • Can’t use a different machine
     • Bottleneck
     • Manual error handling
     • send_mail(email) is blocking
     • Doesn’t scale!!!
     (A sketch of the blocking version follows this list.)
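     As a hedged illustration of the coupling above, a minimal sketch of a Flask route that sends mail synchronously inside the request; the /register endpoint, the message contents, and the local SMTP server are assumptions for illustration:

         import smtplib
         from email.message import EmailMessage

         from flask import Flask, request

         app = Flask(__name__)

         def send_mail(to_addr):
             # Blocking SMTP round trip: nothing else happens in this
             # request until the mail server answers.
             msg = EmailMessage()
             msg['Subject'] = 'Welcome'                # assumed content
             msg['From'] = 'noreply@example.com'       # assumed sender
             msg['To'] = to_addr
             msg.set_content('Hi!')
             with smtplib.SMTP('localhost') as smtp:   # assumed local mail server
                 smtp.send_message(msg)

         @app.route('/register', methods=['POST'])     # hypothetical endpoint
         def register():
             send_mail(request.form['email'])  # the HTTP response waits on SMTP
             return 'ok'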
  3. Brokers
     • Queue (FIFO)
     • Allows decoupling of the system
     • Allows distributed systems
     • Technology crossover
     • Scales easily
  4. Brokers - The bad parts
     • System becomes distributed
     • Complexity
     • Needs maintenance/monitoring
  5. Sending a mail using RabbitMQ
     • No framework
     • Decouples the system
     • Async tasks
     • Full control
     • From scratch (?)
     • Works only with RabbitMQ
     • Monitoring?
     • Maintenance?
     (A sketch of the hand-rolled approach follows this list.)
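     As a sketch of what "from scratch" means here, publishing an email job straight to RabbitMQ with the pika client; the queue name and the payload shape are assumptions:

         import json

         import pika

         # Producer side: enqueue the job and return immediately.
         connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
         channel = connection.channel()
         channel.queue_declare(queue='email', durable=True)  # assumed queue name
         channel.basic_publish(
             exchange='',
             routing_key='email',
             body=json.dumps({'to': 'user@example.com', 'subject': 'Welcome'}),
         )
         connection.close()
         # A separate consumer process reads from 'email' and does the SMTP work;
         # you would have to write, deploy, and monitor that consumer yourself.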
  6. Celery
     Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but it also supports scheduling. The execution units, called tasks, are executed concurrently on one or more worker servers.
  7. Use cases
     • Out of the Request/Response cycle. Example: sending emails asynchronously.
     • Tasks in the background. Example: computationally heavy jobs and interacting with external APIs.
     • Periodic jobs.
  8. ARCHITECTURE: BROKER
     Stores the task backlog. Answers the question: what work remains to be done?
     RabbitMQ, Redis, SQLAlchemy, Django's ORM, MongoDB...
  9. ARCHITECTURE: RESULTS BACKEND
     Stores the results from our tasks.
     Redis, SQLAlchemy, Django's ORM, MongoDB...
     Optional! (A minimal wiring of broker + backend is sketched below.)
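     To make the broker / results backend split concrete, a minimal sketch wiring Celery to RabbitMQ as the broker and Redis as the result backend; the URLs and the app name are assumptions:

         from celery import Celery

         app = Celery(
             'projName',
             broker='amqp://guest:guest@localhost:5672//',  # RabbitMQ holds the task backlog
             backend='redis://localhost:6379/0',            # Redis stores results (optional)
         )

         @app.task
         def add(x, y):
             return x + y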
  10. Task
      • Celery's main object
      • 2 responsibilities:
        ◦ Send the task
        ◦ Receive the task in a worker
      • Unique name
      • Python callables with a decorator
      • @task (from celery.task import task)
  11. Task
      Pick your favorite...

          @app.task
          def add(x, y):
              return x + y

          add(2, 4)

          class AddTask(app.Task):
              def run(self, x, y):
                  return x + y

          AddTask().run(2, 4)

          # Async
          add.delay(2, 4)
          add.apply_async(args=(2, 4), expires=30)
  12. Task
      Common attributes of Tasks:
      • Task.name
      • Task.bind
      • Task.queue
      • Task.max_retries
      • Task.default_retry_delay
      • Task.rate_limit
      • Task.time_limit
      • Task.soft_time_limit
      • Task.ignore_result
  13. AsyncResult
      When we delay a task, Celery gives us back an object called AsyncResult.
      Celery needs a result backend to store these results: CELERY_RESULT_BACKEND
      (See the usage sketch below.)
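      A short sketch of what the AsyncResult handle gives you, assuming the add task from earlier and a configured result backend:

          result = add.delay(2, 4)   # returns an AsyncResult immediately

          result.id                  # task id; can be stored and looked up later
          result.ready()             # True once the task has finished
          result.get(timeout=10)     # blocks until the result (6) is available
          result.state               # e.g. PENDING, STARTED, SUCCESS, FAILURE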
  14. Workers

          # loglevel=INFO
          celery -A projName worker --loglevel=info

          # concurrency
          celery -A projName worker --concurrency=10

          # eventlet (?)
          celery -A proj worker -P eventlet -c 1000

          # autoscale
          celery -A projName worker --autoscale=10,3

      Warning: increased concurrency can quickly drain database connections. Use a connection pooler (like pgbouncer).
  15. Periodic Tasks

          from datetime import timedelta

          @app.periodic_task(run_every=timedelta(minutes=5))
          def run_every_five():
              pass

          class RunEveryFive(app.PeriodicTask):
              run_every = timedelta(minutes=5)

              def run(self):
                  pass
  16. Periodic Tasks

          from datetime import timedelta

          @app.task
          def run_every_five():
              pass

          CELERYBEAT_SCHEDULE = {
              'run-every-five': {
                  'task': 'tasks.run_every_five',
                  'schedule': timedelta(minutes=5),
              },
          }
  17. Periodic Tasks - CRON Style

          from celery.schedules import crontab

          crontab(minute=0, hour='*/3')          # Every 3 hours.
          crontab(day_of_week='sunday')          # Every minute on Sundays.
          crontab(0, 0, 0, month_of_year='*/3')  # First month of every quarter.

          @app.periodic_task(run_every=crontab(minute=0, hour=1))
          def schedule_emails():
              user_ids = User.objects.values_list('id', flat=True)
              for user_id in user_ids:
                  send_daily_email.delay(user_id)
  18. Periodic Tasks - CRON Style

          @app.task()
          def send_daily_email(user_id):
              user = User.objects.get(id=user_id)
              email = Email(user=user, body="Hi")
              email.send()
  19. Periodic Tasks
      NEVER RUN A BEAT + WORKER ON A SINGLE CELERY PROCESS.

          # Really bad idea....
          celery -A project worker -B
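      The safer layout, as a sketch: run beat and the workers as separate processes, so the scheduler survives worker restarts and you never accidentally start two schedulers:

          # One (and only one) scheduler process...
          celery -A project beat

          # ...and as many worker processes as you need, scaled independently.
          celery -A project worker --loglevel=info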
  20. Periodic Tasks
      FREQUENTLY RUNNING PERIODIC TASKS: BEWARE OF "TASK STACKING"
      The scheduled task runs every 5 minutes, but each run takes 30 minutes. Runs pile up in the queue faster than they drain. Bad stuff.
  21. Periodic Tasks
      EXPIRES!

          from time import sleep

          @app.periodic_task(expires=300, run_every=timedelta(minutes=5))
          def schedule_task():
              for _ in range(30):
                  one_minute_task.delay()

          @app.task(expires=300)
          def one_minute_task():
              sleep(60)
  22. Celery Canvas
      • group
      • chain
      • chord
      • map
      • starmap
      • chunks
      http://docs.celeryproject.org/en/latest/userguide/canvas.html
      (A few of these are sketched below.)
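      A sketch of the three most common primitives, assuming the add task from earlier and a hypothetical tsum task that sums a list:

          from celery import chain, chord, group

          # chain: pipe results along; add(2, 2) feeds add(?, 4), which feeds add(?, 8)
          chain(add.s(2, 2), add.s(4), add.s(8))()

          # group: run independent tasks in parallel
          group(add.s(i, i) for i in range(10))()

          # chord: a group whose collected results feed a callback task
          chord((add.s(i, i) for i in range(10)), tsum.s())()  # tsum is hypothetical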
  23. Best Practices
      Never pass objects as arguments.

          # Bad
          @app.task()
          def send_reminder(reminder):
              reminder.send_email()
  24. Best Practices

          # Good
          @app.task()
          def send_reminder(pk):
              try:
                  reminder = Reminder.objects.get(pk=pk)
              except Reminder.DoesNotExist:
                  return
              reminder.send_email()
  25. Best Practices
      Things go wrong in tasks...

          from celery.exceptions import Retry

          @app.task(max_retries=10)
          def gather_data():
              try:
                  data = api.get_data()
                  # etc, etc, ...
              except api.RateLimited as e:
                  raise Retry(exc=e, when=e.cooldown)
              except api.IsDown:
                  return
  26. Best Practices
      Ensure a task is executed one at a time.

          from hashlib import md5

          logger = get_task_logger(__name__)
          LOCK_EXPIRE = 60 * 5  # Lock expires in 5 minutes

          @task(bind=True)
          def import_feed(self, feed_url):
              feed_url_digest = md5(feed_url.encode()).hexdigest()
              lock_id = '{0}-lock-{1}'.format(self.name, feed_url_digest)

              # cache.add fails if the key already exists
              acquire_lock = lambda: cache.add(lock_id, 'true', LOCK_EXPIRE)
              release_lock = lambda: cache.delete(lock_id)

              logger.debug('Importing feed: %s', feed_url)
              if acquire_lock():
                  try:
                      feed = Feed.objects.import_feed(feed_url)
                  finally:
                      release_lock()
                  return feed.url
              logger.debug(
                  'Feed %s is already being imported by another worker', feed_url)
  27. Best Practices
      Important settings

          # settings.py
          CELERY_IGNORE_RESULT = True
          CELERYD_TASK_SOFT_TIME_LIMIT = 500
          CELERYD_TASK_TIME_LIMIT = 1000

          # tasks.py
          @app.task(ignore_result=True, soft_time_limit=60, time_limit=120)
          def add(x, y):
              pass

          # settings.py
          CELERYD_MAX_TASKS_PER_CHILD = 500
          CELERYD_PREFETCH_MULTIPLIER = 4
  28. Routing

          CELERY_ROUTES = {
              'email.tasks.send_mail': {
                  'queue': 'priority',
              },
          }

          # Or...
          send_mail.apply_async(queue="priority")

          # Start a worker that consumes only that queue:
          celery -A project worker -Q priority
  29. Error Insight and Monitoring
      Celery Flower
      • Real-time monitoring:
        ◦ Progress and job history.
        ◦ Task information.
        ◦ Graphs and statistics.
        ◦ Status and statistics of the workers.
        ◦ View tasks that are running.
      • Remote control:
        ◦ Shutdown and restart workers.
        ◦ Check autoscaling and pool size.
        ◦ Management of queues.

          pip install flower
          celery flower -A projectName --port=5555
  30. Final Thoughts
      • Flexible architectures
      • Technology crossover
      • AMQP
      • Complex but scalable
      • Distributed systems help distribute work
      • Needs maintenance and monitoring
      • Hard to debug
      • Celery has a lot of cool features