Who am I?
Héctor Benítez
@hectorbenitez
hbenitez[at]nearsoft.com
Software Developer at Nearsoft
Lead Developer at PlanningPoker for Hangouts
http://planningwithcards.com
Regular Web Flow
● Browser
● Request
● Server
○ Blocking tasks
● Response
Sending a mail with Flask
● Coupled with Request-Response
● Can’t use a different machine
● Bottleneck
● Manual error handling
● send_mail(email) is blocking
● Doesn’t scale!!!
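A minimal sketch of why the blocking call hurts, using only the standard library. The names `send_mail` and `handle_signup` are hypothetical stand-ins for the Flask view and its mail helper, and the sleep simulates a slow SMTP server.

```python
import time


def send_mail(email, delay=0.05):
    """Hypothetical stand-in for a slow, blocking SMTP call."""
    time.sleep(delay)
    return f"sent to {email}"


def handle_signup(email):
    """Simulated request handler: the response cannot be built
    until send_mail() has returned."""
    started = time.perf_counter()
    send_mail(email)
    elapsed = time.perf_counter() - started
    return {"status": 200, "elapsed": elapsed}


response = handle_signup("user@example.com")
```

Every request pays the full mail latency before the browser gets its response, which is the coupling the bullets describe.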
Brokers
● Queue (FIFO)
● Allows decoupling of the system
● Allows a distributed system
● Technology crossover
● Scales easily
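The decoupling idea can be sketched in a single process, assuming `queue.Queue` plays the broker and a thread plays the consumer. A real broker such as RabbitMQ sits between separate processes or machines; all names here are illustrative.

```python
import queue
import threading
import time

mail_queue = queue.Queue()  # FIFO queue standing in for the broker
sent = []                   # Records what the consumer processed


def send_mail(email):
    time.sleep(0.01)        # Simulated slow mail delivery
    sent.append(email)


def worker():
    """Consumer: pulls messages in FIFO order until it sees the sentinel."""
    while True:
        email = mail_queue.get()
        if email is None:
            mail_queue.task_done()
            break
        send_mail(email)
        mail_queue.task_done()


def handle_signup(email):
    """Producer: enqueue and return immediately, no waiting on the mail."""
    mail_queue.put(email)
    return {"status": 202}


thread = threading.Thread(target=worker, daemon=True)
thread.start()
addresses = ["a@example.com", "b@example.com", "c@example.com"]
responses = [handle_signup(address) for address in addresses]
mail_queue.join()     # Wait until every queued mail has been handled
mail_queue.put(None)  # Sentinel: tell the worker to stop
thread.join()
```

The handler returns as soon as the message is queued, and the mails still go out in the order they were enqueued.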
Brokers - The bad parts
● System becomes distributed
● Complexity
● Needs maintenance/monitoring
Sending a mail using RabbitMQ
● No Framework
● Decouples the system
● Async tasks
● Full control
● From scratch (?)
● Works only with RabbitMQ
● Monitoring?
● Maintenance?
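Going "from scratch" means defining your own message format and dispatch. A rough sketch of that part, assuming JSON message bodies; the handler table and function names are hypothetical, and the actual publish/consume calls of a RabbitMQ client are left out.

```python
import json


def encode_message(task_name, **kwargs):
    """Producer side: build the bytes that would be published to the queue."""
    return json.dumps({"task": task_name, "kwargs": kwargs}).encode("utf-8")


def send_mail(email):
    return f"mail sent to {email}"


# Consumer side: map task names in the message to local functions.
HANDLERS = {"send_mail": send_mail}


def handle_message(body):
    """Consumer callback: decode the body and dispatch to the handler."""
    message = json.loads(body.decode("utf-8"))
    handler = HANDLERS[message["task"]]
    return handler(**message["kwargs"])


body = encode_message("send_mail", email="user@example.com")
result = handle_message(body)
```

Retries, monitoring and scheduling would all have to be built on top of this by hand, which is the gap Celery fills.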
Celery
Celery is an asynchronous task queue/job queue
based on distributed message passing. It is focused
on real-time operation, and it also supports scheduling.
The execution units, called tasks, are executed
concurrently on one or more worker servers.
Use cases
● Out of the Request/Response cycle. Example: sending
emails asynchronously.
● Tasks in the background. Example: computationally heavy
jobs and interacting with external APIs.
● Periodic jobs.
ARCHITECTURE
PRODUCER
Produces a task for the queue.
ARCHITECTURE
BROKER
Stores the task backlog. Answers:
what work remains to be done?
RabbitMQ, Redis, SQLAlchemy, Django's ORM,
MongoDB...
ARCHITECTURE
WORKER
Executes and consumes tasks. Distributed.
ARCHITECTURE
RESULTS BACKEND
Stores the results from our tasks. Redis,
SQLAlchemy, Django's ORM, MongoDB... Optional!
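The four roles can be sketched together in one process. This is a toy model, not Celery's implementation: a `queue.Queue` stands in for the broker, a dict for the results backend, and `produce` / `work_once` are hypothetical names.

```python
import queue
import uuid

broker = queue.Queue()   # BROKER: holds the backlog of pending tasks
results_backend = {}     # RESULTS BACKEND: task_id -> return value


def add(x, y):
    return x + y


TASKS = {"add": add}     # Tasks the worker knows how to run, by name


def produce(name, *args):
    """PRODUCER: put a task message on the queue and hand back its id."""
    task_id = str(uuid.uuid4())
    broker.put((task_id, name, args))
    return task_id


def work_once():
    """WORKER: consume one message, execute it, store the result."""
    task_id, name, args = broker.get()
    results_backend[task_id] = TASKS[name](*args)
    broker.task_done()


task_id = produce("add", 2, 4)
pending = broker.qsize()  # One task waiting in the backlog
work_once()
```

The producer only ever sees the task id; the result shows up later under that id in the results backend.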
Task
● Celery's main objects
● 2 responsibilities:
○ Send the task
○ Receive the task in a worker
● Unique name
● Python callables with a decorator
● @app.task (legacy import: from celery.task import task)
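How a decorator can give a callable both responsibilities is sketched below. It is an illustration of the idea only, not Celery's code: `REGISTRY`, `OUTBOX` and this `Task` class are made up for the example.

```python
REGISTRY = {}  # Unique task name -> Task object (what a worker would look up)
OUTBOX = []    # Messages that would be sent to the broker


class Task:
    def __init__(self, func, name):
        self.func = func
        self.name = name

    def __call__(self, *args, **kwargs):
        """Responsibility 2: run the task body (what the worker does)."""
        return self.func(*args, **kwargs)

    def delay(self, *args, **kwargs):
        """Responsibility 1: send a message describing the call."""
        message = {"task": self.name, "args": args, "kwargs": kwargs}
        OUTBOX.append(message)
        return message


def task(func):
    """Decorator: wrap a callable and register it under a unique name."""
    name = f"{func.__module__}.{func.__name__}"
    if name in REGISTRY:
        raise ValueError(f"task name already registered: {name}")
    REGISTRY[name] = Task(func, name)
    return REGISTRY[name]


@task
def add(x, y):
    return x + y


message = add.delay(2, 4)
# A worker receiving the message finds the task by name and calls it.
received = REGISTRY[message["task"]](*message["args"], **message["kwargs"])
```

The unique name is what ties the sending side to the receiving side, since only the name and arguments travel in the message.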
Task
Pick your favorite...
# Function-based task
@app.task
def add(x, y):
    return x + y

add(2, 4)

# Class-based task
class AddTask(app.Task):
    def run(self, x, y):
        return x + y

AddTask().run(2, 4)

# Async
add.delay(2, 4)
add.apply_async(args=(2, 4), expires=30)
AsyncResult
When we delay a task, Celery returns an object called
AsyncResult.
Celery needs a result backend to store these results:
CELERY_RESULT_BACKEND (result_backend in Celery 4 and later)
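A rough standard-library analogy, not Celery's API: a `Future` from `concurrent.futures` is also a handle to a value that does not exist yet. The comments map each call to its approximate AsyncResult counterpart.

```python
from concurrent.futures import ThreadPoolExecutor


def add(x, y):
    return x + y


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(add, 2, 4)   # Roughly: result = add.delay(2, 4)
    value = future.result(timeout=5)  # Roughly: result.get(timeout=5)
    finished = future.done()          # Roughly: result.ready()
```

The difference is where the value lives: a Future keeps it in process memory, while an AsyncResult has to fetch it from the result backend, which is why the backend must be configured.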
Sending mail with Celery
with Backend
Workers
#loglevel=INFO
celery -A projName worker --loglevel=info
#concurrency
celery -A projName worker --concurrency=10
# eventlet (?)
celery -A proj worker -P eventlet -c 1000
#autoscale
celery -A projName worker --autoscale=10,3
Warning: Increased concurrency can quickly exhaust database connections. Use a
connection pooler (like PgBouncer).
Periodic Tasks
from datetime import timedelta

@app.periodic_task(run_every=timedelta(minutes=5))
def run_every_five():
    pass

# Class-based equivalent
class RunEveryFive(app.PeriodicTask):
    run_every = timedelta(minutes=5)

    def run(self):
        pass
Periodic Tasks - CRON Style
from celery.schedules import crontab

crontab(minute=0, hour='*/3')  # Every 3 hours.
crontab(day_of_week='sunday')  # Every minute on Sundays.
crontab(0, 0, 0, month_of_year='*/3')  # First month of every quarter.

@app.periodic_task(run_every=crontab(minute=0, hour=1))
def schedule_emails():
    user_ids = User.objects.values_list('id', flat=True)
    for user_id in user_ids:
        send_daily_email.delay(user_id)
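Newer Celery releases document the `beat_schedule` setting as the way to declare periodic tasks, instead of a decorator. A sketch of the same schedules in that style, kept as a plain dict so it runs without Celery installed; the task paths (`tasks.run_every_five`, `tasks.schedule_emails`) are hypothetical.

```python
from datetime import timedelta

# Each entry names a registered task and says how often beat should send it.
beat_schedule = {
    "run-every-five": {
        "task": "tasks.run_every_five",    # Hypothetical task path
        "schedule": timedelta(minutes=5),
    },
    "schedule-emails": {
        "task": "tasks.schedule_emails",   # Hypothetical task path
        # A crontab(minute=0, hour=1) object could be used here instead.
        "schedule": timedelta(hours=24),
    },
}

# With a Celery app object, this would be assigned as:
#   app.conf.beat_schedule = beat_schedule
```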
Periodic Tasks
CELERY BEAT A.K.A THE SCHEDULER.
celery -A project beat
Periodic Tasks
NEVER RUN A BEAT + WORKER
ON A SINGLE CELERY PROCESS.
# Really bad idea....
celery -A project worker -B
Periodic Tasks
FREQUENTLY RUNNING PERIODIC TASKS.
BEWARE OF "TASK STACKING"
A scheduled task runs every 5 minutes. Each run takes 30
minutes.
Scheduled tasks stack up. Bad stuff.
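The arithmetic behind the stacking, as a small sketch. It assumes a single worker that runs one task at a time and a first run scheduled at minute 0; `backlog` is a name made up for this example.

```python
def backlog(elapsed_minutes, every=5, duration=30):
    """Tasks scheduled but not yet finished after `elapsed_minutes`,
    for one worker that processes tasks one at a time."""
    enqueued = elapsed_minutes // every + 1   # Runs at t = 0, 5, 10, ...
    finished = elapsed_minutes // duration    # One completes every 30 min
    return enqueued - finished


after_one_hour = backlog(60)    # 13 enqueued, 2 finished -> 11
after_two_hours = backlog(120)  # 25 enqueued, 4 finished -> 21
```

Six tasks arrive for every one that completes, so the backlog keeps growing for as long as the schedule runs.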
Periodic Tasks
EXPIRES!
from datetime import timedelta
from time import sleep

@app.periodic_task(expires=300, run_every=timedelta(minutes=5))
def schedule_task():
    for _ in range(30):
        one_minute_task.delay()

@app.task(expires=300)
def one_minute_task():
    sleep(60)
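What expiry buys you can be seen in a small simulation. This is a toy model with times in minutes, assuming one worker that checks a message's age when it picks the message up and skips it if it is too old; `run_worker` is a hypothetical name.

```python
def run_worker(enqueue_times, duration, expires):
    """Process messages in order on one worker; skip any message that is
    older than `expires` by the time the worker gets to it."""
    clock = 0
    executed, discarded = [], []
    for enqueued_at in enqueue_times:
        clock = max(clock, enqueued_at)      # Idle until the message exists
        if clock - enqueued_at > expires:
            discarded.append(enqueued_at)    # Too old: drop without running
            continue
        executed.append(enqueued_at)
        clock += duration                    # Worker is busy for `duration`
    return executed, discarded


# One hour of a task scheduled every 5 minutes that takes 30 minutes to run.
enqueue_times = list(range(0, 60, 5))
executed, discarded = run_worker(enqueue_times, duration=30, expires=5)
```

In this run only the messages from minutes 0, 25 and 55 execute; the other nine are dropped instead of piling up behind the worker.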
Best Practices
Never pass objects as arguments
# Bad
@app.task()
def send_reminder(reminder):
    reminder.send_email()
Best Practices
# Good
@app.task()
def send_reminder(pk):
    try:
        reminder = Reminder.objects.get(pk=pk)
    except Reminder.DoesNotExist:
        return  # The row was deleted before the task ran
    reminder.send_email()
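One reason for the rule, sketched with the standard library and assuming task arguments are serialized as JSON before they reach the broker. The `Reminder` class here is a plain stand-in, not the Django model.

```python
import json


class Reminder:
    """Plain stand-in for a database-backed model instance."""

    def __init__(self, pk, email):
        self.pk = pk
        self.email = email


reminder = Reminder(pk=42, email="user@example.com")

# Passing the object itself: a JSON serializer cannot encode it.
try:
    json.dumps({"args": [reminder]})
    object_is_serializable = True
except TypeError:
    object_is_serializable = False

# Passing the primary key: a small, serializable message.
payload = json.dumps({"args": [reminder.pk]})
```

The second reason is staleness: a serialized object is a snapshot from enqueue time, while a primary key makes the worker load the current row when the task actually runs.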
Best Practices
Keep tasks granular.
Small tasks can be processed in parallel.
Best Practices
Things go wrong in Tasks...
@app.task(bind=True, max_retries=10)
def gather_data(self):
    try:
        data = api.get_data()
        # etc, etc, ...
    except api.RateLimited as e:
        # self.retry() re-queues the task, then raises Retry
        raise self.retry(exc=e, countdown=e.cooldown)
    except api.IsDown:
        return
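The retry bookkeeping can be sketched without Celery. Note the difference: this toy version waits in place, whereas a Celery retry puts the task back on the queue so the worker is free in the meantime. `RateLimited`, `run_with_retries` and `flaky` are hypothetical names.

```python
import time


class RateLimited(Exception):
    """Hypothetical API error carrying a suggested wait in seconds."""

    def __init__(self, cooldown):
        super().__init__(f"rate limited, retry in {cooldown}s")
        self.cooldown = cooldown


def run_with_retries(func, max_retries=10, sleep=time.sleep):
    """Call func; on RateLimited, wait for the cooldown and try again,
    giving up (re-raising) after max_retries retries."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimited as exc:
            if attempt == max_retries:
                raise
            sleep(exc.cooldown)


calls = {"count": 0}


def flaky():
    """Fails twice with RateLimited, then succeeds."""
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RateLimited(cooldown=0.5)
    return "data"


waits = []  # Record the waits instead of actually sleeping
result = run_with_retries(flaky, max_retries=10, sleep=waits.append)
```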
Best Practices
Ensure a task is executed one at a time
from hashlib import md5
from celery.utils.log import get_task_logger
from django.core.cache import cache  # Assumes Django's cache framework

logger = get_task_logger(__name__)
LOCK_EXPIRE = 60 * 5  # Lock expires in 5 minutes

@app.task(bind=True)
def import_feed(self, feed_url):
    # The lock key combines the task name and the MD5 digest of the feed URL
    feed_url_hexdigest = md5(feed_url.encode('utf-8')).hexdigest()
    lock_id = '{0}-lock-{1}'.format(self.name, feed_url_hexdigest)
    # cache.add fails if the key already exists
    acquire_lock = lambda: cache.add(lock_id, 'true', LOCK_EXPIRE)
    release_lock = lambda: cache.delete(lock_id)
    logger.debug('Importing feed: %s', feed_url)
    if acquire_lock():
        try:
            feed = Feed.objects.import_feed(feed_url)
        finally:
            release_lock()  # Release even if the import raises
        return feed.url
    logger.debug('Feed %s is already being imported by another worker', feed_url)
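The lock relies on "add only if absent" semantics with a timeout. A single-process sketch of that behavior is below; `TinyCache` is a made-up class, and unlike a shared cache server it gives no protection across processes or machines.

```python
import time


class TinyCache:
    """In-memory stand-in for a cache with add/delete and key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expiry time)
        self._clock = clock

    def add(self, key, value, timeout):
        """Store the key only if it is absent or expired.
        Returns True if stored, False if a live key already exists."""
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False
        self._data[key] = (value, now + timeout)
        return True

    def delete(self, key):
        self._data.pop(key, None)


now = [0.0]                                  # Fake clock for the demo
cache = TinyCache(clock=lambda: now[0])

first = cache.add("import-lock", "true", 300)   # Lock acquired
second = cache.add("import-lock", "true", 300)  # Already held
now[0] = 301.0                                  # Lock timeout has passed
third = cache.add("import-lock", "true", 300)   # Expired, acquired again
cache.delete("import-lock")                     # Explicit release
fourth = cache.add("import-lock", "true", 300)  # Free again
```

The timeout matters: if a worker dies while holding the lock, the key still expires and the task is not blocked forever.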
Error Insight and Monitoring
Celery Flower
● Real-time monitoring:
○ Progress and job history.
○ Task information.
○ Graphs and statistics.
○ Status and statistics of the workers.
○ View tasks that are running.
● Remote control:
○ Shutdown and restart workers.
○ Check autoscaling and pool size.
○ Management of queues.
pip install flower
celery -A projectName flower --port=5555
Final Thoughts
● Flexible Architectures
● Technology Crossover
● AMQP
● Complex but scalable
● Distributed systems help distribute work
● Needs maintenance and monitoring
● Hard to debug
● Celery has a lot of cool features