Slide 41
Slide 41 text
I put aside Celerybeat earlier, but we do have to deal with it eventually.
There are two problems with beat. First, if you run two copies, your
scheduled jobs all get run twice. Even if you design your tasks carefully to
be idempotent (which you should), that's still a big load increase. Second,
it's stateful. By default beat stores the last run time of each scheduled
task in a local file. There is django-celery-beat which moves the state
storage into the main database, but depending on how frequently you run
scheduled tasks, that might be a lot of write load on your SQL database.
Celery-beatx is a project which helps with both of these. It handles
locking between multiple instances of beat so only one is active at a time,
but you can still run multiple for redundancy, and it allows using Redis or
Memcache for state storage, which works better for the use case. But
there is a catch: beatx is Python 3 only. So failing that, we want to use a
StatefulSet.
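
To make the "only one is active at a time" idea concrete, here is a minimal sketch of the general lease-lock pattern. This is not how celery-beatx is actually implemented; the class names (InMemoryLeaseStore, BeatInstance, FakeClock) are hypothetical, and the in-memory store is only a stand-in for a shared backend such as Redis, where the same idea is usually built on a set-if-absent write with an expiry.

```python
import time


class InMemoryLeaseStore:
    """Hypothetical stand-in for a shared store such as Redis."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._owner = None
        self._expires_at = 0.0

    def acquire_or_renew(self, owner, ttl):
        """Return True if `owner` holds the lease after this call."""
        now = self._clock()
        lease_free = self._owner is None or now >= self._expires_at
        if lease_free or self._owner == owner:
            # Take a free lease, or extend one we already hold.
            self._owner = owner
            self._expires_at = now + ttl
            return True
        return False


class BeatInstance:
    """Hypothetical scheduler that only sends jobs while it holds the lease."""

    def __init__(self, name, store, ttl=10.0):
        self.name = name
        self.store = store
        self.ttl = ttl
        self.jobs_sent = 0

    def tick(self):
        if self.store.acquire_or_renew(self.name, self.ttl):
            self.jobs_sent += 1  # stand-in for enqueueing the due tasks
            return True
        return False  # standby: another instance is currently active


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
store = InMemoryLeaseStore(clock=clock)
beat_a = BeatInstance("beat-a", store)
beat_b = BeatInstance("beat-b", store)

# Both instances tick, but only the lease holder sends jobs.
results = [(beat_a.tick(), beat_b.tick()) for _ in range(3)]

# beat-a stops ticking (say it crashed); once its lease expires,
# beat-b takes over.
clock.now += 11.0
takeover = beat_b.tick()
```

The second instance costs nothing while the first is healthy, since it never sends a job, but it picks up within one lease period if the first goes away. That is the redundancy you don't get from a single beat process with a local state file.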
Celerybeat
StatefulSet vs. BeatX
Noah Kantrowitz – @kantrn – DjangoCon US 2019