Three quick tips from two years with Celery
LaunchKit and Cluster rely on Celery extensively (with Redis as the broker) to handle all sorts of out-of-band background tasks. We have sent millions of push notifications, generated and delivered an insane amount of email, backed up millions of photos, and much more, all using Celery tasks over the past few years.
As a result, I have been woken up in the middle of the night many times by various Celery-related issues — and there are a few essential configuration tips I’ve taken away from the experience.
Here are a few tips for sleeping through the night while running a Celery task queue in production:
1. Set a global task timeout
By default, tasks don't time out. If a network connection inside your task hangs indefinitely, your queue will eventually back up and some part of your service will mysteriously stop working.
So you should set a large global default timeout for all tasks, plus shorter, more specific timeouts on individual tasks where it makes sense.
In a Django project, you can set a global timeout by adding this line to settings.py:
# Add a one-minute timeout to all Celery tasks.
CELERYD_TASK_SOFT_TIME_LIMIT = 60
… which you can override in specific tasks:
@celery_app.task(soft_time_limit=5)
def send_push_notification(device_token, message, data=None):
    notification_json = build_notification_json(message, data=data)
    ...
This will prevent unexpectedly never-ending tasks from clogging your queues.
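The soft limit is friendlier than the hard one (CELERYD_TASK_TIME_LIMIT, which kills the worker process outright): it raises a SoftTimeLimitExceeded exception inside the task, which you can catch to clean up before giving up. Here's a minimal sketch; the backup_photo task and its helpers are hypothetical stand-ins:

from celery.exceptions import SoftTimeLimitExceeded

@celery_app.task(soft_time_limit=30)
def backup_photo(photo_id):
    try:
        # Hypothetical helpers standing in for the real work.
        photo = fetch_photo(photo_id)
        upload_to_backup_storage(photo)
    except SoftTimeLimitExceeded:
        # The soft limit fired mid-task: clean up any partial
        # state, then let the task end gracefully.
        remove_partial_upload(photo_id)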
2. Use -Ofair for your preforking workers
By default, preforking Celery workers distribute tasks to their worker processes as soon as they are received, regardless of whether the process is currently busy with other tasks.
If you have tasks that take varying amounts of time to complete, whether by design or because of unpredictable network conditions, this behavior causes unexpected delays in the total execution time of tasks in the queue.
Here's an example: say you have 20 tasks, each of which calls some remote API and takes 1 second to finish.
You start a worker with 4 processes to run through these 20 tasks:
celery worker -A ... -Q random-tasks --concurrency=4
This will take about 5 seconds to finish. 4 subprocesses, 5 tasks each.
But if the first task (task 1 of 20) takes 10 seconds to complete instead of 1, how long does the whole queue take? Not 10 seconds: 14 seconds.
That's because the tasks get distributed evenly up front, so each subprocess gets 5 of the 20 tasks. The subprocess that receives the 10-second task still has 4 one-second tasks queued behind it, and 10 + 4 = 14.
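The fix is the -Ofair flag, which tells the worker to hand a task to a child process only when that process is actually ready for more work, instead of prefetching tasks evenly up front. Same queue as before, one flag added:

celery worker -A ... -Q random-tasks --concurrency=4 -Ofair

With fair scheduling, the run above should finish in roughly 10 seconds: the slow task occupies one subprocess while the other three work through the remaining 19 one-second tasks.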