Celery Memcached backend does not reconnect after idle connection loss #9975

TheoCouss · 2025-10-30T10:49:56Z

I’m running a Django application using Celery for background tasks (file generation and backend processing).
When the app starts, tasks execute and return results normally.
After around 1 hour of uptime, delayed tasks stop returning results — their state remains “PENDING” because the result key no longer exists in Memcached.

On the Django side, the frontend polls the backend periodically to check task status, but Celery never retrieves any result.
No errors are logged on the worker side.

Restarting Memcached (thus dropping all connections) reproduces the problem immediately: Celery never reconnects and all delayed tasks lose their result backend connection.
This affects all delayed tasks, except “fire-and-forget” ones (which don’t store results).

I also reproduced the issue after switching from the default Memcached client to pylibmc.

Observed behavior

After ~1 hour (sometimes less), delayed tasks remain in PENDING state.
The result key no longer exists in Memcached.
No error or reconnect attempt appears in Celery logs.
Restarting Memcached immediately triggers the issue (workers stay connected to dead sockets).
Restarting Celery workers temporarily fixes it.

Expected behavior

Celery should detect when the Memcached connection has been dropped (e.g., due to idle TCP timeout in Kubernetes) and reconnect automatically instead of silently failing to read/write results.
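For the idle-timeout case specifically, one possible mitigation is TCP keepalive, which makes an otherwise idle connection exchange probes so that intermediaries are less likely to drop it. The sketch below shows only the socket-level setting; the helper name `enable_keepalive` and its default values are made up for illustration, and whether a given Memcached client offers a hook to apply this to its own sockets is not something this thread establishes. It would not help after a Memcached restart.

```python
import socket


def enable_keepalive(sock, idle=60, interval=15, count=4):
    """Turn on TCP keepalive so an idle connection still exchanges probes."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning constants are platform-specific (all three exist on Linux),
    # so each one is applied only if this platform exposes it.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # unanswered probes before the socket is dropped
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```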

Hypothesis

Celery’s Memcached backend keeps persistent connections open to Memcached.
After some idle time, either Memcached or the Kubernetes network stack closes the connection.
Since Celery’s Memcached backend doesn’t retry or reopen the connection, all subsequent result writes/reads silently fail.
This explains why result keys disappear and tasks remain “pending”.
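To make the hypothesis concrete, here is a minimal sketch of the reconnect behavior that appears to be missing: a wrapper that rebuilds the client and retries once when a call fails. `ReconnectingClient` and `FlakyClient` are hypothetical names for this example, not Celery or Memcached client APIs, and the stand-in client simulates a dead first connection rather than talking to a real server. It only helps if the underlying client raises on a dead socket; since no errors show up in the logs here, the real client may be swallowing them.

```python
class ReconnectingClient:
    """Hypothetical wrapper: rebuild the client and retry once on failure."""

    def __init__(self, factory, errors=(OSError,)):
        self._factory = factory   # zero-argument callable returning a new client
        self._errors = errors     # exceptions treated as "connection lost"
        self._client = factory()

    def _call(self, method, *args, **kwargs):
        try:
            return getattr(self._client, method)(*args, **kwargs)
        except self._errors:
            # Drop the stale client, open a fresh one, and retry exactly once.
            self._client = self._factory()
            return getattr(self._client, method)(*args, **kwargs)

    def get(self, key):
        return self._call("get", key)

    def set(self, key, value, *args, **kwargs):
        return self._call("set", key, value, *args, **kwargs)


class FlakyClient:
    """Stand-in for a Memcached client whose first connection has died."""

    store = {}      # shared across instances, like data held by the server
    instances = 0

    def __init__(self):
        FlakyClient.instances += 1
        self.dead = FlakyClient.instances == 1  # only the first connection is dead

    def get(self, key):
        if self.dead:
            raise ConnectionResetError("stale socket")
        return self.store.get(key)

    def set(self, key, value):
        if self.dead:
            raise ConnectionResetError("stale socket")
        self.store[key] = value
        return True


client = ReconnectingClient(FlakyClient)
client.set("celery-task-meta-demo", "SUCCESS")   # first call fails, then reconnects
result = client.get("celery-task-meta-demo")
```

In this run the first `set` hits the dead connection, the wrapper creates a second `FlakyClient`, and both the write and the later read succeed on it.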

What I’ve checked

Memcached was not restarted when the issue occurred.
No socket error or connection reset in Celery logs.
Memcached metrics (curr_connections, get_hits, get_misses) remain stable.
RabbitMQ and task dispatching are unaffected — only result backend lookups fail.
Issue reproduced using both the python-memcached and pylibmc backends.
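Since nothing is logged, a round-trip probe can help tell a silently failing connection from a healthy one: write a unique sentinel, read it back, and compare. The sketch below assumes only that the client object has `set(key, value)` and `get(key)` methods; `probe_backend` and the two stand-in classes are hypothetical names for this example. With a real client the comparison may need adjusting if values come back as bytes.

```python
import uuid


def probe_backend(client, prefix="healthcheck-"):
    """Write a unique sentinel and read it back; True only on a full round trip."""
    key = prefix + uuid.uuid4().hex
    token = uuid.uuid4().hex
    try:
        client.set(key, token)
        return client.get(key) == token
    except Exception:
        # An exception is also a failed probe, just a louder one.
        return False


class DictClient:
    """In-memory stand-in with the same get/set shape as a cache client."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)


class SilentDeadClient:
    """Stand-in that swallows errors: writes are dropped, reads return None."""

    def set(self, key, value):
        return False

    def get(self, key):
        return None


healthy = probe_backend(DictClient())
broken = probe_backend(SilentDeadClient())
```

Running such a probe periodically from inside a worker process, and logging when it returns False, would show whether the worker's own connection has gone stale while Memcached itself looks fine.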

auvipy (Maintainer) · 2025-11-09T08:24:50Z

Maybe we could try to introduce a retry mechanism?
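As a rough illustration of what a retry mechanism could look like, here is a generic retry helper with exponential backoff. This is a sketch, not a proposal for Celery's actual code: the name `retry`, its parameters, and the choice of `OSError` as the retryable error are all assumptions, and the flaky read is simulated.

```python
import time


def retry(operation, attempts=3, base_delay=0.1, errors=(OSError,), sleep=time.sleep):
    """Call operation(); on a listed error wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return operation()
        except errors:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error instead of hiding it
            sleep(base_delay * 2 ** attempt)


calls = {"n": 0}
delays = []


def flaky_read():
    """Simulated backend read that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionResetError("stale socket")
    return "SUCCESS"


# Record the delays instead of sleeping so the example runs instantly.
value = retry(flaky_read, sleep=delays.append)
```

Re-raising after the last attempt matters here: the core complaint in this thread is that failures are silent, so a retry layer should end in a visible error rather than another quiet miss.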
