Python Background Workers: Architecture, Queues, and Retry Strategies

I thought processing audio in the background was simple: spawn a thread, run the script, save the file. Then I hit 200 concurrent requests, and it failed spectacularly.

The CPU spiked to full usage because of pydub’s processing. The TTS API didn’t rate-limit me, but it was horribly slow. Then the jobs started failing. Half the jobs died silently. The other half wrote corrupted files because of race conditions I didn’t know existed. And when I deployed a small fix? The deployment killed in-flight jobs, leaving orphaned audio segments scattered across the cloud storage directory.

This is not a guide about how to spawn a background task. This is about what happens after that, when the simple script becomes production infrastructure, when “it works on my machine” becomes “why did it fail in production and how can I fix it?”

Studying existing production systems turned out to be a great way to learn the patterns and tradeoffs in this space. I searched for architectural keywords, patterns, and tradeoffs, found a lot of good articles, and read about how those patterns are implemented in production.

Then I built them into a batch audio processing system (Python, FastAPI, asyncio). It takes text, sends it to Google Cloud Text-to-Speech, processes the audio, and stitches hundreds of segments into a final output. I wrote down every pattern, every decision, every tradeoff. This is that document, trimmed to the important parts.

Why background processing is harder than it looks

Consider a client request to generate audio for 200 text segments. Each segment requires a TTS API call (300-1000ms), post-processing (adding silence, normalizing), upload to cloud storage, progress tracking, final stitching, and webhook notification.

Sequentially? That’s 4-5 minutes. You can’t hold an HTTP connection open that long.

The answer is obvious: background processing. Accept the request, return a job ID, process asynchronously. But this creates a cascade of new problems.

What happens if a worker crashes mid-job? What if two workers grab the same job? What about rate limits? Retries? What if the database locks under concurrent writes? What if a deployment kills a running job? How do you even debug failures that happen while you’re asleep?

Important

Background processing isn’t hard because the happy path is hard. It’s hard because everything else is hard.

This post covers the patterns that solve these problems.

Level 1: The simple approach

Here’s how most people start:

python api.py

@app.post("/generate")
async def generate_audio(request: AudioRequest):
    # Bad idea: processing in the request handler
    for segment in request.segments:
        audio = await tts_service.generate(segment.text)
        await storage.upload(audio)
    return {"status": "done"}

This fails immediately. The client times out. The request gets killed. Half the segments are processed, half aren’t. There’s no way to recover.

The first fix is obvious: move to background tasks.

python api.py

@app.post("/generate")
async def generate_audio(request: AudioRequest):
    job_id = create_job(request)  # Stores job in database with PENDING status
    asyncio.create_task(process_job(job_id))  # Fire and forget
    return {"job_id": job_id, "status": "accepted"}

Better. The client gets a response immediately. But now you have new problems.

If the process restarts, all running jobs are lost: the job record survives in the database, but nothing ever picks it up again. If process_job throws an exception, it vanishes silently. If you scale to multiple instances, you have no coordination. If a job takes too long, there’s no timeout.

This is where I had to learn, and actually use, the real architectural patterns.

Level 2: Using a worker pool

The fundamental unit of background processing is the worker pool: a fixed set of workers pulling jobs from a queue.

+-------------------------------------------------------+
|                       Job Queue                       |
|   [Job1] [Job2] [Job3] [Job4] [Job5] [Job6] ...       |
+---------------------------+---------------------------+
                            |
         +------------------+------------------+
         v                  v                  v
    +---------+        +---------+        +---------+
    | Worker1 |        | Worker2 |        | Worker3 |
    +---------+        +---------+        +---------+

Created with asciiflow

Unlike fire-and-forget tasks, a worker pool provides bounded concurrency, crash recovery through job persistence, and visibility into what’s running.

The queue can be in-memory (fast, but loses jobs on crash), database-backed (durable, simple), or a dedicated message broker (Kafka, RabbitMQ, SQS). I used SQLite with WAL mode. It’s surprisingly capable for moderate workloads, and deployment is trivial. One file, no infrastructure.
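A minimal sketch of opening a SQLite-backed queue in WAL mode with Python's built-in `sqlite3` module. The `jobs` schema and the `open_queue_db` name are assumptions for illustration, not the schema of the system described here.

```python
import os
import sqlite3
import tempfile


def open_queue_db(path: str) -> sqlite3.Connection:
    """Open (or create) a SQLite-backed job queue in WAL mode."""
    conn = sqlite3.connect(path, timeout=5.0)  # Wait up to 5 s when the database is locked
    conn.row_factory = sqlite3.Row
    # WAL lets readers keep reading while a single writer commits.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"WAL not enabled (got {mode!r})")
    # With WAL, NORMAL trades a little durability on power loss for fewer fsyncs;
    # the database file stays consistent.
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS jobs (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            status TEXT NOT NULL DEFAULT 'PENDING',
            worker_id TEXT,
            payload TEXT NOT NULL,
            error TEXT
        )
        """
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_queue_db(os.path.join(tmp, "queue.db"))
        conn.execute("INSERT INTO jobs (payload) VALUES (?)", ('{"segments": 200}',))
        conn.commit()
        print([tuple(r) for r in conn.execute("SELECT id, status FROM jobs")])  # [(1, 'PENDING')]
        conn.close()
```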

The Worker Loop

Here’s the skeleton of a worker in Python:

python worker.py

import asyncio

class Worker:
    def __init__(self, db: Database, tts_service, storage, semaphore: asyncio.Semaphore):
        self.db = db
        self.tts_service = tts_service
        self.storage = storage
        self.semaphore = semaphore  # Shared by all workers to bound total concurrency
        self.shutdown_event = asyncio.Event()

    async def run(self):
        while not self.shutdown_event.is_set():
            job = await self.db.claim_next_pending_job()
            if job is None:
                await asyncio.sleep(1)  # No work, poll again
                continue

            async with self.semaphore:  # Bound concurrent operations
                try:
                    await self.process_job(job)
                    await self.db.mark_completed(job.id)
                except Exception as e:
                    await self.db.mark_failed(job.id, str(e))

    async def process_job(self, job):
        for segment in job.segments:
            if segment.status == "COMPLETED":
                continue  # Skip already-done segments (resumption)
            audio = await self.tts_service.generate(segment.text)
            await self.storage.upload(audio)
            await self.db.mark_segment_completed(segment.id)

A few critical details here:

claim_next_pending_job must be atomic: two workers should never claim the same job. In SQL, this looks like:

sql claim_job.sql

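UPDATE jobs
SET status = 'IN_PROGRESS', worker_id = ?
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'PENDING'
    ORDER BY id  -- Oldest job first
    LIMIT 1
)
AND status = 'PENDING'  -- Defensive re-check: matches zero rows if another worker already claimed it
RETURNING *;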
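`RETURNING` needs SQLite 3.35.0 or newer. A portable alternative, sketched here with Python's `sqlite3`, is to open the transaction with `BEGIN IMMEDIATE`, which takes SQLite's write lock before the `SELECT`, so no other connection can claim the same row between the read and the update. (On PostgreSQL the usual tool is `SELECT ... FOR UPDATE SKIP LOCKED`.) This synchronous `claim_next_pending_job` is a hypothetical stand-in for the async method in the worker above, and it assumes a connection opened with `isolation_level=None` so the explicit `BEGIN`/`COMMIT` are the only transaction control.

```python
import sqlite3
from typing import Optional


def claim_next_pending_job(conn: sqlite3.Connection, worker_id: str) -> Optional[int]:
    """Atomically claim the oldest PENDING job and return its id, or None if none is pending.

    Expects a connection opened with isolation_level=None (explicit transactions).
    """
    conn.execute("BEGIN IMMEDIATE")  # Take the write lock *before* reading
    try:
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'PENDING' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'IN_PROGRESS', worker_id = ? "
                "WHERE id = ? AND status = 'PENDING'",  # Defensive re-check
                (worker_id, row[0]),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[0]
```

Four threads with their own connections, each claiming until the queue is empty, should end up with every job claimed exactly once.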

The semaphore bounds concurrency. If you fire 1000 API calls at once, you’ll exhaust rate limits, connection pools, and most importantly, system resources like CPU and memory. Start with 5-10 concurrent operations and increase based on monitoring.
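A small, self-contained way to see the bound in action: 200 simulated calls share one `asyncio.Semaphore(8)`, and the peak number in flight never exceeds 8. The `call_api` coroutine and its `asyncio.sleep` are stand-ins for real TTS requests, and the limit of 8 is an arbitrary pick from the 5-10 range above.

```python
import asyncio


async def call_api(i: int, sem: asyncio.Semaphore, stats: dict) -> int:
    async with sem:  # At most `limit` coroutines get past this line at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await asyncio.sleep(0.01)  # Stand-in for a TTS request
            return i
        finally:
            stats["active"] -= 1


async def run_all(n: int, limit: int):
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(call_api(i, sem, stats) for i in range(n)))
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_all(n=200, limit=8))
    print(len(results), peak)  # 200 8
```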

The loop checks for already-completed segments before processing. This enables resumption. If the job failed at segment 150, the retry skips segments 1-149.

Level 3: Making it resilient

Failures are not exceptional. They’re normal in distributed systems: networks drop, APIs rate-limit, databases lock, servers restart. Your system needs to survive all of this.

Retry with exponential backoff

The naive approach to retries: retry immediately. This is wrong. If an API is overloaded, hammering it with immediate retries makes things worse.

Exponential backoff spaces out retries, giving an overloaded system time to recover.

python retry.py

import asyncio
import random
from functools import wraps

class TransientError(Exception):
    """A failure that may succeed on retry (429, 5xx, timeouts)."""

class PermanentError(Exception):
    """A failure that will not succeed on retry (e.g. 400 Bad Request)."""

def retry_with_backoff(max_attempts=5, base_delay=1.0, max_delay=60.0):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            # Only TransientError is retried; PermanentError and anything else propagate at once
            for attempt in range(max_attempts):
                try:
                    return await func(*args, **kwargs)
                except TransientError:
                    if attempt == max_attempts - 1:
                        raise  # Out of attempts: surface the last transient error

                    # Exponential backoff with jitter
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    jitter = delay * 0.5 * random.random()
                    await asyncio.sleep(delay + jitter)
        return wrapper
    return decorator

@retry_with_backoff(max_attempts=5)
async def call_tts_api(text: str) -> bytes:
    # http_client: an async HTTP client, e.g. an httpx.AsyncClient with a base_url
    response = await http_client.post("/synthesize", json={"text": text})
    if response.status_code == 429:
        raise TransientError("Rate limited")
    if response.status_code >= 500:
        raise TransientError("Server error")
    if response.status_code == 400:
        raise PermanentError("Invalid input")  # Don't retry this
    return response.content

The jitter is crucial. Without it, every client that failed during an outage retries at the same instant, creating a thundering-herd spike. For example, if 1,000 clients all retry at exactly the same moment, they all fail again together and retry together again, turning one outage into a cascade of failures. Random jitter spreads those retries out over time, so the recovering service sees a manageable trickle instead of a spike.
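To make the numbers concrete, here is the same formula as the decorator above, pulled out into two small functions (`base_schedule` and `jittered` are illustrative names). The first prints the capped exponential schedule; the second simulates 1,000 clients whose first retry would otherwise all land at exactly 1 s. This is additive jitter of up to 50%, as in the decorator; "full jitter" (a random delay between 0 and the computed value) is another common choice.

```python
import random


def base_schedule(attempts: int, base_delay: float = 1.0, max_delay: float = 60.0) -> list:
    """Capped exponential delays before jitter: base_delay * 2**attempt, never above max_delay."""
    return [min(base_delay * (2 ** a), max_delay) for a in range(attempts)]


def jittered(delay: float, rng: random.Random) -> float:
    """Additive jitter as in retry_with_backoff: the delay plus up to 50% extra."""
    return delay + delay * 0.5 * rng.random()


if __name__ == "__main__":
    print(base_schedule(8))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
    rng = random.Random(42)
    # 1,000 clients that failed at the same moment, all on their first retry (1 s base delay):
    retry_times = [jittered(1.0, rng) for _ in range(1000)]
    print(f"first retries spread over {min(retry_times):.2f}-{max(retry_times):.2f} s")
```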

Tip

Retry transient failures. Fail fast on permanent errors.

A 400 Bad Request won’t magically succeed on retry. A 429 or 503 might. Classify your errors and act accordingly.
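One way to make that classification explicit is a single lookup that the retry logic consults. The set of transient codes below (408, 429, 500, 502, 503, 504) is a common convention, not a rule; check which codes your provider documents as retryable.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    RETRY = "retry"
    FAIL = "fail"


# Status codes commonly treated as transient; adjust to your provider's documentation.
TRANSIENT_STATUSES = {408, 429, 500, 502, 503, 504}


def classify(status_code: int) -> Action:
    """Map an HTTP status code to what the worker should do next."""
    if 200 <= status_code < 300:
        return Action.OK
    if status_code in TRANSIENT_STATUSES:
        return Action.RETRY
    return Action.FAIL  # Other 4xx (and anything unexpected): retrying won't help
```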

Making operations safe to repeat (Idempotency)

What happens if a worker crashes after uploading an audio file but before marking the segment as complete? The retry will upload the same file again.

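If your upload uses a deterministic key (e.g., job_123/segment_045.mp3), this is fine. Uploading the same content to the same key just overwrites the object with identical bytes, so the end state is the same no matter how many times it runs. That’s idempotency.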
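Cloud object stores generally replace an object as a whole on upload; a plain local directory does not, and a crash mid-write can leave a half-written file behind. A sketch using the local filesystem as a stand-in for cloud storage (`segment_key` and `upload_local` are hypothetical helpers): derive the key deterministically, write to a temporary file, then `os.replace()` it into place so readers never see a partial file and repeated uploads converge on the same result.

```python
import os
import tempfile
from pathlib import Path


def segment_key(job_id: str, segment_index: int) -> str:
    """Deterministic key: the same segment always maps to the same name."""
    return f"{job_id}/segment_{segment_index:03d}.mp3"


def upload_local(root: Path, key: str, data: bytes) -> Path:
    """Idempotent 'upload' into a local directory standing in for cloud storage."""
    dest = root / key
    dest.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=dest.parent, suffix=".tmp")  # Same filesystem as dest
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, dest)  # Atomic rename: dest is either the old file or the new one
    except BaseException:
        os.unlink(tmp)  # Don't leave temp files behind on failure
        raise
    return dest
```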

For database updates, use conditional writes:

sql idempotent_update.sql

UPDATE segments
SET status = 'COMPLETED'
WHERE id = ? AND status = 'IN_PROGRESS';

This won’t double-complete a segment. If something else already completed it, the update affects zero rows.
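The zero-rows case is something your code can act on. A minimal sketch with Python's `sqlite3` (the `segments` table shape and the `complete_segment` helper are assumptions): `cursor.rowcount` tells the caller whether *this* attempt performed the transition, so a retry that finds the segment already completed can skip whatever side effects follow.

```python
import sqlite3


def complete_segment(conn: sqlite3.Connection, segment_id: int) -> bool:
    """Mark a segment COMPLETED only if it is still IN_PROGRESS; True if this call did it."""
    cur = conn.execute(
        "UPDATE segments SET status = 'COMPLETED' WHERE id = ? AND status = 'IN_PROGRESS'",
        (segment_id,),
    )
    conn.commit()
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE segments (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute("INSERT INTO segments (id, status) VALUES (45, 'IN_PROGRESS')")
    conn.commit()
    print(complete_segment(conn, 45))  # True: this call completed it
    print(complete_segment(conn, 45))  # False: already COMPLETED, zero rows affected
```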

For external APIs that aren’t idempotent, use client-generated request IDs:

python

request_id = f"job_{job_id}_segment_{segment_id}"  # Same ID on every retry, so the provider can deduplicate
response = await api.synthesize(text=text, request_id=request_id)

Many APIs deduplicate requests by a client-supplied request ID or idempotency key, but not all do. Check your provider’s documentation to see what it supports.
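When the provider offers no deduplication, you can at least avoid repeating work you have already recorded. A sketch under these assumptions: `synthesize` is any async callable, and results live in an in-memory dict purely for illustration (a real worker would use the database or the deterministic storage key). It cannot cover a crash between the API call and recording its result; only provider-side deduplication closes that gap.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class IdempotentCaller:
    """Call an API at most once per request_id, caching results (in memory, for illustration)."""

    def __init__(self, synthesize: Callable[[str], Awaitable[bytes]]):
        self._synthesize = synthesize
        self._results: Dict[str, bytes] = {}
        self._locks: Dict[str, asyncio.Lock] = {}

    async def call(self, request_id: str, text: str) -> bytes:
        # One lock per key, so concurrent retries of the same segment don't both hit the API.
        lock = self._locks.setdefault(request_id, asyncio.Lock())
        async with lock:
            if request_id not in self._results:
                self._results[request_id] = await self._synthesize(text)
            return self._results[request_id]
```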

Level 4: Production Readiness

Your system works. Jobs process. Retries happen. But you’re not done. Production means handling deployments, shutdowns, and failures you didn’t anticipate during development.

Graceful Shutdown

When you deploy new code, what happens to running jobs?

The naive answer: they die. SIGTERM kills the process. Jobs are left in IN_PROGRESS forever, or worse, with corrupted output.

The correct answer is to implement graceful shutdown.

python shutdown.py

import asyncio
import signal

class WorkerManager:
    def __init__(self):
        self.workers = []
        self.active_jobs = set()  # Workers add/remove job IDs as they start/finish
        self.shutdown_requested = False

    def setup_signal_handlers(self):
        # Call from inside the running event loop. loop.add_signal_handler (Unix only)
        # runs the callback on the loop itself, unlike a plain signal.signal handler.
        loop = asyncio.get_running_loop()
        for sig in (signal.SIGTERM, signal.SIGINT):
            loop.add_signal_handler(sig, self._handle_shutdown)

    def _handle_shutdown(self):
        self.shutdown_requested = True
        for worker in self.workers:
            worker.shutdown_event.set()  # Stop pulling new jobs

    async def wait_for_shutdown(self, timeout=30):
        loop = asyncio.get_running_loop()
        deadline = loop.time() + timeout
        while self.active_jobs:
            if loop.time() > deadline:
                # Hard deadline: stop waiting; unfinished jobs must be resumable
                break
            await asyncio.sleep(0.1)

The sequence matters:

1. Receive SIGTERM
2. Stop accepting new work
3. Signal workers to stop pulling new jobs
4. Wait for in-progress jobs to finish (with timeout)
5. Flush any pending writes
6. Exit
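A self-contained sketch of the core of that sequence (telling workers to stop pulling, waiting for in-flight jobs under a deadline, then exiting) using only `asyncio`. Assumptions: jobs come from an in-memory `asyncio.Queue`, each job is a 0.2 s sleep, and `shutdown.set()` stands in for the SIGTERM handler. Two workers each start a job, shutdown is requested mid-job, both jobs finish, and the other eight stay queued for the next instance instead of being started.

```python
import asyncio


async def worker(queue: asyncio.Queue, shutdown: asyncio.Event, completed: list) -> None:
    while not shutdown.is_set():  # Stop pulling new jobs once shutdown is requested
        try:
            job = await asyncio.wait_for(queue.get(), timeout=0.05)
        except asyncio.TimeoutError:
            continue  # Queue empty: loop around and re-check the flag
        await asyncio.sleep(0.2)  # Simulated job; an in-flight job runs to completion
        completed.append(job)


async def drain_demo(n_workers: int = 2, drain_timeout: float = 1.0):
    queue: asyncio.Queue = asyncio.Queue()
    for job_id in range(10):
        queue.put_nowait(job_id)
    shutdown = asyncio.Event()
    completed: list = []
    tasks = [asyncio.create_task(worker(queue, shutdown, completed)) for _ in range(n_workers)]

    await asyncio.sleep(0.1)  # Let each worker start a job
    shutdown.set()  # Stand-in for the SIGTERM handler

    # Wait for in-flight jobs, but only up to the deadline
    _, pending = await asyncio.wait(tasks, timeout=drain_timeout)
    for task in pending:
        task.cancel()  # Past the deadline: abandon; such jobs must be resumable
    await asyncio.gather(*pending, return_exceptions=True)
    return completed, queue.qsize()  # Returning stands in for process exit


if __name__ == "__main__":
    completed, left = asyncio.run(drain_demo())
    print(sorted(completed), left)
```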

If a job doesn’t finish within the timeout, it should be resumable. When the new instance starts, it should pick up where the old one left off.

Structured Logging

When something fails, your logs are your only debugging tool.

Unstructured logs are hard to search and aggregate:

Bad Logs

Processing job job_123 segment 45 with voice en-US-Standard-A

Structured logs are useful and queryable:

python logging.py

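import structlog

# Render each event as JSON with an ISO-8601 UTC timestamp
structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso", utc=True),
        structlog.processors.JSONRenderer(),
    ]
)

log = structlog.get_logger()

log.info(
    "segment_processing_start",
    job_id=job.id,
    segment_id=segment.id,
    voice=segment.voice,
    attempt=attempt_number,
)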

With a JSON renderer configured, this outputs JSON that you can search in your logging platform:

Good Logs

{
  "event": "segment_processing_start",
  "job_id": "job_123",
  "segment_id": "45",
  "voice": "en-US-Standard-A",
  "attempt": 1,
  "timestamp": "2026-02-01T10:30:00Z"
}

Log job state transitions, external API calls (with durations), retries, and errors. Don’t log sensitive data. Use log levels appropriately (DEBUG for verbose tracing, INFO for milestones, ERROR for failures).
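If you'd rather not add a dependency, the standard library can produce similar output. A minimal sketch (`JsonFormatter` and `timed_call` are illustrative names, not a library API) that also shows one way to log external API calls with their durations and outcome:

```python
import json
import logging
import time
from contextlib import contextmanager


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"event": record.getMessage(), "level": record.levelname}
        payload.update(getattr(record, "fields", {}))  # Structured fields passed via extra=
        return json.dumps(payload)


@contextmanager
def timed_call(log: logging.Logger, event: str, **fields):
    """Log how long the wrapped block took and whether it raised."""
    start = time.perf_counter()
    outcome = "ok"
    try:
        yield
    except BaseException:
        outcome = "error"
        raise
    finally:
        duration_ms = round((time.perf_counter() - start) * 1000, 1)
        level = logging.INFO if outcome == "ok" else logging.ERROR
        log.log(level, event, extra={"fields": {**fields, "outcome": outcome, "duration_ms": duration_ms}})


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("worker")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    with timed_call(log, "tts_call", job_id="job_123", segment_id="45"):
        time.sleep(0.05)  # Stand-in for the external API call
```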

Health Checks and Startup Validation

A service that starts with invalid configuration will fail eventually, usually at the worst time.

Validate everything at startup:

python validation.py

import os

class ConfigurationError(Exception):
    """Raised when the service cannot start with its current configuration."""

# tts_client, db and storage are the application's own client objects
async def startup_validation():
    # Check secrets exist
    if not os.getenv("TTS_API_KEY"):
        raise ConfigurationError("TTS_API_KEY not set")

    # Check external connectivity
    try:
        await tts_client.ping()
    except Exception as e:
        raise ConfigurationError(f"Cannot reach TTS API: {e}") from e

    # Check database
    try:
        await db.execute("SELECT 1")
    except Exception as e:
        raise ConfigurationError(f"Database unavailable: {e}") from e

    # Check storage
    try:
        await storage.check_permissions()
    except Exception as e:
        raise ConfigurationError(f"Storage not writable: {e}") from e

If validation fails, exit immediately with a clear error message. Don’t try to “work around” missing configuration. It will cause confusing failures later.
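A sketch of the "exit immediately, with a clear message" rule as a small runner, assuming each check from startup_validation is split into its own async function (`validate_or_exit` is a hypothetical name). Running every check before exiting reports all problems at once, and the per-check timeout keeps a hanging dependency from stalling startup.

```python
import asyncio
import sys
from typing import Awaitable, Callable, Dict

Check = Callable[[], Awaitable[None]]


async def validate_or_exit(checks: Dict[str, Check], timeout: float = 5.0) -> None:
    """Run every startup check with a timeout; on any failure, report them all and exit(1)."""
    failures = []
    for name, check in checks.items():
        try:
            await asyncio.wait_for(check(), timeout=timeout)
        except Exception as e:  # Includes timeouts: a check that hangs counts as failed
            failures.append(f"{name}: {e!r}")
    if failures:
        print("Startup validation failed:", *failures, sep="\n  ", file=sys.stderr)
        sys.exit(1)
```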

The reusable patterns

No single pattern is sufficient. It’s their combination that produces reliability.

Statelessness means any worker can pick up any job. State lives in the database, not in memory.

Idempotency means operations are safe to repeat. Retries can’t corrupt data.

Bounded concurrency means you don’t overwhelm downstream services or your own resources. Semaphores control parallelism.

Granular progress tracking means jobs can resume from where they failed. You don’t re-process 149 segments because segment 150 failed.

Graceful shutdown means deployments don’t kill work. In-flight jobs complete or become resumable.

Structured logging means you can debug failures after the fact. Every significant event is recorded with context.

Startup validation means configuration errors are caught early. You don’t discover missing credentials halfway through processing a job.

Common mistakes to avoid

Unbounded queues. Memory grows without limit as queue depth increases. Bound your queues. Apply backpressure (return 503) when full. For internal systems, persist all submitted jobs to the database and process them in the background. Rate limiting and capacity planning can handle excessive request volumes.
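A minimal sketch of a bounded queue with backpressure, using `asyncio.Queue(maxsize=...)` and `put_nowait`, which raises `asyncio.QueueFull` instead of growing or blocking. `JobQueue` and the 202/503 return values are illustrative; in a FastAPI handler you would turn the 503 into an HTTP response, for example by raising `HTTPException(status_code=503)`.

```python
import asyncio


class JobQueue:
    """Bounded in-memory queue that rejects work when full instead of growing without limit."""

    def __init__(self, maxsize: int):
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)

    def submit(self, job) -> int:
        """Return an HTTP-style status: 202 if accepted, 503 if the queue is full."""
        try:
            self._queue.put_nowait(job)
        except asyncio.QueueFull:
            return 503  # Backpressure: tell the client to retry later
        return 202

    async def next_job(self):
        return await self._queue.get()
```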

Missing timeouts. Operations hang forever, blocking workers. Timeout everything: API calls, database queries, entire jobs.
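A sketch of layered timeouts with `asyncio.wait_for`: a per-call budget so one hung request can't block a worker, and a per-job budget over the whole loop. `fake_tts_call`, `OperationTimeout`, and the budgets are placeholders. The custom exception keeps the two layers distinguishable: on Python 3.11+ `asyncio.TimeoutError` is the built-in `TimeoutError`, so re-raising a plain `TimeoutError` from the inner layer would be caught and relabeled by the outer one.

```python
import asyncio


class OperationTimeout(Exception):
    """Raised when an operation exceeds its time budget (deliberately not a TimeoutError subclass)."""


async def with_timeout(coro, seconds: float, what: str):
    """Await `coro`, giving up after `seconds` with a descriptive OperationTimeout."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        raise OperationTimeout(f"{what} timed out after {seconds}s") from None


async def fake_tts_call(latency: float) -> bytes:
    await asyncio.sleep(latency)  # Stand-in for a real API call
    return b"audio"


async def process_job(latencies, per_call: float = 0.1, per_job: float = 1.0):
    async def all_segments():
        # Per-call budget: one hung request cannot block the worker forever
        return [
            await with_timeout(fake_tts_call(latency), per_call, f"segment {i}")
            for i, latency in enumerate(latencies)
        ]

    # Per-job budget: bounds the whole job even when each call is individually fast enough
    return await with_timeout(all_segments(), per_job, "job")
```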

Ignoring partial failures. One failed segment causes the entire job to fail. Track granular status. Report partial success. Enable resumption.

State in memory. Worker crash loses all in-progress state. Persisting state externally is a must.

Retry storms. All clients retry simultaneously after an outage (for example, right after you restart the service). Adding jitter to backoff and using circuit breakers can help.
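Jitter spreads retries out; a circuit breaker stops them altogether while a dependency is clearly down. A minimal sketch, not a library API: after `failure_threshold` consecutive failures the breaker opens and calls fail fast with `CircuitOpenError`; once `reset_timeout` seconds pass it lets calls through again, and the next result either closes it or reopens it. The injectable `clock` exists only to make it testable.

```python
import asyncio
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    """Opens after N consecutive failures; lets calls through again after a cooldown."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # Injectable so tests can control time
        self.failures = 0
        self.opened_at = None  # None means closed

    async def call(self, func, *args, **kwargs):
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            raise CircuitOpenError("circuit open: failing fast")
        # Closed, or half-open after the cooldown: try the call; its result decides the state
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # Open (or re-open) and restart the cooldown
            raise
        self.failures = 0
        self.opened_at = None
        return result


if __name__ == "__main__":
    async def flaky_tts():
        raise ConnectionError("TTS unavailable")

    async def demo():
        breaker = CircuitBreaker(failure_threshold=3, reset_timeout=30.0)
        for attempt in range(5):
            try:
                await breaker.call(flaky_tts)
            except Exception as e:
                print(attempt, type(e).__name__)

    asyncio.run(demo())
```

Run directly, the demo reports `ConnectionError` for attempts 0 to 2 and `CircuitOpenError` for attempts 3 and 4, without calling the failing function again.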

No graceful shutdown. Deployments kill jobs, causing failures and orphaned state. Handle SIGTERM, wait for active jobs to finish and drain queues.

Do you need all this?

< 10 jobs/hour, quick tasks: FastAPI BackgroundTasks is fine
100s of jobs/hour, mission-critical: Use these patterns
1000s+ jobs/hour: Consider dedicated task queue systems (Celery, etc.)

Building production-ready background processing systems requires attention to many details. Worker pools with bounded concurrency prevent resource exhaustion. Retry strategies with exponential backoff and jitter handle transient failures gracefully. Idempotent operations enable safe retries and resumption. Graceful shutdown protects in-flight work during deployments.

The audio processing system I built handles thousands of segments daily (load-tested at 20 requests per second, 10 segments per request, with 5 workers). The principles behind it would work equally well for video transcoding, document processing, email sending, or any other batch workload. I learned a lot from this project, and I’ll keep applying these principles, and learning new design patterns, in future projects.

My takeaway: start simple and add complexity only when needed. Measure everything you can. And always design as if your system will fail, because eventually it will. The question is whether it fails gracefully or just plain fails.

Lorbic
