I thought processing audio in the background was simple: spawn a thread, run the script, save the file. Then I hit 200 concurrent requests, and it failed epically.
The CPU spiked to full usage because of pydub’s processing. The TTS API didn’t rate-limit me, but it was horribly slow. Then the jobs started failing. Half of them died silently. The other half wrote corrupted files because of race conditions I didn’t know existed. And when I deployed a little fix? The deployment killed in-flight jobs, leaving orphaned audio segments scattered across the cloud storage directory.
This is not a guide about how to spawn a background task. This is about what happens after that, when the simple script becomes production infrastructure, when “it works on my machine” becomes “why did it fail in production and how can I fix it?”
Studying existing production systems was a great way to learn about the patterns and tradeoffs in this space. I searched for the architectural keywords, found a lot of cool articles, and read about how those patterns are implemented in production.
Then I built them into a batch audio processing system. Python, FastAPI, asyncio. It takes text, sends it to Google Cloud Text-to-Speech, processes the audio, and stitches hundreds of segments into final output. I wrote down every pattern, every decision, every tradeoff. This is that document (with the important parts).
Why background processing is harder than it looks
Consider a client request to generate audio for 200 text segments. Each segment requires a TTS API call (300-1000ms), post-processing (adding silence, normalizing), upload to cloud storage, progress tracking, final stitching, and webhook notification.
Sequentially? That’s 4-5 minutes. You can’t hold an HTTP connection open that long.
The answer is obvious: background processing. Accept the request, return a job ID, process asynchronously. But this creates a cascade of new problems.
What happens if a worker crashes mid-job? What if two workers grab the same job? What about rate limits? Retries? What if the database locks under concurrent writes? What if a deployment kills a running job? How do you even debug failures that happen while you’re asleep?
This post covers the patterns that solve these problems.
Level 1: The simple approach
Here’s how most people start:
    @app.post("/generate")
    async def generate_audio(request: AudioRequest):
        # Bad idea: processing in the request handler
        for segment in request.segments:
            audio = await tts_service.generate(segment.text)
            await storage.upload(audio)
        return {"status": "done"}
This fails immediately. The client times out. The request gets killed. Half the segments are processed, half aren’t. There’s no way to recover.
The first fix is obvious: move to background tasks.
    @app.post("/generate")
    async def generate_audio(request: AudioRequest):
        job_id = create_job(request)  # Stores job in database with PENDING status
        asyncio.create_task(process_job(job_id))  # Fire and forget
        return {"job_id": job_id, "status": "accepted"}
Better. The client gets a response immediately. But now you have new problems.
If the process restarts, all running jobs are lost. There’s no persistence. If process_job throws an exception, it vanishes silently. If you scale to multiple instances, you have no coordination. If a job takes too long, there’s no timeout.
This is where the real architectural patterns and techniques come in.
Level 2: Using a worker pool
The fundamental unit of background processing is the worker pool: a fixed set of workers pulling jobs from a queue.
    +-------------------------------------------------------+
    |                       Job Queue                       |
    |     [Job1] [Job2] [Job3] [Job4] [Job5] [Job6] ...     |
    +--------------------------+----------------------------+
                               |
            +------------------+------------------+
            v                  v                  v
       +---------+        +---------+        +---------+
       | Worker1 |        | Worker2 |        | Worker3 |
       +---------+        +---------+        +---------+
Unlike fire-and-forget tasks, a worker pool provides bounded concurrency, crash recovery through job persistence, and visibility into what’s running.
The queue can be in-memory (fast, but loses jobs on crash), database-backed (durable, simple), or a dedicated message broker (Kafka, RabbitMQ, SQS). I used SQLite with WAL mode. It’s surprisingly capable for moderate workloads, and deployment is trivial. One file, no infrastructure.
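For reference, turning on WAL mode takes one pragma. The following is a minimal sketch using Python's built-in `sqlite3` module. The `open_job_db` helper and the table layout are assumptions for illustration (the columns simply mirror the `status` and `worker_id` fields used later in this post), not the schema of the real system.

```python
import sqlite3


def open_job_db(path: str) -> sqlite3.Connection:
    """Open (or create) the job database with WAL journaling enabled."""
    conn = sqlite3.connect(path, timeout=5.0)  # wait up to 5 s on a locked DB
    # WAL lets readers proceed while a single writer appends to the log.
    # The setting is persistent for file-backed databases.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS jobs (
            id        INTEGER PRIMARY KEY,
            status    TEXT NOT NULL DEFAULT 'PENDING',
            worker_id TEXT
        )
        """
    )
    conn.commit()
    return conn
```

WAL still allows only one writer at a time, so the connection timeout matters: it is how long a writer waits for the lock before raising an error.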
The Worker Loop
Here’s the skeleton of a worker in Python:
    import asyncio

    class Worker:
        def __init__(self, db: Database, semaphore: asyncio.Semaphore, tts_service, storage):
            self.db = db
            self.semaphore = semaphore
            self.tts_service = tts_service  # TTS client used in process_job
            self.storage = storage  # Storage client used in process_job
            self.shutdown_event = asyncio.Event()

        async def run(self):
            while not self.shutdown_event.is_set():
                job = await self.db.claim_next_pending_job()
                if job is None:
                    await asyncio.sleep(1)  # No work, poll again
                    continue
                async with self.semaphore:  # Bound concurrent operations
                    try:
                        await self.process_job(job)
                        await self.db.mark_completed(job.id)
                    except Exception as e:
                        await self.db.mark_failed(job.id, str(e))

        async def process_job(self, job):
            for segment in job.segments:
                if segment.status == "COMPLETED":
                    continue  # Skip already-done segments (resumption)
                audio = await self.tts_service.generate(segment.text)
                await self.storage.upload(audio)
                await self.db.mark_segment_completed(segment.id)
A few critical details here:
The claim_next_pending_job must be atomic. Two workers should never claim the same job. In SQL, this looks like:
    UPDATE jobs
    SET status = 'IN_PROGRESS', worker_id = ?
    WHERE id = (
        SELECT id FROM jobs
        WHERE status = 'PENDING'
        LIMIT 1
    )
    RETURNING *;
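`RETURNING` needs SQLite 3.35 or newer. On older builds, a similar claim can be sketched with an explicit write transaction. This is an illustration under assumptions rather than the system's actual code: the function below is a synchronous stand-in for the async `claim_next_pending_job` method, and the `jobs` table is assumed to have `id`, `status`, and `worker_id` columns.

```python
import sqlite3
from typing import Optional


def claim_next_pending_job(conn: sqlite3.Connection, worker_id: str) -> Optional[int]:
    """Claim one PENDING job and return its id, or None if the queue is empty.

    `conn` must be opened with isolation_level=None so that this function
    controls the transaction boundaries itself.
    """
    # BEGIN IMMEDIATE takes the write lock up front, so no other connection
    # can claim a job between our SELECT and our UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'PENDING' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute(
            "UPDATE jobs SET status = 'IN_PROGRESS', worker_id = ? WHERE id = ?",
            (worker_id, row[0]),
        )
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The `ORDER BY id` is an addition that makes the queue first-in, first-out. Without an explicit order, SQLite is free to hand back any pending row.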
The semaphore bounds concurrency. If you fire 1000 API calls at once, you’ll exhaust rate limits, connection pools, and most importantly, system resources like CPU and memory. Start with 5-10 concurrent operations and increase based on monitoring.
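To see the bound in isolation, here is a small sketch. `run_bounded` is a hypothetical helper, not part of the worker above, and `worker` stands for any coroutine function such as a TTS call.

```python
import asyncio


async def run_bounded(items, worker, limit: int = 5):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:  # waits here once `limit` calls are running
            return await worker(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(item) for item in items))
```

All tasks are created immediately, but only `limit` of them are ever inside `worker` at the same time. The rest wait at the semaphore.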
The loop checks for already-completed segments before processing. This enables resumption. If the job failed at segment 150, the retry skips segments 1-149.
Level 3: Making it resilient
Failures are not exceptional. They’re normal in distributed systems. Networks drop, APIs rate-limit, databases lock, and servers restart. Your system needs to survive all of this.
Retry with exponential backoff
The naive approach to retries is to retry immediately. This is wrong. If an API is overloaded, hammering it with immediate retries makes things worse.
Exponential backoff spaces out retries and gives the overloaded system time to cool down.
    import asyncio
    import random
    from functools import wraps

    class TransientError(Exception):
        """Worth retrying: rate limits, server errors."""

    class PermanentError(Exception):
        """Not worth retrying: invalid input and similar."""

    def retry_with_backoff(max_retries=5, base_delay=1.0, max_delay=60.0):
        def decorator(func):
            @wraps(func)
            async def wrapper(*args, **kwargs):
                last_exception = None
                for attempt in range(max_retries):
                    try:
                        return await func(*args, **kwargs)
                    except TransientError as e:
                        last_exception = e
                        if attempt == max_retries - 1:
                            raise
                        # Exponential backoff with jitter
                        delay = min(base_delay * (2 ** attempt), max_delay)
                        jitter = delay * 0.5 * random.random()
                        await asyncio.sleep(delay + jitter)
                raise last_exception
            return wrapper
        return decorator

    # http_client is an async HTTP client created elsewhere
    @retry_with_backoff(max_retries=5)
    async def call_tts_api(text: str) -> bytes:
        response = await http_client.post("/synthesize", json={"text": text})
        if response.status_code == 429:
            raise TransientError("Rate limited")
        if response.status_code >= 500:
            raise TransientError("Server error")
        if response.status_code == 400:
            raise PermanentError("Invalid input")  # Don't retry this
        return response.content
The jitter is crucial. Without it, all clients retry at the same instant after an outage, creating thundering-herd spikes. For example, if 1000 clients all retry at the same instant, they’ll all fail and retry at the same instant again, creating a cascade of failures. Random jitter spreads the load and prevents these repeated failures.
A 400 Bad Request won’t magically succeed on retry. A 429 or 503 might. Classify your errors and act accordingly.
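It helps to look at the numbers the backoff formula produces. The sketch below pulls the delay calculation from the decorator into standalone functions. `backoff_delay` and `schedule_without_jitter` are hypothetical helper names introduced only for this illustration.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Delay before the retry that follows failed attempt `attempt` (0-based)."""
    rng = rng or random
    delay = min(base_delay * (2 ** attempt), max_delay)  # 1, 2, 4, ... capped
    jitter = delay * 0.5 * rng.random()  # adds up to 50% of the capped delay
    return delay + jitter


def schedule_without_jitter(retries: int, base_delay: float = 1.0, max_delay: float = 60.0):
    """The deterministic part of the schedule, useful for capacity estimates."""
    return [min(base_delay * (2 ** n), max_delay) for n in range(retries)]
```

With the defaults, the four waits between five attempts are 1, 2, 4, and 8 seconds before jitter, so a call that fails every time gives up after roughly 15 to 22.5 seconds of waiting. The 60 second cap only starts to matter from the seventh attempt onward.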
Making operations safe to repeat (Idempotency)
What happens if a worker crashes after uploading an audio file but before marking the segment as complete? The retry will upload the same file again.
If your upload uses a deterministic key (e.g., job_123/segment_045.mp3), this is fine. Uploading the same content to the same key just overwrites the object with identical bytes, so the end state is the same. That’s idempotency.
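A deterministic key is one derived only from identifiers, never from timestamps or random values. As a minimal sketch (the `segment_object_key` helper is hypothetical, and the layout simply follows the example key above):

```python
def segment_object_key(job_id: int, segment_index: int, ext: str = "mp3") -> str:
    """Build the storage key from identifiers only, never from time or randomness."""
    return f"job_{job_id}/segment_{segment_index:03d}.{ext}"
```

Calling `segment_object_key(123, 45)` yields `job_123/segment_045.mp3` on every attempt, so a retry lands on the same object instead of creating a second one.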
For database updates, use conditional writes:
    UPDATE segments
    SET status = 'COMPLETED'
    WHERE id = ? AND status = 'IN_PROGRESS';
This won’t double-complete a segment. If something else already completed it, the update affects zero rows.
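The affected-row count is what tells the caller whether its write won. A minimal sketch with the built-in `sqlite3` module follows. `complete_segment` is a hypothetical helper, and the `segments` table is assumed to have `id` and `status` columns.

```python
import sqlite3


def complete_segment(conn: sqlite3.Connection, segment_id: int) -> bool:
    """Mark a segment COMPLETED; return False if it was not IN_PROGRESS."""
    cur = conn.execute(
        "UPDATE segments SET status = 'COMPLETED' "
        "WHERE id = ? AND status = 'IN_PROGRESS'",
        (segment_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 means it was already completed or never claimed
```

A `False` return is not an error. It means the work was already recorded, so the caller can skip any follow-up side effects such as incrementing a progress counter.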
For external APIs that aren’t idempotent, use client-generated request IDs:
    request_id = f"job_{job_id}_segment_{segment_id}_v{attempt}"
    response = await api.synthesize(text=text, request_id=request_id)
Many APIs deduplicate by request ID. Check your provider’s documentation to implement what’s best for your use case.
Level 4: Production Readiness
Your system works. Jobs process. Retries happen. But you’re not done. Production means handling deployments, shutdowns, and failures you haven’t anticipated yet while developing.
Graceful Shutdown
When you deploy new code, what happens to running jobs?
The naive answer: they die. SIGTERM kills the process. Jobs are left in IN_PROGRESS forever, or worse, with corrupted output.
The correct answer is to implement graceful shutdown.
    import asyncio
    import signal

    class WorkerManager:
        def __init__(self):
            self.workers = []
            self.active_jobs = set()
            self.shutdown_requested = False

        def setup_signal_handlers(self):
            # Call from inside the running event loop. add_signal_handler is
            # Unix-only; the callback runs in the loop, so touching asyncio
            # objects from it is safe.
            loop = asyncio.get_running_loop()
            for sig in (signal.SIGTERM, signal.SIGINT):
                loop.add_signal_handler(sig, self._handle_shutdown)

        def _handle_shutdown(self):
            self.shutdown_requested = True
            for worker in self.workers:
                worker.shutdown_event.set()

        async def wait_for_shutdown(self, timeout=30):
            loop = asyncio.get_running_loop()
            start = loop.time()
            while self.active_jobs:
                if loop.time() - start > timeout:
                    # Hard deadline: force exit
                    break
                await asyncio.sleep(0.1)
Here the sequence matters:
- Receive SIGTERM
- Stop accepting new work
- Signal workers to stop pulling new jobs
- Wait for in-progress jobs to finish (with timeout)
- Flush any pending writes
- Exit
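The waiting step of this sequence can also be written without polling. The sketch below is one possible shape, under the assumption that in-flight jobs are tracked as `asyncio.Task` objects. `drain` is a hypothetical helper, not part of the `WorkerManager` above.

```python
import asyncio


async def drain(active: set, timeout: float = 30.0) -> set:
    """Wait up to `timeout` seconds for in-flight tasks; cancel and return the rest."""
    if not active:
        return set()
    done, pending = await asyncio.wait(active, timeout=timeout)
    for task in pending:
        task.cancel()  # hard deadline reached; the job must be resumable
    # Let the cancellations propagate so nothing is left half-awaited.
    await asyncio.gather(*pending, return_exceptions=True)
    return pending
```

Whatever comes back in the returned set did not finish. Those are the jobs that depend on resumption after the restart.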
If a job doesn’t finish within the timeout, it should be resumable. When the new instance starts, it should pick up where the old one left off.
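Picking up where the old instance left off requires something to notice abandoned jobs. One common approach, sketched here under assumptions, is to reset stale `IN_PROGRESS` rows at startup. The `heartbeat_at` column (Unix seconds, refreshed by the worker while it makes progress) and the `requeue_stale_jobs` helper are hypothetical and not part of the schema shown earlier.

```python
import sqlite3
import time


def requeue_stale_jobs(conn: sqlite3.Connection, stale_after_s: float = 300.0) -> int:
    """Return abandoned IN_PROGRESS jobs to PENDING; report how many were reset."""
    cutoff = time.time() - stale_after_s
    cur = conn.execute(
        "UPDATE jobs SET status = 'PENDING', worker_id = NULL "
        "WHERE status = 'IN_PROGRESS' AND heartbeat_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

Because completed segments are skipped on the next run, a requeued job only redoes the segments that were unfinished when the old process exited.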
Structured Logging
When something fails, your logs are your only debugging tool.
Unstructured logs are hard to search and filter:
    Processing job job_123 segment 45 with voice en-US-Standard-A
Structured logs are useful and queryable:
    import structlog

    log = structlog.get_logger()
    log.info(
        "segment_processing_start",
        job_id=job.id,
        segment_id=segment.id,
        voice=segment.voice,
        attempt=attempt_number,
    )
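With structlog configured to render JSON (for example with structlog.processors.JSONRenderer and structlog.processors.TimeStamper), this outputs JSON that you can search in your logging platform: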
    {
      "event": "segment_processing_start",
      "job_id": "job_123",
      "segment_id": "45",
      "voice": "en-US-Standard-A",
      "attempt": 1,
      "timestamp": "2026-02-01T10:30:00Z"
    }
Log job state transitions, external API calls (with durations), retries, and errors. Don’t log sensitive data. Use log levels appropriately (DEBUG for verbose tracing, INFO for milestones, ERROR for failures).
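Durations of external calls are the field most often forgotten. The sketch below shows one way to capture them. It deliberately uses only the standard library instead of structlog so that it runs anywhere, and `timed_event` is a hypothetical helper rather than anything from the system described here.

```python
import json
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("audio")


@contextmanager
def timed_event(event: str, **fields):
    """Log one JSON line per operation with its outcome and duration."""
    start = time.perf_counter()
    outcome = "ok"
    try:
        yield
    except Exception as exc:
        outcome = "error"
        fields["error"] = repr(exc)
        raise  # logging must not swallow the failure
    finally:
        fields["duration_ms"] = round((time.perf_counter() - start) * 1000, 1)
        level = logging.INFO if outcome == "ok" else logging.ERROR
        log.log(level, json.dumps({"event": event, "outcome": outcome, **fields}))
```

Usage would look like `with timed_event("tts_call", job_id=job.id, segment_id=segment.id): ...` around the API call. The same idea carries over to structlog by replacing the `log.log` line with a structlog call.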
Health Checks and Startup Validation
A service that starts with invalid configuration will fail eventually, usually at the worst time.
Validate everything at startup:
    import os

    class ConfigurationError(Exception):
        """Raised when the service cannot start with its current configuration."""

    async def startup_validation():
        # Check secrets exist
        if not os.getenv("TTS_API_KEY"):
            raise ConfigurationError("TTS_API_KEY not set")

        # Check external connectivity
        try:
            await tts_client.ping()
        except Exception as e:
            raise ConfigurationError(f"Cannot reach TTS API: {e}") from e

        # Check database
        try:
            await db.execute("SELECT 1")
        except Exception as e:
            raise ConfigurationError(f"Database unavailable: {e}") from e

        # Check storage
        try:
            await storage.check_permissions()
        except Exception as e:
            raise ConfigurationError(f"Storage not writable: {e}") from e
If validation fails, exit immediately with a clear error message. Don’t try to “work around” missing configuration. It will cause confusing failures later.
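Exiting immediately with a clear message can be as small as the sketch below. `missing_env` and `fail_fast` are hypothetical helpers. Only the `TTS_API_KEY` name comes from the validation code above.

```python
import os
import sys


def missing_env(required, environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


def fail_fast(required=("TTS_API_KEY",), environ=None) -> None:
    """Exit with a non-zero status and a clear message if configuration is incomplete."""
    missing = missing_env(required, environ)
    if missing:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"Missing required configuration: {', '.join(missing)}")
```

Reporting every missing variable at once saves a round of fix-and-redeploy for each one. The non-zero exit status lets the process supervisor or orchestrator see that startup failed.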
The reusable patterns
No single pattern is sufficient. It’s their combination that produces reliability.
Statelessness means any worker can pick up any job. State lives in the database, not in memory.
Idempotency means operations are safe to repeat. Retries can’t corrupt data.
Bounded concurrency means you don’t overwhelm downstream services or your own resources. Semaphores control parallelism.
Granular progress tracking means jobs can resume from where they failed. You don’t re-process 149 segments because segment 150 failed.
Graceful shutdown means deployments don’t kill work. In-flight jobs complete or become resumable.
Structured logging means you can debug failures after the fact. Every significant event is recorded with context.
Startup validation means configuration errors are caught early. You don’t discover missing credentials in the middle of processing a job.
Common mistakes to avoid
Unbounded queues. Memory grows without limit as queue depth increases. Bound your queues. Apply backpressure (return 503) when full. For internal systems, persist all submitted jobs to the database and process them in the background. Rate limiting and capacity planning can handle excessive request volumes.
Missing timeouts. Operations hang forever, blocking workers. Timeout everything: API calls, database queries, entire jobs.
Ignoring partial failures. One failed segment causes the entire job to fail. Track granular status. Report partial success. Enable resumption.
State in memory. Worker crash loses all in-progress state. Persisting state externally is a must.
Retry storms. All clients retry simultaneously after an outage (for example, right after you restart the service). Add jitter to your backoff and use circuit breakers.
No graceful shutdown. Deployments kill jobs, causing failures and orphaned state. Handle SIGTERM, wait for active jobs to finish and drain queues.
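Two of the fixes in this list, bounded queues and timeouts, are only a few lines each with asyncio. The sketch below is illustrative: `try_enqueue` and `with_timeout` are hypothetical helper names, and the in-memory `asyncio.Queue` stands in for whichever queue you actually use.

```python
import asyncio


def try_enqueue(queue: asyncio.Queue, job) -> bool:
    """Accept a job if there is room; False tells the caller to answer 503."""
    try:
        queue.put_nowait(job)
        return True
    except asyncio.QueueFull:
        return False


async def with_timeout(awaitable, seconds: float):
    """Raise asyncio.TimeoutError instead of letting one call block a worker forever."""
    return await asyncio.wait_for(awaitable, timeout=seconds)
```

The queue has to be created with a `maxsize`, for example `asyncio.Queue(maxsize=100)`, otherwise `put_nowait` never refuses anything. `asyncio.wait_for` cancels the wrapped operation when the deadline passes, so the operation itself should tolerate cancellation.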
Do you need all this?
- < 10 jobs/hour, quick tasks: FastAPI BackgroundTasks is fine
- 100s of jobs/hour, mission-critical: Use these patterns
- 1000s+ jobs/hour: Consider dedicated systems (Celery, etc)
Building production-ready background processing systems requires attention to many details. Worker pools with bounded concurrency prevent resource exhaustion. Retry strategies with exponential backoff and jitter handle transient failures gracefully. Idempotent operations enable safe retries and resumption. Graceful shutdown protects in-flight work during deployments.
The audio processing system I built handles thousands of segments daily (tested with 20 rps, with 10 segments each and 5 workers). The principles behind it would work equally well for video transcoding, document processing, email sending, or any other batch workload. I learned a lot from this experience, and I will keep using these principles in future projects while learning more design patterns.
My takeaway: start simple and add complexity only when needed. Measure everything you can. And always design as if your system will fail, because eventually, it will. The question is whether it fails gracefully or just plain fails.
Further Reading
These books and articles helped me a lot to build this system.
- For queue design: Chapter 11 on Queues and Streams of Designing Data-Intensive Applications by Martin Kleppmann
- For retry patterns: Chapter 5 on Stability Patterns of Release It! by Michael Nygard
- FastAPI Background Tasks
- structlog for Python
- SQLite WAL Mode Explained