Why Your Goroutines Need a Speed Limit: Bounded Concurrency in Go

Reading part 14 of Deep Dive

TL;DR: Spawning go func() without a limiter is a recipe for system collapse. This guide details how to use Semaphores and Worker Pools to prioritize predictable stability over absolute speed, protecting downstream dependencies from the thundering herd.

It’s a rite of passage for every Go developer. You receive a list of 10,000 URLs to fetch or 50,000 rows to process. You wrap the workload in an unbounded go func() loop, and the loop launches every task within milliseconds.

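```go
import "sync"

// Initial approach: unbounded concurrency.
// Item and process are placeholders for your own type and work function.
func ProcessItems(items []Item) {
	var wg sync.WaitGroup
	for _, item := range items {
		wg.Add(1)
		go func(i Item) { // one goroutine per item, with no upper bound
			defer wg.Done()
			process(i) // HTTP request, DB write, etc.
		}(item)
	}
	wg.Wait()
}
```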

This code runs flawlessly on a local machine with a small input. In production, however, it can trigger OOM (Out of Memory) kills and exhaust database connections. You gained speed, but you sacrificed reliability.

Why cheap goroutines are expensive

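Go makes concurrency easy because goroutines are lightweight: each one starts with a stack as small as 2 KB, and the runtime multiplexes them onto a small number of OS threads. The Go scheduler can handle 10,000 goroutines comfortably; your downstream dependencies cannot absorb 10,000 simultaneous requests.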

Unbounded go func() loops create a Thundering Herd. Because every goroutine starts at once, your system must simultaneously supply:

Memory: 10,000 goroutines need roughly 20 MB of stack space (10,000 × 2 KB) before doing any work. That figure excludes the heap allocations required to parse JSON or hold HTTP buffers.
File Descriptors: Every outbound request requires a socket, and every socket consumes a file descriptor. Many Linux distributions default to a soft limit of 1,024 open file descriptors per process (ulimit -n).
Connection Pools: Your database pool is a finite resource. If it is capped at 50 connections, the other 9,950 goroutines block while holding onto their allocated memory, inflating the live heap and increasing GC pressure.
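To make the memory point concrete, here is a minimal sketch that parks 10,000 goroutines on a channel (a hypothetical stand-in for slow downstream calls) and asks the runtime how many goroutines are alive and how much extra stack memory is in use. The helper spawnUnbounded is illustrative only, and the stack figure it prints will vary with Go version and platform, since newer runtimes may size initial stacks above the 2 KB minimum.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spawnUnbounded mimics the unbounded loop: one goroutine per item.
// Each goroutine parks on release (a hypothetical stand-in for a slow
// downstream call), so all of them are alive at the same moment.
func spawnUnbounded(n int) (live int, extraStack uint64) {
	release := make(chan struct{})
	var wg sync.WaitGroup

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release
		}()
	}

	live = runtime.NumGoroutine() // counts the caller too, so expect n+1 or more
	runtime.ReadMemStats(&after)
	if after.StackInuse > before.StackInuse {
		extraStack = after.StackInuse - before.StackInuse
	}

	close(release) // unblock every goroutine and wait for them to exit
	wg.Wait()
	return live, extraStack
}

func main() {
	live, stack := spawnUnbounded(10000)
	fmt.Printf("live goroutines: %d, extra stack in use: ~%d KB\n", live, stack/1024)
}
```

None of these goroutines does any real work, yet every one of them holds a stack until its turn comes.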

To build resilient systems, you must impose a Speed Limit.

Use Semaphores for minimal refactoring

A Semaphore limits how many concurrent workers (threads in most languages, goroutines in Go) can access a shared resource at once. In Go, you can implement this pattern using a buffered channel.

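A send on a buffered channel blocks when the buffer is full. Treat each slot as a token: a goroutine sends to claim one and receives to give it back, so you control exactly how many goroutines execute their critical path at once. Strictly speaking, this is a counting semaphore rather than a token bucket, because nothing refills the slots over time. For production workloads requiring burst handling and precise refills, I use the TokenBucket implementation from my goutils library.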
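To see the blocking behavior the semaphore depends on, the sketch below uses select with a default case to attempt a send without blocking. The helpers tryAcquire and release are hypothetical names for illustration. Notice that a slot becomes free only when a holder releases it, which is what makes this a concurrency cap rather than a rate limiter.

```go
package main

import "fmt"

// tryAcquire attempts a non-blocking send. It returns false instead of
// blocking when every slot in the buffered channel is already taken.
func tryAcquire(sem chan struct{}) bool {
	select {
	case sem <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees one slot by receiving from the channel.
func release(sem chan struct{}) { <-sem }

func main() {
	sem := make(chan struct{}, 2) // at most 2 holders at once

	fmt.Println(tryAcquire(sem)) // true: slot 1 of 2
	fmt.Println(tryAcquire(sem)) // true: slot 2 of 2
	fmt.Println(tryAcquire(sem)) // false: full, a plain send would block here
	release(sem)
	fmt.Println(tryAcquire(sem)) // true: a holder gave its slot back
}
```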

How to implement a channel-based Semaphore

Create a buffered channel with a capacity equal to your limit.
Send an empty struct into the channel before starting work (acquire a token).
Receive from the channel when the work finishes (release the token).

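```go
import "sync"

func ProcessItemsWithSemaphore(items []Item, maxConcurrency int) {
	if maxConcurrency < 1 {
		maxConcurrency = 1 // a zero-capacity channel would block every acquire forever
	}
	var wg sync.WaitGroup
	sem := make(chan struct{}, maxConcurrency) // counting semaphore: capacity = max concurrent holders

	for _, item := range items {
		wg.Add(1)
		go func(i Item) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot (blocks while all slots are taken)
			defer func() { <-sem }() // release the slot when process returns
			process(i)
		}(item)
	}
	wg.Wait()
}
```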

This pattern is ideal for quick scripts because it requires minimal code changes. However, it still spawns one goroutine per item, so 10,000 items still means 10,000 goroutines. The loop completes instantly, leaving thousands of goroutines parked in memory while they wait for a token. For massive workloads, you need a more robust architecture.

Use Worker Pools for sustained processing

While a Semaphore is a traffic light, a Worker Pool is an assembly line. You spawn a fixed number of long-lived “Worker” goroutines that pull jobs from a shared channel.

How to architect an assembly line

Initialize a jobs channel.
Spawn N workers that range over that channel.
Feed items into the channel.
Close the channel to signal workers to exit.

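```go
import "sync"

// worker drains jobs until the channel is closed.
// id is unused here; it is handy for logging which worker handled an item.
func worker(id int, jobs <-chan Item, wg *sync.WaitGroup) {
	defer wg.Done()
	for item := range jobs {
		process(item)
	}
}

func ProcessItemsWithPool(items []Item, numWorkers int) {
	if numWorkers < 1 {
		numWorkers = 1 // with zero workers, nothing would ever drain the jobs channel
	}
	// A small buffer is enough; sizing it to len(items) would copy the whole batch into the channel.
	jobs := make(chan Item, numWorkers)
	var wg sync.WaitGroup

	for w := 1; w <= numWorkers; w++ {
		wg.Add(1)
		go worker(w, jobs, &wg)
	}

	for _, item := range items {
		jobs <- item // blocks while the buffer is full, so the feeder runs at the workers' pace
	}
	close(jobs) // signal shutdown: workers exit their range loop once the channel drains
	wg.Wait()
}
```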

Worker Pools decouple the volume of work from resource consumption. Whether processing 100 items or 100,000, your system runs only numWorkers worker goroutines, so the goroutine count and its stack memory stay flat and GC pressure stays low.

Trade-offs and Costs

Engineering is a series of trade-offs. Bounding your concurrency introduces specific costs:

Latency vs. Stability: Limiting concurrency can increase the total execution time for a batch. You trade peak speed for a predictable, non-collapsing system.
Complexity: Worker Pools require more boilerplate than a simple go func(). You must manage channel lifecycles and context-based cancellation properly.
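Deadlock Risk: Improper channel management in pools can lead to deadlocks, for example when workers block sending results to a channel that nobody reads until the feeder loop finishes, while the feeder loop waits for space in the jobs channel.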

Optimizing for speed is the first phase of implementation. The second phase, and the most critical for production reliability, is determining how to constrain that speed safely.

Unbounded concurrency is a latent bug waiting for a traffic spike. Establish a habit: every time you type go func(), identify its upper bound. If the number of iterations depends on user input or database rows, you need a speed limit.

Grab a Semaphore for scripts and a Worker Pool for production pipelines. Respect the hardware, and your database will remain stable.