Memory Mechanics In Go - Stack vs Heap

When thinking about performance, it's easy to focus on Big O notation. But in Go, the difference between the Stack and the Heap is often the difference between a service that scales and one that chokes on GC pauses. This post explores escape analysis, the "Pointer Myth", and why passing a small struct by value can be around 40x faster than handing out a pointer that forces a heap allocation.

We often talk about “fast” code in terms of Big O notation or algorithmic complexity. But in systems programming languages like Go, “fast” is often a function of where your data lives in memory.

When optimizing for high throughput, efficient loops and database indexes are only part of the story. Eventually, you have to talk about the Stack and the Heap.

Understanding the difference isn’t just trivia. It is the difference between a service that hums along at 100k OPS with flat latency, and one that chokes on Garbage Collection (GC) pauses every few seconds.

This post explores the mechanics of Go’s memory management, why the Stack is your friend, and why “using pointers for performance” is often a lie we tell ourselves.

The Two Worlds: Stack vs Heap

Memory in your Go program is effectively divided into two zones. They aren’t physically different RAM chips, but they are managed with completely different strategies.

The Stack: The Fast Lane

Every goroutine in Go gets its own stack. It starts small (2KB) and grows and shrinks dynamically.

Think of the stack like a scratchpad. When a function is called, it gets a slice of this scratchpad (a stack frame) to write its variables. When the function returns, that slice is simply marked as “free” by moving a pointer.

  • Allocation Cost: One CPU instruction (subtracting from the stack pointer).
  • Cleanup Cost: Zero (adding to the stack pointer).
  • Access: Extremely fast (L1/L2 cache locality).

Technically, Go uses contiguous stacks. When a goroutine runs out of stack space, the runtime allocates a new, larger stack (typically double the size), copies the existing frames into it, and frees the old one. This “stack copying” is the only time stack memory incurs a significant cost, but it happens rarely compared to function calls.
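To make the growth behaviour concrete, here is a minimal sketch. The names sumTo and runDeep are made up for this illustration. It recurses deep enough that a fresh goroutine's small initial stack cannot hold all the frames, so the runtime has to grow the stack along the way, and the program needs no manual stack sizing.

```go
package main

import "fmt"

// sumTo recurses n times and returns n. Each call keeps a small array in its
// own stack frame, so deep recursion needs far more than the initial stack.
//
//go:noinline
func sumTo(n int) int {
	var scratch [128]byte // per-call scratch space in this frame
	scratch[n%len(scratch)] = 1
	if n == 0 {
		return 0
	}
	return sumTo(n-1) + int(scratch[n%len(scratch)])
}

// runDeep runs sumTo on a fresh goroutine, which starts with a small stack.
func runDeep(n int) int {
	result := make(chan int)
	go func() { result <- sumTo(n) }()
	return <-result
}

func main() {
	// 100,000 nested frames need megabytes of stack; the runtime grows it on demand.
	fmt.Println(runDeep(100000)) // prints 100000
}
```

The growth is invisible to the code: there is no error path and no size hint, only the occasional copy when the current stack fills up.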

The Heap: The Shared Chaos

Anything that cannot fit on the stack, or needs to live longer than the function that created it, goes to the Heap. The Heap is a massive pool of shared memory.

  • Allocation Cost: Expensive. The runtime (runtime.mallocgc) must find a free block of the right size, potentially hold locks, and update metadata.
  • Cleanup Cost: Very Expensive. This is the domain of the Garbage Collector. It has to scan, mark, and eventually sweep this memory.
  • Access: Slower (pointer chasing, less cache friendly).

The golden rule of Go performance: If it’s on the Stack, you don’t pay for it. If it’s on the Heap, you pay tax on every GC cycle.
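One way to watch that tax accrue is to read the runtime's own counters. The sketch below is an illustration under assumptions, not a benchmark: record, sink, and churn are hypothetical names, and storing each pointer in a package-level variable is simply a convenient way to force heap allocation.

```go
package main

import (
	"fmt"
	"runtime"
)

type record struct {
	payload [64]byte
}

// sink is a package-level pointer; anything stored here must live on the heap.
var sink *record

// churn allocates n short-lived heap objects and reports how many heap
// allocations and completed GC cycles the runtime recorded in the meantime.
func churn(n int) (mallocs uint64, gcCycles uint32) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		sink = &record{} // each iteration creates a new heap object
	}
	runtime.ReadMemStats(&after)
	return after.Mallocs - before.Mallocs, after.NumGC - before.NumGC
}

func main() {
	mallocs, gcCycles := churn(100000)
	fmt.Printf("heap allocations: %d, GC cycles completed: %d\n", mallocs, gcCycles)
}
```

The exact number of GC cycles depends on the Go version and GOGC setting, so treat the output as indicative only.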

Escape Analysis: The Compiler’s Decision

So, how does Go decide? You don’t have malloc or free. You have the compiler.

Go uses a technique called Escape Analysis to determine the lifetime of a variable. If the compiler can prove that a variable never leaves the function scope, it allocates it on the stack. If the variable “escapes” (e.g., is returned to a caller, stored in a global variable, or sent to a channel), it generally must be moved to the Heap.

Seeing It In Action

You can see this decision-making process by passing -m to the compiler through the -gcflags flag.

go
package main

type User struct {
    ID   int
    Name string
}

func createUserV1() User {
    u := User{ID: 1, Name: "Vikash"}
    return u // Returns a VALUE
}

func createUserV2() *User {
    u := User{ID: 2, Name: "Vikash"}
    return &u // Returns a POINTER
}

func main() {
    _ = createUserV1()
    _ = createUserV2()
}

Compile with analysis enabled:

bash
$ go build -gcflags="-m" main.go

./main.go:14:2: moved to heap: u

  • createUserV1: Returns User by value. The data is copied to the caller’s stack frame. u stays on the stack. Fast.
  • createUserV2: Returns &u. The caller needs to access variables created inside createUserV2 after the function returns. If u were on the stack, it would be overwritten by the next function call. The compiler must move u to the heap. Slow.

Hidden Escape Routes

Returning a pointer is the obvious escape route. But there are subtle ones that catch even experienced engineers.

1. Interfaces (Dynamic Dispatch)

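When you assign a concrete value to an interface, the interface value holds a type descriptor plus a pointer to the data, so a non-pointer value usually has to be copied somewhere that pointer can refer to. If the compiler cannot prove where the interface ends up (for example because a method is called through dynamic dispatch, or the callee hands it to reflection), it has to assume the value escapes and place the copy on the heap.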

go
import "fmt"

func Log(v interface{}) {
    fmt.Println(v) // v is passed on as interface{}, so the value behind it escapes to the heap
}

This is why passing huge structs to log.Println or json.Marshal hurts performance: the conversion to an interface forces a copy of the struct onto the heap.
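The cost of that boxing can be measured directly with testing.AllocsPerRun from the standard library. The following is a sketch under assumptions: event, lastSeen, keepAny, and readID are hypothetical names, and keepAny retains its argument on purpose so the compiler has no way to keep the boxed value on the stack.

```go
package main

import (
	"fmt"
	"testing"
)

type event struct {
	id, size int64
}

// lastSeen is a hypothetical sink that retains whatever it is given.
var lastSeen interface{}

// keepAny retains v, so the value boxed into the interface must live on the heap.
//
//go:noinline
func keepAny(v interface{}) { lastSeen = v }

// readID takes the concrete type: a plain copy, no boxing.
//
//go:noinline
func readID(e event) int64 { return e.id }

// measure returns the average heap allocations per call for each variant.
func measure() (boxed, direct float64) {
	var n int64
	boxed = testing.AllocsPerRun(1000, func() {
		n++
		keepAny(event{id: n, size: n}) // event is converted to an interface here
	})
	direct = testing.AllocsPerRun(1000, func() {
		n++
		readID(event{id: n, size: n})
	})
	return boxed, direct
}

func main() {
	boxed, direct := measure()
	fmt.Println("via interface:", boxed, "allocs/call")
	fmt.Println("concrete type:", direct, "allocs/call")
}
```

On current Go toolchains this typically reports one allocation per call through the interface and none for the concrete parameter, though the compiler can avoid the allocation in some special cases (constants, very small integers, interfaces that provably stay local).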

2. Slices with Dynamic Size

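For a slice's backing array to live on the stack, the compiler generally needs to know its size at compile time, and that size must be small. If the size is determined by a variable, the backing array usually escapes to the heap (the exact rules vary between Go versions).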

go
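func makeSpace(n int) {
    x := make([]byte, n) // Size known only at run time: usually escapes to heap
    _ = x                // Use x so the example compiles
}

func makeFixed() {
    x := make([]byte, 64) // Small constant size that never leaves the function: stays on stack
    _ = x
}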
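The difference can be checked empirically rather than taken on faith. This is a minimal sketch using testing.AllocsPerRun; sumDynamic, sumFixed, and measure are hypothetical names, and the 4096-byte length is an arbitrary choice meant to be large enough that no small-buffer optimization applies.

```go
package main

import (
	"fmt"
	"testing"
)

// sumDynamic builds a buffer whose length is only known at run time.
//
//go:noinline
func sumDynamic(n int) int {
	buf := make([]byte, n)
	total := 0
	for i := range buf {
		buf[i] = 1
		total += int(buf[i])
	}
	return total
}

// sumFixed builds a small buffer with a constant length that never leaves
// the function.
//
//go:noinline
func sumFixed() int {
	buf := make([]byte, 64)
	total := 0
	for i := range buf {
		buf[i] = 1
		total += int(buf[i])
	}
	return total
}

// measure returns the average heap allocations per call for each variant.
func measure() (dynamic, fixed float64) {
	dynamic = testing.AllocsPerRun(1000, func() { sumDynamic(4096) })
	fixed = testing.AllocsPerRun(1000, func() { sumFixed() })
	return dynamic, fixed
}

func main() {
	dynamic, fixed := measure()
	fmt.Println("make([]byte, n): ", dynamic, "allocs/call")
	fmt.Println("make([]byte, 64):", fixed, "allocs/call")
}
```

Both functions do the same work; only the way the length is specified differs, which is what decides where the backing array lives.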

3. Closure Capture

If a closure (anonymous function) references a variable from the outer scope, and that closure is passed around, the closed-over variable escapes.
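A short sketch of both situations, with hypothetical names newCounter and sumLocal. In the first function the closure is returned, so the captured variable has to outlive the call; building it with -gcflags="-m" typically reports "moved to heap: count". In the second the closure never leaves the function, so the compiler is free to keep the captured variable on the stack.

```go
package main

import "fmt"

// newCounter returns a closure that captures count. Because the closure
// outlives newCounter, count cannot stay in newCounter's stack frame.
func newCounter() func() int {
	count := 0
	return func() int {
		count++
		return count
	}
}

// sumLocal also captures a variable, but the closure is only used locally,
// so nothing needs to outlive the call.
func sumLocal(values []int) int {
	total := 0
	add := func(v int) { total += v }
	for _, v := range values {
		add(v)
	}
	return total
}

func main() {
	next := newCounter()
	// prints 1 2 3
	fmt.Println(next(), next(), next())
	// prints 6
	fmt.Println(sumLocal([]int{1, 2, 3}))
}
```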

Benchmarking the Cost

Let’s test the “Pointer Myth” with data. We will compare passing a small struct (64 bytes) by value vs. by pointer.

go
// Benchmark Code
type Config struct {
    ID   int
    Name string
    Data [40]byte // Padding: 8 + 16 + 40 = 64 bytes on 64-bit platforms
}

//go:noinline
func byValue(c Config) int {
    return c.ID
}

//go:noinline
func byPointer(c *Config) int {
    return c.ID
}

Results:

text
BenchmarkByValue-10     1000000000           0.30 ns/op        0 B/op       0 allocs/op
BenchmarkByPointer-10    100000000          12.50 ns/op       64 B/op       1 allocs/op

  • Pass by Value: 0.3ns. Zero allocations. The struct is simply copied into the callee’s stack frame.
  • Pass by Pointer: 12.5ns. One allocation.
  • Difference: roughly 40x slower.

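Why? Because in this specific microbenchmark setup the Config behind the pointer escapes, so every call pays for a 64-byte heap allocation. byPointer itself does not allocate; handing it a pointer to a value that stays on the stack is allocation-free. In real-world code, though, pointers that get stored, returned, or passed through interfaces tend to escape, and the resulting pressure on the GC adds up.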
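The benchmark harness itself is not shown in this post, so the following is only a sketch of how the allocation side could be checked, not the setup that produced the numbers above. It assumes a Config sized to exactly 64 bytes and adds a hypothetical byRetainedPointer that stores the pointer in a global, which is one simple way to make the pointed-to value escape.

```go
package main

import (
	"fmt"
	"testing"
)

type Config struct {
	ID   int
	Name string
	Data [40]byte // 8 + 16 + 40 = 64 bytes on 64-bit platforms
}

//go:noinline
func byValue(c Config) int { return c.ID }

//go:noinline
func byPointer(c *Config) int { return c.ID }

// retained is a hypothetical global that keeps the pointer alive.
var retained *Config

//go:noinline
func byRetainedPointer(c *Config) int {
	retained = c // the pointer outlives the call, so its target must be on the heap
	return c.ID
}

// allocsPerCall reports average heap allocations per call for three call styles.
func allocsPerCall() (value, pointer, retainedPtr float64) {
	value = testing.AllocsPerRun(1000, func() {
		c := Config{ID: 1}
		byValue(c)
	})
	pointer = testing.AllocsPerRun(1000, func() {
		c := Config{ID: 2}
		byPointer(&c) // c never escapes, so it stays on the stack
	})
	retainedPtr = testing.AllocsPerRun(1000, func() {
		c := Config{ID: 3}
		byRetainedPointer(&c) // c escapes through the global
	})
	return value, pointer, retainedPtr
}

func main() {
	value, pointer, retainedPtr := allocsPerCall()
	fmt.Println("by value:           ", value, "allocs/call")
	fmt.Println("by pointer (local): ", pointer, "allocs/call")
	fmt.Println("by pointer (escapes):", retainedPtr, "allocs/call")
}
```

The point of the three-way split is that the pointer is not what costs you; the escape is. Timings in ns/op would still need a regular go test -bench run.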

Practical Advice for Optimization

When optimizing:

  1. Default to “Pass by Value”. Current CPUs can copy 64 bytes faster than you can blink. Do not use pointers just to avoid copying small structs.
  2. Watch your Interfaces. Heavy use of interface{} in hot paths (like middleware or data ingest loops) is a common source of hidden allocations.
  3. Pre-allocate Slices. Use make([]T, 0, capacity) where capacity is a constant if possible, or at least a calculated max, to avoid resizing allocations.
  4. Profile mallocgc. As a rule of thumb, if runtime.mallocgc is more than 5% of your CPU profile, you have a churn problem. Use go build -gcflags="-m" to find the variables that escape.
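The slice advice in point 3 is easy to see in isolation. The sketch below uses a hypothetical helper, countGrowths, that counts how often append has to move the data to a larger backing array, with and without a capacity hint.

```go
package main

import "fmt"

// countGrowths appends n ints to s and counts how often append had to move
// the data to a larger backing array (visible as a change in capacity).
func countGrowths(s []int, n int) int {
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			growths++
		}
	}
	return growths
}

func main() {
	const n = 1000
	// Without a hint, append reallocates several times as the slice grows.
	fmt.Println("no capacity hint:", countGrowths(nil, n), "reallocations")
	// With the capacity known up front, the backing array is never replaced.
	fmt.Println("preallocated:", countGrowths(make([]int, 0, n), n), "reallocations")
}
```

The exact number of reallocations without a hint depends on the growth policy of the Go version in use; with the capacity supplied up front it is zero.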

Summary

  • Stack = Fast, local, free.
  • Heap = Slow, shared, taxed by GC.
  • Pointers = Can cause heap allocations when the value they point to escapes. Use them for semantics (I need to modify this), not for performance (unless the struct is > 2KB).

Memory management in Go is about empathy for the runtime. If you treat the Garbage Collector like a colleague who is already overworked, you’ll naturally write code that stays on the stack, keeps the heap clean, and scales effortlessly.

Further Reading

  1. Language Mechanics On Escape Analysis – Ardan Labs (William Kennedy)
    • The definitive 4-part series on how the Go compiler makes allocation decisions.
  2. A Guide to the Go Garbage Collector – Official Go Documentation
    • Deep dive into the tri-color mark-and-sweep algorithm and tuning GOGC.
  3. Go SliceTricks – Go Wiki
    • Efficiency patterns for manipulating slices without unnecessary allocations.