The marketing for “schemaless” architecture was brilliant. It promised speed, agility, and a life free from the tyranny of ALTER TABLE migrations. When PostgreSQL introduced JSONB in version 9.4, many developers saw it as a green light to treat Postgres like MongoDB.
I’ve seen this pattern in dozens of codebases. It starts with a single metadata column, but within months, the entire business logic is buried inside a 2MB JSON blob.
This isn’t agility; it’s an architectural debt trap. By bypassing the relational schema, you aren’t just losing type safety; you are fighting the physical storage engine of PostgreSQL.
The Cost of Write Amplification
PostgreSQL was designed to handle rows. When you update a row, Postgres doesn’t change the data in place. Instead, it uses a mechanism called MVCC to create a new version of that row.
Glossary //
If you have a normalized table with 50 columns, and you update a single boolean flag, Postgres writes a few bytes to a new row version.
But if you store your data in a massive JSONB document, Postgres has no idea how to “partially” update it. If your document is 1MB and you update a single key, Postgres must write the entire 1MB document to a new location on disk.
Disk Bloat from TOAST
Postgres stores data in 8KB pages. If a row exceeds that size, Postgres moves the large data into a separate storage area called TOAST.
Glossary //
When you abuse JSONB to store large documents, almost every row ends up in TOAST. Every time you update a small field inside that JSON, Postgres has to rewrite the entire TOAST entry.
I ran a quick test to see the impact. I updated a single flag in a 2MB JSON document 1,000 times.
1-- Checking the physical size of the TOAST table
2SELECT relname, relpages
3FROM pg_class
4WHERE oid = (SELECT reltoastrelid FROM pg_class WHERE relname = 'users_jsonb');The result was 2GB of disk bloat for a single row’s updates. In a normalized table, that same operation would have cost less than 100KB. You are literally burning disk I/O to maintain a “flexible” schema. Quite literally, your disk is getting TOASTed.
Query Planner Blindness
The PostgreSQL query planner is an incredible piece of engineering. It uses statistics to decide whether to use an index or scan the whole table. It knows how many NULLs are in a column and the distribution of values.
But the planner is blind to what is inside your JSONB blob.
If you query a standard column, the planner knows exactly how many rows to expect. If you query a key inside JSON, it has to guess. Usually, it guesses a very small number (like 10 rows).
If the planner thinks only 10 rows match, it might choose a “Nested Loop” join, which is fast for 10 rows but catastrophic for 100,000 rows. I’ve seen queries that should take 5ms balloon to 30 seconds because the planner made a bad guess based on a JSONB blind spot.
The Cost of GIN Indexes
To make JSON queries fast, you usually end up using a GIN (Generalized Inverted Index).
GIN indexes are powerful, but they are heavy. Every time you insert a JSON document, Postgres has to break the entire document apart, find every key and value, and update the index.
In my benchmarks, inserting into a table with a GIN index is 5x to 10x slower than a standard B-Tree index. You’ve traded your write performance for the convenience of not defining a schema.
Valid Use Cases for JSONB
After heavily trashing it, I should clarify: JSONB is an incredible tool when used for its intended purpose. You should absolutely use it for:
- Third-Party Webhooks: When a service like Stripe or GitHub sends you a webhook, you don’t control the schema. Storing the raw payload in a
JSONBcolumn ensures you never drop data if they add a new field. - User-Defined Custom Fields: If you are building a SaaS where users can define their own attributes for a CRM contact, a sparse
JSONBcolumn is vastly superior to creating a table with 100custom_field_xcolumns. - Historical Snapshots: Storing a frozen snapshot of an order receipt or a configuration state at a specific point in time, where you need to guarantee the shape of the data won’t change even if your active relational tables evolve.
Fixing Slow JSON Queries
If you have a valid JSONB payload but find yourself frequently filtering or joining on a specific key inside of it, you don’t have to choose between flexibility and performance. You can use Generated Columns to bridge the gap:
1ALTER TABLE users
2ADD COLUMN external_id TEXT
3GENERATED ALWAYS AS (payload->>'id') STORED;
4
5CREATE INDEX idx_external_id ON users(external_id);This gives you the best of both worlds: the raw payload stays safely in your JSON blob, but the query planner gets a physical column with real statistics to work with.
Don’t let the allure of “agility” trick you into building a second-rate document store inside the world’s most advanced relational database. Define your core schema, and use JSONB only for the margins. Your disk, your CPU, and your future self will thank you.