At a recent ClickHouse conference in New Delhi, I attended this Saturday. There were many interesting sessions, but one that stood out was a talk on multi-tenant analytics at scale. I was reminded of a fundamental truth in backend engineering: “The database that runs your app cannot be the database that analyzes your app”.
Early in a startup’s life, we shove everything into one database. User profiles, session data, logs, and analytics events all live in the same Postgres, Mongo, or Couchbase instance. It works, until it throttles.
Eventually, your analytics queries start creating backpressure on your login API. That is when you realize you have two distinct problems pretending to be one. You need to split your world into OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing).
In my stack, this split is represented by two heavyweights: Couchbase and ClickHouse.
Note: While this post focuses on Couchbase and ClickHouse because they are part of my stack, these principles are universal.
- OLTP Examples: Postgres, MySQL, MongoDB, DynamoDB.
- OLAP Examples: Snowflake, BigQuery, Redshift, DuckDB.
The physics of Row vs Column remains the same regardless of the brand.
The fundamental Divergence: Row vs Column
The difference isn’t just branding. It is physics.
Couchbase (The OLTP Workhorse)
Couchbase is designed for Transactions. It is a document store that excels at “Key-Value” lookups.
Imagine you are fetching a User Profile. You want all the data for one specific user (Name, Email, Preferences, LastLogin). In a Row-Oriented database (and for the sake of simplicity, a JSON Document is just a fancy Row), this data is stored contiguously on disk.
- The “Row” Concept: Whether it is a PostgreSQL Tuple or a Couchbase Document, the principle is the same: all fields for one ID are stored together.
- Read Pattern: Fetch row
user:123. The disk head jumps to one spot and reads the whole block. Fast. - Write Pattern: Update
user:123. The DB locks that specific row/document, updates it, and moves on. Fast.
ClickHouse (The OLAP Beast)
ClickHouse is designed for Analytics. It is a Column-Oriented store.
Imagine you need to calculate the average age of 50 million users. You don’t care about their names, emails, or session IDs.
- The “Column” Concept: ClickHouse stores every
agevalue for every user together on disk. Allnamesare in a separate physical location. - Read Pattern: To calculate an average, the system only reads the
agefile. It skips 90% of the data it doesn’t need. Blazing Fast for Aggregations. - Write Pattern: Updating a single record is “expensive” because it involves touching many different files. ClickHouse is optimized for appending massive batches of immutable data. High Throughput.
See It In Action
Let’s say you have users.json (1TB of Data).
Query: SELECT AVG(age) FROM users
-
OLTP (Row/Doc):
- Load Block 1 (User A).
- Parse JSON/Row.
- Extract
age. - Discard rest.
- Repeat for 1 billion users.
- Result: Massive I/O waste. Your disk is reading “Name”,“Email”,“Bio” just to throw it away.
-
OLAP (Column):
- Load
age.bin(Just the integers). - Send array to CPU.
- Vectorized Processing (SIMD): The CPU adds 4 or 8 integers in a single instruction (AVX-512). I’ll write about SIMD in a future blog.
- Zero branching, zero parsing overhead.
- Result: 100x faster execution. This is why ClickHouse can scan billions of rows per second on a modern laptop.
- Load
The Architecture of the Split
So, how do you use both without going crazy? You need a bridge.
At this scale of our example, we use Couchbase for the “Live” path and ClickHouse for the “Retrospective” path.
-
The Live Path (Couchbase):
- User logs in? Hit Couchbase.
- User posts a comment? Write to Couchbase.
- Requirement: Sub-millisecond latency, high concurrency, strong consistency (for the user).
-
The Bridge (CDC - The “Real” Engineering):
- The Amateur Move (Dual Writes):
go
1// Don't do this 2db.SaveUser(user) 3clickhouse.InsertUser(user) // What if this fails? Now your analytics is drifting. - The Pro Move (Change Data Capture or CDC):
We treat the database log (WAL or DCP Stream) as the source of truth.
- User updates profile -> Couchbase commits to disk -> Couchbase emits event.
- Kafka/Redpanda captures this event.
- ClickHouse Connector consumes the batch and inserts it.
- Why this matters: If ClickHouse goes down for maintenance, Kafka buffers the events. When ClickHouse comes back up, it drains the buffer. Zero data loss. Zero impact on the Login API.
- The Amateur Move (Dual Writes):
-
The Retrospective Path (ClickHouse):
- Product Manager wants to know: “How many users logged in from iOS vs Android last month?”
- Dashboard query hits ClickHouse.
- Requirement: Throughput. Scanning millions of rows in milliseconds. Latency can be 200ms-500ms, but it must handle massive volume.
Why Not Just Use Postgres for Everything?
“But Postgres has columnar indexes now!” or “What about TimescaleDB?”
Yes, tools are converging. Postgres is the swiss-army knife of databases, and for 90% of use cases, it is enough. But at high scale, general-purpose tools hit hard limits where specialization wins.
1. The WAL Bottleneck (Ingestion)
Postgres guarantees ACID compliance for every transaction. This means every insert must be written to the Write Ahead Log (WAL) before it is visible.
- Postgres: High write volume = High WAL contention. If you try to insert 100k events/sec into Postgres, your WAL becomes the choke point.
- ClickHouse: Uses
MergeTreeengines. Data is written to parts in memory and flushed to disk as immutable files. There is no traditional “WAL” for every row. It is designed to swallow firehoses of log data.
2. The Maintenance Tax (Vacuum vs Merges)
- Postgres: Updates are “Copy on Write”. Old versions of rows are kept around (dead tuples). You rely on
AUTOVACUUMto clean them up. At terabyte scale, Vacuuming is a heavy background process that eats I/O. - ClickHouse: Updates are rare. It relies on background merges to combine data parts. This is efficient for write-once-read-many (WORM) patterns typical of analytics.
3. Resource Isolation
If you run a heavy analytical query (GROUP BY over 1 billion rows) on your primary Postgres instance:
- It poisons the page cache (evicting hot OLTP usage data).
- It maxes out CPU.
- Result: Your user login times out because the DB is busy calculating “Average Age”.
By splitting them, you ensure that your Analysts’ dashboard queries never bring down the Checkout page.
The key to a scalable architecture is knowing the tool’s purpose.
- OLTP (Couchbase): Optimized for the individual. “Who is User X?”
- OLAP (ClickHouse): Optimized for the aggregate. “What are users doing?”
Don’t force one to do the other’s job. Your on-call rotation (and your latency SLOs) will thank you.
Further Reading
- OLTP vs. OLAP – Clickhouse Engineering
- An overview of the fundamental differences between transaction processing and analytical processing.
- ClickHouse MergeTree Family – ClickHouse Docs
- Deep dive into the LSM-tree inspired architecture that powers high-speed analytics.
- PostgreSQL and ClickHouse for Analytics – ClickHouse
- A benevolent comparison helping you decide when to stick with Postgres and when to introduce ClickHouse.
- Couchbase Architecture Overview – Couchbase Docs
- Understanding the memory-first, shared-nothing architecture of a modern document store.