
You've Outgrown Postgres for Analytics: The Five Numbers That Tell You When


The cohort query took 47 seconds yesterday. Today it just sat there. You cancelled it after two minutes, opened Datadog, and saw your API p95 spike at the same moment — because the analyst's notebook is now in the same connection pool as checkout. Your read replica is at 80%. Your six materialized views all need to refresh, and the refreshes are starting to overlap the freshness window. Board deck is Thursday.

Before you cave and pay for Snowflake, the real question is whether this is a Postgres-is-wrong problem or a Postgres-is-untuned problem. 90 million rows isn't supposed to be a wall. Someone, somewhere, should have written down the actual numbers — what row count, what p95, what concurrency — at which Postgres stops being defensible for an analytical workload.

This is that post. Five thresholds with numbers. Most of what you'll find on this question comes from Postgres-extension vendors — Citus, Timescale, ParadeDB, Tinybird — whose business depends on you stretching Postgres a little further. We don't have that constraint. Definite collapses one of the forward paths into a single tool, but we're not going to pretend it makes the diagnosis.

TL;DR

  • Postgres can run analytical queries — but five specific thresholds mark where it stops being defensible: largest fact table size, p95 dashboard latency, concurrent dashboard users, freshness SLA, and replica + materialized-view monthly cost.
  • Read replicas and materialized views are the right first moves. They're not band-aids; they're correct moves with a known ceiling.
  • Three forward paths, smallest first: stay-and-tune (≤1 threshold crossed), bridge to a columnar engine via logical replication (2–3), graduate fully to a columnar warehouse (4+).
  • The bridge path takes 1–2 sprints with a managed CDC tool, not a quarter. Definite is the platform option — one tool with the engine, ingestion, semantic layer, and dashboards built in. DuckDB, ClickHouse, and the full Snowflake/Fivetran/dbt/BI stack are engines or stacks you wire up yourself.
  • Postgres is the right OLTP source-of-truth before, during, and after the move. Graduating doesn't mean throwing out the work you did.

The cohort query hung — and your read replica is also pegged

If you've already added a replica and you're maintaining six materialized views to keep dashboards alive, you're not wrong. You're at the moment when that approach has a known ceiling.

The replica isolates analytical reads from production writes — that's why a Sev 1 from a runaway analyst query is no longer your top fear. Materialized views precompute the expensive aggregates so dashboards don't recompute them on every load. Both moves are taught at Postgres conferences. Both are correct. The question isn't whether you should have done them. You should have. The question is what they don't fix.

What they don't fix: the query physics. A 47-second cohort scan stays 47 seconds on a replica — it just doesn't take down checkout.

And six MVs become a refresh dependency graph. When refresh runs longer than your freshness window, you've reinvented batch ETL inside Postgres without the tooling. The Postgres community itself acknowledges this — a November 2025 r/PostgreSQL thread proposes a community patch making MV refresh O(delta) instead of O(total). People running Postgres at scale are working around the wall in public.
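
For context on why refresh cost scales with table size: even REFRESH MATERIALIZED VIEW CONCURRENTLY re-runs the view's full defining query and diffs the result against the stored rows; it spares readers the exclusive lock, not the O(total) work. A minimal sketch (table, view, and column names are illustrative):

```sql
-- Illustrative: a daily-revenue rollup over a large events table.
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT date_trunc('day', created_at) AS day,
       sum(amount)                   AS revenue
FROM events
GROUP BY 1;

-- CONCURRENTLY requires a unique index on the view...
CREATE UNIQUE INDEX ON daily_revenue (day);

-- ...and still recomputes the full defining query on every refresh.
-- It only avoids blocking readers while the new rows swap in.
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;
```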

So let's name where the wall actually is.

The five thresholds: where Postgres analytics actually breaks

Every Postgres-vendor blog says "it depends." Here are the five things it depends on, with numbers.

Postgres committer Tomas Vondra published TPC-H archaeology in December 2024 — a fully Postgres-aligned, non-vendor source. His finding: a 10 GB analytical dataset needs ~13 GB of table storage plus another 30 GB of indexes for index-only scans to even work. Above 100 million rows on a wide fact table, that arithmetic stops favoring Postgres.

Public benchmarks land in the same place. ClickBench, the most-cited public OLAP benchmark, uses a 99,997,497-row dataset — and the ClickBench README explicitly notes that size is "rather small by modern standards but allows tests to be completed in reasonable time." Translation: 100M rows is the floor of "real OLAP," not the ceiling.

The bands below are medians. Your numbers will move ±50% with schema width, index quality, and query shape. Treat them as zones, not lines.

A few of the bands deserve a note before you read the table.

  • The p95 bands map to Nielsen Norman Group's 30-year-old user-attention research: 1 second is the limit of feeling uninterrupted, 10 seconds is the limit of held attention. A 12-second dashboard load is past the point where humans tolerate it without switching tabs.
  • The concurrency bands map to AWS's published data on idle Postgres connections: 100 idle connections consume ~150 MB; 1,000 idle climb past 1 GB and start eating CPU just to maintain themselves. Default RDS max_connections tops out in the low thousands; PgBouncer's typical default_pool_size is 20.
  • The cost band assumes AWS or GCP managed Postgres on-demand — bare-metal economics (Hetzner, dedicated DC) shift these by 3–5×, and a 1-year Reserved Instance cuts pricing ~30%.

| Threshold | Stay-and-tune zone | Bridge zone | Graduate zone |
| --- | --- | --- | --- |
| Largest fact table size | < 50M rows / < 50 GB | 50–500M / 50–500 GB | > 500M / > 500 GB |
| p95 dashboard query latency | < 2s | 2–10s | > 10s |
| Concurrent analytical queries¹ | < 10 | 10–50 | > 50 |
| Dashboard freshness SLA | > 5 min OK | 1–5 min OK | < 1 min required |
| Replica + MV monthly cost² | < $1k | $1k–4k | > $4k |

¹ Concurrent analytical queries running at the same time — not logged-in dashboard seats. A team of 20 with one analyst at a time is a 1.
² AWS RDS on-demand for the replica instance + storage + IOPS. The MV refresh load on the primary is on top of this.

How to read the table: plug your numbers in. Zero or one threshold crossed — keep tuning. Two or three in the bridge zone — replication to a columnar engine is the smallest move that buys six months. Four or five in the graduate zone — you've left Postgres-for-analytics behind and you're paying for infrastructure that doesn't fix the problem.
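
If you'd rather compute the tally than eyeball the table, here's a toy self-check in plain SQL. It's purely illustrative, not a product feature; the numbers filled in match the FAQ scenario below (90M-row table, 8s p95). Replace them with yours:

```sql
-- Hypothetical self-check: one row of your numbers, counted against
-- the stay-and-tune ceilings from the table above.
WITH me AS (
  SELECT 90e6 AS fact_rows,       -- largest fact table, rows
         8.0  AS p95_seconds,     -- p95 dashboard query latency
         5    AS concurrent_q,    -- concurrent analytical queries
         600  AS freshness_sla_s, -- required freshness, seconds
         800  AS monthly_usd      -- replica + MV monthly cost
)
SELECT (fact_rows       >= 50e6)::int
     + (p95_seconds     >= 2)::int
     + (concurrent_q    >= 10)::int
     + (freshness_sla_s <= 300)::int
     + (monthly_usd     >= 1000)::int AS thresholds_crossed
FROM me;
-- Rough mapping to the zones above:
-- 0-1 crossed: keep tuning; 2-3: bridge; 4-5: graduate.
```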

Why columnar engines beat Postgres on analytical queries — briefly

Postgres stores data row-by-row. That's optimal for "give me one user's full record." Columnar engines store column-by-column, optimal for "give me the average of column X across 100 million rows." Two consequences:

First, the speedup. Add vectorized execution — columnar engines process batches of values inside CPU registers; Postgres processes one row at a time — and the speedup on aggregate queries is often 100× or more. On the ClickBench public benchmark, the median query runs in 258.5 seconds on Postgres and 0.24 seconds on DuckDB. Same hardware. Same dataset. Same SQL. Roughly 1,000× at the median.

Second, the storage. Postgres stores the same analytical dataset 5–7× less efficiently than a columnar engine. On the ClickBench dataset, 106 GB on Postgres vs. 20 GB on DuckDB and 15 GB on ClickHouse. (That ratio is for wide, denormalized fact tables; it narrows on heavily-indexed OLTP-shaped tables.) Your $0.08/GB-month bill is also paying for the wrong format.

A caveat the engineering reader will reach for: ClickBench is a wide-table aggregation against a denormalized hits dataset — the most favorable shape for columnar — and the Postgres baseline is unindexed. With aggressive indexing and partitioning, a tuned Postgres narrows the gap on point queries and short-range scans. The 100×–1,000× speedup holds for full-scan aggregates on big fact tables, which is the exact workload your dashboards run. That's the workload this post is about.

If you want the architectural deep-dive, we wrote about why we run Definite on DuckDB instead of Postgres for analytics.

Three forward paths, ranked smallest-move-first

Path 1: Stay and tune (the runway extension)

What you do: Aggressive partitioning on the events table by time. BRIN indexes on time columns instead of B-tree where you can get away with it. Parallel workers tuned for analytical queries. statement_timeout set to protect the primary from runaway joins. Denormalized analytical schemas separate from the OLTP schema so MV refreshes don't lock production tables.
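
A minimal sketch of those moves (table, column, and role names are illustrative):

```sql
-- Illustrative: range-partition events by time so scans prune old data.
CREATE TABLE events (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_01 PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

-- BRIN: a tiny index that works well on append-only time columns.
CREATE INDEX events_created_brin ON events USING brin (created_at);

-- Cap runaway analytical queries per role, not globally,
-- so the app's own statements keep their defaults.
ALTER ROLE analyst SET statement_timeout = '30s';
```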

The newest option worth knowing about is pg_duckdb: DuckDB's columnar execution engine running inside Postgres. Same database, columnar speed for analytical queries. It's still young (v0.x), but it's actively maintained by the DuckDB team and now ships as a Crunchy Postgres add-on. If you're at the early end of the bridge zone and you want to stretch Postgres a bit further before moving, pg_duckdb is the option that didn't exist a year ago.
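
The moving parts are small. As of the project's v0.x README the toggle looks like this; verify against current docs before relying on it, since the interface is still settling:

```sql
-- pg_duckdb: load the extension, then route this session's
-- qualifying queries through DuckDB's vectorized executor.
-- GUC name per the v0.x README; treat as subject to change.
CREATE EXTENSION pg_duckdb;
SET duckdb.force_execution = true;

-- Same SQL, same tables, columnar execution for the aggregate.
SELECT date_trunc('day', created_at) AS day, count(*)
FROM events
GROUP BY 1;
```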

What it buys you: 6–12 months on the same hardware before another wall.

When it runs out: when you cross into the bridge zone on two or more thresholds. After that, tuning yields diminishing returns.

Estimated effort: one focused sprint.

Path 2: Bridge via logical replication to a columnar engine (the lazy CTO move)

What you do: Enable Postgres logical replication. Stream changes to an analytical engine — either via a complete platform like Definite (where the CDC, the columnar engine, and the dashboard layer arrive as one product) or via a DIY pipeline you wire up yourself with PeerDB or Sequin into DuckDB or ClickHouse. Point your dashboards at the analytical side. Keep the app on Postgres untouched.

The Path 2 prerequisites people skip and then page about at 2am:

  • wal_level = logical in postgresql.conf (on RDS: rds.logical_replication = 1 in a custom parameter group, which requires a reboot).
  • REPLICA IDENTITY FULL on any table without a primary key (or stable unique index) — otherwise CDC can't track updates.
  • A budget for max_replication_slots and max_wal_senders. One per destination.
  • A monitor for replication slot lag. If the destination falls behind or the slot stalls, the primary's WAL grows until the disk fills. This is the single most common Postgres-CDC outage we've seen, and it doesn't show up in normal metrics.
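
In SQL, the first two prerequisites look like this (events_raw is an illustrative table; most managed CDC tools run the publication setup for you):

```sql
-- Must report 'logical'; changing it requires a restart.
-- On RDS, set rds.logical_replication = 1 in the parameter group instead.
SHOW wal_level;

-- Tables without a primary key need full replica identity,
-- or CDC can't emit complete UPDATE/DELETE images.
ALTER TABLE events_raw REPLICA IDENTITY FULL;

-- Each CDC destination consumes one slot and one WAL sender.
SHOW max_replication_slots;
SHOW max_wal_senders;
```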

ClickHouse acquired PeerDB in July 2024, shipped the Postgres CDC connector to public beta in February 2025, and launched Managed Postgres on ClickHouse Cloud in January 2026. The full stack — Postgres for OLTP, ClickHouse for OLAP — is now a single managed product. That's the pattern ClickHouse productized; it's not a fringe move.

What it buys you: Often 100× faster analytical queries (1,000× on aggregate-heavy ones) with tens of seconds to a few minutes of freshness lag depending on write volume. No migration of OLTP. No dbt project required to start. The dashboard team gets a sandbox; the app team gets their primary back. And because the analytical engine has the semantic layer built in (in the platform options below), AI questions answer against governed metrics instead of guessing at raw schemas.

When it runs out: Rarely. This is the long-term answer for most SaaS shops up to ~10 TB.

Estimated effort: 1–2 sprints with a managed bridge tool. Longer if you're hand-rolling the pipeline.

What you have to monitor (Path 2 ops checklist):

  • Replication slot health and WAL retention on the primary
  • CDC lag at the destination, with alerting when lag crosses your freshness SLA
  • Schema drift — what happens when an ALTER TABLE ships? Each CDC tool handles DDL differently
  • Read backpressure on the destination during burst write events
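
The first item on that checklist is the one that fills disks. A query you can alert on (thresholds are yours to choose):

```sql
-- WAL retained per replication slot. A stalled or slow consumer
-- makes this grow until the primary's disk fills; alert early.
SELECT slot_name,
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;
```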

Named options — one platform, three DIY paths:

The first option is the only one that's a platform — meaning the engine, the CDC pipeline, the semantic layer, and the dashboard layer arrive as one product. The other three are engines or stacks you wire up yourself.

  • Definite — the platform option. Built on the same DuckDB columnar engine you'd otherwise install separately, with managed Postgres CDC (and 500+ other source connectors), a governed semantic layer (Cube.dev), an AI analyst (Fi) that answers in plain English against your governed metrics, and dashboards. One install, one bill, one place to grant access. The bridge that doesn't become its own stack to maintain.

For DIY paths (you supply the ingestion, modeling, and dashboard layer):

  • DuckDB / MotherDuck — a single-node columnar engine. Best when your largest fact table is under a few hundred GB. You'll need a CDC tool, a modeling layer, and a dashboarding tool around it.
  • ClickHouse Cloud (managed) or self-hosted ClickHouse (needs ops capacity) — a distributed columnar engine. Best for extreme aggregation throughput across many concurrent analytical users. Same caveat: you wire the rest.
  • Snowflake / BigQuery + Fivetran + dbt + a BI tool — works, but typically a quarter or more of setup and 4 vendor relationships to manage. Included for comparison.

If you're running sub-second customer-facing analytics (think: a chart inside your product that 1,000 users hit per minute), you'll need streaming aggregation or pre-computed projections on the analytical side; that's beyond what vanilla CDC + dashboards delivers. For internal dashboards, finance reporting, and exec views — the workloads driving most of this decision — tens-of-seconds freshness is faster than your BI tool's cache settings anyway.

Path 3: Full graduation (when you're past the bridge)

What you do: Separate the analytical schema entirely. Move the events fact table, the financial reporting tables, and the customer-facing analytics tables to a columnar warehouse. Postgres remains the source of truth for OLTP. If you don't have a data person yet, now's the time to hire one.

What it buys you: Independence. Analytical workload is no longer constrained by OLTP infrastructure decisions, capacity planning, or the next replica upgrade.

When this is the right call: When you're in the graduate zone on three or more thresholds and you're running customer-facing analytics where freshness under 30 seconds is a product requirement, not a nice-to-have.

Estimated effort: 4–8 weeks for a small team. Longer if you're also adopting dbt + a separate BI tool simultaneously, which is where most "six-month migration" stories come from.

What you keep on Postgres, what moves to columnar

Graduating doesn't mean throwing out the work you did. The boundary is clean:

Stays on Postgres (OLTP, source-of-truth):

  • Users, accounts, sessions, auth state
  • Transactions, payments, billing source records
  • Application state, feature flags, config
  • Anything where a missed write is a customer-facing bug

Moves to columnar (analytics):

  • Events fact table
  • Derived cohort tables
  • Financial reporting tables
  • Marketing attribution tables
  • Customer-facing analytics
  • Anything where a 30–60 second freshness lag is acceptable

The bridge between them is logical replication / CDC. The replication runs continuously. The OLTP database doesn't change shape. Your application code keeps writing to Postgres exactly as it does today.
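
Whatever tool sits in the middle, the Postgres side of that bridge is a publication over exactly the tables that move (table names are illustrative; managed CDC tools typically create an equivalent publication themselves):

```sql
-- On the OLTP primary: publish only the analytical tables.
CREATE PUBLICATION analytics_pub
    FOR TABLE events, cohorts_derived, finance_facts;

-- Confirm what's flowing across the bridge.
SELECT * FROM pg_publication_tables
WHERE pubname = 'analytics_pub';
```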

The point of this split isn't the migration — it's that you don't have to do it again. The same architecture supports 50M rows and 5B rows. The bridge doesn't get rebuilt at Series C.

FAQ: the questions you're actually asking

I have a 90M-row events table, p95 of 8s, one read replica, six materialized views — do I actually need a warehouse? You're in the bridge zone on at least two thresholds (table size and p95). You don't need a full warehouse migration. You need logical replication to a columnar engine. Pick one option from Path 2.

Is Citus or Timescale (now TigerData) a real fix or just a longer runway? Both are real for specific shapes — Citus for distributed Postgres-flavored OLAP, TigerData for time-series. Neither solves general analytical workload at the freshness + concurrency profile most SaaS dashboards need. They buy 6–12 months for some teams. For the per-vendor breakdown, see Best Managed Postgres for Analytics in 2026.

What's the smallest move I can make this quarter that solves dashboard timeouts? Path 2 (logical replication to a columnar engine). One sprint with a managed bridge like Definite. Two sprints with a DIY DuckDB or ClickHouse setup.

Will replicating Postgres to a columnar engine give me real-time dashboards or do I have to accept a freshness lag? Logical replication on a columnar engine usually gives you sub-minute freshness — faster than most BI dashboards refresh anyway. Sub-second freshness for customer-facing analytics is achievable, but it requires more infrastructure (streaming aggregation or pre-computed projections on the analytical side).

If I keep Postgres for the app and add a separate analytical engine, what's the minimum I have to wire up? Three things: a CDC source on Postgres (logical replication slot), a columnar destination with a Postgres-compatible ingest path, and a dashboard layer pointed at the destination. With an all-in-one platform, that's one tool. With DIY, it's three tools and a pipeline.

At what point is sticking with Postgres for analytics the wrong call? When you're in the graduate zone on three or more of the five thresholds, and adding more replicas isn't fixing it. Before that, tune-or-bridge. After that, you're paying replica + MV maintenance cost without solving the problem.

What did teams who migrated regret? The regret pattern is shape, not vendor. Teams who replicated to a single platform recovered weeks. Teams who replicated and started a dbt project and picked a separate BI tool and hired an analyst lost a quarter. The regret isn't "we left Postgres." It's "we re-platformed when we should have replicated."

Replicate, don't re-platform

The regret isn't "we left Postgres." It's "we re-platformed when we should have replicated." Postgres stays your source of truth. Analytics moves to a system designed for it. The bridge runs continuously between them, and you don't rebuild it at Series C.

That's the case for Definite. The bridge in one tool instead of three: a managed Postgres source connector, a columnar engine, a governed semantic layer, an AI analyst, and dashboards arrive as a single platform. One sprint to live. No dbt project. No BI tool to negotiate. No 4-vendor stack to babysit while your CTO writes a custom partition scheme at midnight. Start free.

For the rest of the Postgres cluster: PostgreSQL as a Data Warehouse: When It Works, When It Doesn't (the strong-form case for not running analytics on Postgres at all). Best Managed Postgres for Analytics in 2026 (per-vendor comparison if you're staying). The Best ETL Tools for PostgreSQL in 2026 (getting data out of Postgres into the warehouse you already own).
