All posts
Database CostMay 27, 20268 min read

Materialized Views on Aurora Postgres: Your Most Expensive, Unaudited Line Item

Teams use Materialized Views to fix slow dashboards, then get blindsided by the six-figure Aurora I/O bill. A single `REFRESH` running on a cron job is likely costing you more than your lead engineer's salary.

VS

Venkat Sakamuri

DeepSQL R&D · Ex Oracle Query Engine Team · YC & CMU

An illustration showing the hidden chaos of a materialized view refresh beneath a calm surface, representing hidden costs.

TL;DR

  • REFRESH MATERIALIZED VIEW on Aurora Postgres isn't a metadata operation. It runs a full query, scanning base tables and translating directly into millions of billable I/O operations that dominate your cloud bill.
  • A default, non-CONCURRENT refresh takes an ACCESS EXCLUSIVE lock, blocking all reads from the view. This causes intermittent but regular failures for any dashboard or application using it.
  • The I/O cost of frequent refreshes often dwarfs the performance gains from reads. We regularly see MVs with a negative financial ROI, silently burning tens of thousands of dollars per month.

I was reviewing an architecture for a team seeing their Aurora bill spiral out of control. The database hosted a multi-tenant SaaS application, storage growth was linear, and user traffic was stable. Nothing in the application logs suggested a crisis. But their AWS bill told a different story: the line item for Aurora I/O operations was up 400% in two quarters.

The culprit? A set of 12 materialized views, added six months prior to speed up an analytics dashboard. One of them, refreshed every five minutes, was responsible for over $20,000 a month in I/O charges alone.

Materialized views feel like a silver bullet. A complex, slow SELECT statement gets pre-computed into a table, and queries against it are instant. The problem is solved. Except it isn't. On a pay-per-I/O platform like Aurora, you've traded a query latency problem for a massive, hidden cost problem.

The Anatomy of an Aurora Bill & The MV Trap

Your Aurora bill has two main drivers beyond the instance compute cost:

  1. Storage: Priced per GB-month (around $0.10 in us-east-1). This is intuitive and easy to forecast.
  2. I/O Operations: Priced per million requests (around $0.20 per million in us-east-1). This is abstract, non-intuitive, and where costs hide.

An I/O operation in Aurora is a read or write of an 8KB page from the shared storage volume. When you run REFRESH MATERIALIZED VIEW mv_name;, Postgres doesn't do anything magical. It effectively runs:

-- This happens under the hood during a non-concurrent refresh
BEGIN;
LOCK TABLE mv_name IN ACCESS EXCLUSIVE MODE;
TRUNCATE mv_name;
INSERT INTO mv_name (SELECT ... FROM base_tables ...);
COMMIT;

The INSERT INTO ... SELECT ... part is the killer. To execute this, Postgres must run the original, expensive query from scratch. If that query requires a full table scan on a 500GB table, Aurora will dutifully read all 500 * 1024^2 / 8 pages from storage. That's 65 million pages, or 65 million billable I/O operations. For one refresh.

Let's look at a typical EXPLAIN for the query that defines an MV. It's usually a big aggregate over a fact table.

EXPLAIN SELECT
    customer_id,
    date_trunc('day', event_timestamp) as event_day,
    count(*) as event_count,
    avg(value) as avg_value
FROM events
WHERE event_timestamp > now() - interval '30 days'
GROUP BY 1, 2;

The plan will almost certainly involve a Sequential Scan on events if there's no perfectly-suited index.

                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=1234567.89..1234568.12) (actual time=25000.12..25000.45 rows=15000 width=32)
   Group Key: customer_id, (date_trunc('day'::text, event_timestamp))
   ->  Seq Scan on events  (cost=0.00..1100000.00) (actual time=0.05..18000.99 rows=50000000 ...)
         Filter: (event_timestamp > (now() - '30 days'::interval))

That Seq Scan is your budget draining away. Every time REFRESH runs, you pay for that scan in full.

Doing the Brutal Math

Let's make this concrete. Assume you have a 100GB events table and you create a summary MV that refreshes every 15 minutes to keep a dashboard fresh.

  • Table Size: 100 GB
  • Postgres Page Size: 8 KB
  • Pages in Table: (100 GB * 1024 * 1024) / 8 KB = ~13.1 million pages
  • I/Os per Refresh: A full table scan requires reading every page, so ~13.1 million I/Os.
  • Aurora I/O Price: ~$0.20 per million I/Os.
  • Cost per Refresh: 13.1 million * $0.20 / 1 million = $2.62

$2.62 doesn't sound terrifying. But you're doing this on a cron job.

  • Refreshes per Day: (60 minutes / 15 minutes) * 24 hours = 96 refreshes.
  • Daily I/O Cost: 96 * $2.62 = $251.52
  • Monthly I/O Cost: $251.52 * 30 = $7,545.60

For a single materialized view, you are paying nearly $8,000 per month. That's before we even account for the storage cost of the MV itself (a rounding error at 100GB * $0.10/GB-mo = $10/mo) or the cost of replicating that churn.

Locking, Lies, and Dashboard Outages

The default REFRESH doesn't just cost money; it costs availability. It acquires an ACCESS EXCLUSIVE lock on the materialized view for the duration of the refresh. This is the same lock level used by DROP TABLE. It blocks everything, including SELECT statements.

If your refresh takes 30 seconds to run every 5 minutes, your dashboard is guaranteed to be throwing errors for 10% of the time. Users will complain, and your engineers will burn time chasing transient failures that are actually by design.

The textbook solution is to use CONCURRENTLY:

REFRESH MATERIALIZED VIEW CONCURRENTLY mv_customer_summary;

This is much kinder. It takes a weaker lock, allowing reads to continue while the refresh happens in the background. But it is not a free lunch. To work, CONCURRENTLY requires a UNIQUE index on the MV. It then performs a much more complex operation:

  1. Creates a new temporary table with the fresh data.
  2. Compares the new temp table against the old MV data, generating a diff.
  3. Issues individual DELETE, UPDATE, and INSERT statements against the real MV to apply the diff.
  4. Drops the temporary table.

This avoids the exclusive lock, but it often generates more total write I/O than the simple TRUNCATE and bulk INSERT. Your availability problem is solved, but your cost problem may have just gotten worse.

The Replica Amplification Tax

Here's the final turn of the knife. In an Aurora cluster, writes to the primary instance generate log records that are shipped to your read replicas. The replicas must then apply these changes to their own page caches to stay in sync. All of this activity contributes to the cluster's total billable I/O count.

A full MV refresh is one of the most write-intensive operations possible. You are effectively deleting and rewriting the entire table's contents. This creates a massive storm of redo log traffic. If you have three read replicas, that $7,500/month MV refresh isn't just generating I/O on the primary—it's causing a significant downstream I/O echo across your entire cluster as replicas scramble to apply the changes.

You are paying to replicate data churn that might not even be read before the next refresh cycle stomps all over it again.

What DeepSQL does about this

This isn't a theoretical problem. We see it constantly. MVs are a tool, and like any tool, they can be misused. Blindly throwing them at a problem without measuring the total cost of ownership is engineering malpractice.

DeepSQL doesn't guess. It measures, correlates, and provides specific, actionable advice.

  1. Calculates Financial ROI: By ingesting metrics from pg_stat_statements and cloud provider APIs, DeepSQL automatically calculates the total I/O cost of each REFRESH MATERIALIZED VIEW command. It then compares this to the cumulative I/O saved by all the read queries served by that MV. It surfaces a simple report: "mv_user_summary costs $7,545/mo in refresh I/O but only avoids $800/mo in query I/O. Its net financial impact is -$6,745/mo."

  2. Identifies Lock Contention: DeepSQL monitors pg_locks and wait events. When it sees read queries queuing up behind a REFRESH operation's ACCESS EXCLUSIVE lock, it flags the MV and explicitly recommends switching to REFRESH CONCURRENTLY, while also projecting the potential I/O cost increase.

  3. Proposes Smarter Alternatives: Often, an MV is just a costly crutch for a missing index. DeepSQL analyzes the underlying MV definition query. If it finds that the expensive Seq Scan could be replaced by a far cheaper Index Scan with the addition of a specific covering index on a base table, it will recommend creating the index and dropping the MV. This is the best outcome: the original queries become fast enough that the MV is no longer needed, eliminating both the I/O and storage cost entirely.

An illustration comparing a non-concurrent refresh (a total roadblock) with a concurrent refresh (a single lane closure with traffic flowing).