All posts
Database CostMay 28, 20267 min read

The Dashboard Tax: How 10 Queries Quietly Bankrupt Your Aurora Bill

Your BI dashboards are a hidden tax on your database. A few slow, recurring queries can generate millions of needless IOs, quietly inflating your Aurora bill by tens of thousands per month. Here’s the math, and how to stop it.

VS

Venkat Sakamuri

DeepSQL R&D · Ex Oracle Query Engine Team · YC & CMU

An illustration of a database under heavy load from numerous queries, represented by lines converging on a central cylinder.

TL;DR

  • Your BI tool is hammering your database with dozens of identical, recurring queries. The constant auto-refresh creates a massive, hidden IO tax that drives your Aurora bill through the roof.
  • Pre-aggregation via materialized views is the common reflex, but it often just shifts the cost. You end up scanning more data to build the rollups than the dashboards ever read, burning IO 24/7 for dashboards viewed 8/5.
  • You're flying blind. The core problem is you aren't measuring the IO cost of a query against its actual usage or business value. You have to treat database queries as a COGS, not just an operational side effect.

I’ve seen this happen a dozen times. A company migrates to Aurora Postgres. Everything is great. Six months later, the CFO is knocking on the VP of Engineering’s door, holding a bill for $50,000 more than expected, demanding to know what happened. The culprit isn’t a catastrophic failure or a massive spike in user traffic. It’s slow, quiet, and insidious: the executive dashboard.

That 10-panel dashboard showing MRR, churn, and regional sales looks harmless. But it’s layered on top of a BI tool that translates every panel into a SQL query. And it’s set to auto-refresh every five minutes for every user who has it open on a second monitor. What started as a simple monitoring tool becomes a denial-of-service attack you inflict on yourself.

This is the dashboard tax: the compounding cost of inefficient, recurring queries that create a permanent floor of CPU and I/O consumption. And on a cloud database like Aurora, where you pay per I/O, that floor can get very expensive, very fast.

The Math of Financial Ruin

Let's get specific. Most people think their Aurora bill is just instance hours. They're wrong. A huge chunk of it is I/O. As of writing, Aurora I/O is priced at $0.20 per 1 million requests.

What is an "I/O request" in Aurora's world? It's a read operation. When Postgres needs a data block that isn't in its buffer cache, it has to fetch it from Aurora's distributed storage layer. Each 8KB page fetched is one I/O. A single query that needs to read 10GB of data that isn't cached will perform 10 GB / 8 KB = 1.25 million I/O operations. That single query run just cost you $0.25.

Now, let's go back to our dashboard. Imagine we have a 200GB orders table. One of the dashboard panels calculates "Sales by Region, Quarter-to-Date." The query looks something like this:

SELECT
  region,
  SUM(order_total)
FROM orders
WHERE
  order_date >= date_trunc('quarter', now())
GROUP BY 1;

Assume there's no index on order_date. The planner has no choice but to do a Sequential Scan over the entire 200GB table, filter the rows, then aggregate. Here’s what a piece of that plan might look like:

EXPLAIN (ANALYZE, BUFFERS)
SELECT ... -- (the query from above)

-- ... other plan nodes ...
->  Seq Scan on orders  (cost=0.00..650123.45 rows=1234567 width=12)
      (actual time=0.123..15890.123 rows=1000000 loops=1)
      Filter: (order_date >= '2023-10-01 00:00:00'::timestamp without time zone)
      Buffers: shared read=25000000
      I/O Timings: read=12345.678
-- ... other plan nodes ...

See that Buffers: shared read=25000000? That's 25 million 8KB blocks read from storage. That's 25 million IOs. At $0.20 per million, that single query refresh costs $5.00.

Let’s say your dashboard has 10 panels. Five of them are simple and well-indexed. The other five are beasts like this one, each costing a couple of dollars. Let's be conservative and say the entire 10-panel dashboard costs $10 in I/O to refresh just once.

Now, the multiplier effect:

  • The dashboard auto-refreshes every 5 minutes (12 times per hour).
  • During an 8-hour workday, you have 30 people (execs, sales ops, managers) keeping it open.

Let’s do the math for one month:

$10 per refresh * 12 refreshes/hr * 8 hours/day * 30 users * 22 workdays/month = **$633,600 per month**

This is not a typo. This is real math. Maybe your numbers are a tenth of this. It's still $63k/month spent on refreshing a handful of analytics. This is pure waste, driven by architectural neglect. And this doesn't even cover the secondary effects: buffer cache thrashing that slows down your actual application queries, and constant CPU pressure from hash aggregations and sorts spilling to disk.

The 'Fix' That Isn't: Pre-Aggregation Theater

When a DBA or platform engineer finally tracks down the problem using pg_stat_statements or Aurora Performance Insights (looking for queries with high blk_read_time and total_exec_time correlated with IO:DataFileRead wait events), the knee-jerk solution is always pre-aggregation.

"Let's just create a materialized view!"

So, you write a CREATE MATERIALIZED VIEW sales_by_region_quarterly AS ... and set up a cron job to run REFRESH MATERIALIZED VIEW sales_by_region_quarterly every 10 minutes.

The query against the MV is lightning fast. Problem solved, right? Wrong. You just traded one problem for another, and sometimes, a more expensive one.

  1. Over-Aggregation & Wasted Computation: The engineering team, trying to be helpful, creates a summary table that aggregates not just by region but also by product_category, sales_channel, and customer_segment. The resulting MV is huge. But 95% of dashboard queries only ever filter by region. You are computing and storing aggregates that are never used.
  2. The 24/7 Refresh Cycle: The cron job runs the refresh every 10 minutes. Around the clock. On weekends. On holidays. But the dashboard is only heavily used from 9 AM to 5 PM, Monday to Friday. You are burning I/O to rebuild this view at 3 AM on a Sunday for absolutely no one.
  3. Vacuum & Replica Lag Catastrophes: A REFRESH MATERIALIZED VIEW (without CONCURRENTLY) locks the view. A REFRESH ... CONCURRENTLY is better, but it still runs a massive query on your primary instance. This generates a storm of Write-Ahead Log (WAL) traffic. Your read replicas—which are likely serving the dashboard queries—can struggle to keep up, leading to high replica lag. The refresh process also creates enormous amounts of dead tuples, putting your autovacuum workers on a treadmill they can never get off of, consuming even more CPU and I/O.

You haven't solved the cost problem; you've just smeared it around and hidden it in a cron job. Now your base table scans happen on a schedule, but they are often bigger and less efficient than the original queries they replaced.

The Real Trade-Off Framework

Choosing an aggregation strategy isn't a one-time decision. It's a constant balancing act based on query cost, usage patterns, and data latency requirements. Here's how I think about it.

  • Live Aggregation: The default. Best for low-usage dashboards, queries that are already perfectly indexed, or when data must be seconds-fresh. Cost Profile: Expensive per-query, zero maintenance cost.
  • Scheduled Full Rollup (e.g., REFRESH MV): The blunt instrument. Use this when the sum of I/O from live queries over a period (e.g., one day) is far greater than the cost of one full rebuild. Best for very high-usage dashboards where data can be 15-60 minutes stale. Cost Profile: Cheap per-query, expensive maintenance cost.
  • Incremental Rollup (Triggers / pg_ivm): The sophisticated approach. Use triggers on the source tables to update an aggregate table row-by-row. Or, if you're on a modern Postgres version and feeling brave, explore extensions for incremental view maintenance. This is the most efficient but carries the highest complexity and maintenance burden. Cost Profile: Cheap per-query, moderate and continuous maintenance cost.
  • Application-Level Caching (Redis, etc.): A shallow fix. Good for caching the final results of a query. But as soon as a user changes a date filter on the dashboard, it's a cache miss, and you're back to hitting the database. It solves for repetition, not for variation.

The key is that you need to be able to model the costs of each approach for your specific workload. You can't make an economic decision without economic data.

What DeepSQL Does About This

We built DeepSQL because we were tired of fighting this battle with a patchwork of pg_stat_statements, Python scripts, and spreadsheets. DeepSQL connects directly to your database and your cloud bill. It automatically attributes Aurora I/O costs to every single query, user, and BI dashboard panel. We show you the 10 queries that make up 80% of your bill and tell you which pre-aggregations are actually saving you money versus which ones are just burning I/O on schedules and dimensions nobody uses. Our system gives you a concrete recommendation: "This query from your Executive Dashboard is costing you $4,200/month in I/O. Converting it to a materialized view, refreshed hourly from 8 AM to 6 PM, will reduce its cost by 95%."

A balance scale weighing a single large block against a pile of many small pebbles, illustrating the trade-off between pre-aggregation and live queries.