All posts
Query OptimizationMay 27, 20269 min read

Creating indexes is easy. Dropping them is what saves your Aurora bill.

Every Postgres team I meet has an index problem. It's almost never "we need more indexes." It's "we're paying for 40+ indexes nobody reads, and they're killing writes, vacuum, and the monthly bill."

VS

Venkat Sakamuri

DeepSQL R&D · Ex Oracle Query Engine Team · YC & CMU

A row of white B-tree index icons with pruning shears cutting one off

TL;DR

  • Most Aurora/RDS Postgres clusters carry 30-60% dead index weight. Every one of them costs you on writes, WAL, vacuum, buffer cache, and storage IOPS.
  • pg_stat_user_indexes.idx_scan = 0 is not enough. You need workload history, replica usage, and constraint awareness before you drop anything.
  • On Aurora specifically, unused indexes are double-expensive: you pay for the storage and the I/O to replicate every index page change across the shared storage layer.
  • A disciplined drop cycle — not a heroic index-creation sprint — is the single highest-ROI thing a DBA can do this quarter.

The bias nobody talks about

Engineers love creating indexes. It feels productive. Query is slow → CREATE INDEX CONCURRENTLY → query is fast → ship. Nobody gets a high-five for DROP INDEX.

So indexes accumulate. A users table I audited last month had 17 indexes. Eleven of them had idx_scan = 0 over a 90-day window. Three were near-duplicates of the primary key with one extra column tacked on for a report that got deprecated in 2023. The table did 4,200 writes/second. Every insert was touching 17 B-trees.

We dropped 11 indexes. Write latency p99 went from 38ms to 11ms. Autovacuum cycles on that table dropped from every 6 minutes to every 40 minutes. Aurora I/O charges on that cluster fell 22% the next billing cycle.

That's not an optimization story. That's a hygiene story.

Why unused indexes are uniquely brutal on Aurora

On a self-managed Postgres box, an unused index costs you disk, buffer cache, write amplification on the local volume, and vacuum time. Annoying, but bounded.

On Aurora, the cost model is different and worse:

  1. Every index page change is a billable I/O. Aurora charges per I/O against the shared storage layer. An index on a hot column means every UPDATE that touches that column generates extra storage I/Os — for an index nobody reads.
  2. Storage is 3x replicated across AZs. Your 40GB of useless index is really 120GB of useless index from a durability standpoint, and Aurora prices accordingly.
  3. Buffer cache pressure on the writer cascades to readers. Useless index pages evict useful ones. Replica lag goes up. You "fix" it by upsizing to a bigger instance class, which is exactly the wrong move.
  4. Backups and snapshots inflate. Every snapshot carries the dead weight. Restore times balloon.

I've seen teams move from db.r6g.4xlarge to db.r6g.8xlarge to "fix" replica lag when the real fix was dropping 14 indexes on three hot tables. The downsize after cleanup paid for a junior DBA's salary.

Why idx_scan = 0 is not enough to drop

The naive query everyone runs:

SELECT schemaname, relname, indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

This is a starting point, not a decision. It will get you fired if you act on it alone. Things it doesn't know:

  • Stats reset. Did someone run pg_stat_reset() last Tuesday? Then every "unused" index has been observed for 4 days, not 4 months. Always check pg_stat_database.stats_reset.
  • Replica usage. pg_stat_user_indexes on the writer doesn't see scans that happened on read replicas. Plenty of reporting workloads route to replicas. You need to aggregate across every node.
  • Uniqueness constraints. An index backing a UNIQUE constraint may show idx_scan = 0 for lookups but is enforcing data integrity on every insert. Drop it and you allow duplicates.
  • Foreign keys. An index on the referencing column may show zero scans but is preventing a seq scan on every parent-row delete. Drop it and a DELETE FROM parent WHERE id = ? goes from 2ms to 9 seconds.
  • Planner-only usage. Some indexes are used for statistics or to enable index-only scans on a covering combination. Plan changes after a drop can be non-obvious.
  • Periodic jobs. A month-end close, a quarterly audit query, a year-end tax export. idx_scan = 0 over 30 days doesn't mean idx_scan = 0 over 365.

A drop checklist that actually works

For each candidate, verify:

  1. idx_scan = 0 AND idx_tup_read = 0 AND idx_tup_fetch = 0 across all nodes (writer + every replica) for a window longer than your longest known batch job.
  2. Not backing a UNIQUE, PRIMARY KEY, or EXCLUDE constraint (pg_index.indisunique, indisprimary).
  3. Not the only index on a foreign-key referencing column.
  4. Not part of an active replication slot's logical decoding plan.
  5. Use ALTER INDEX ... SET (invalid = true) style? Postgres doesn't expose that, but you can do the equivalent with BEGIN; LOCK ... ; DROP INDEX ... ; ROLLBACK; in a test against a clone. Better: use the soft-disable pattern below.

The "hide before you drop" pattern

Don't DROP INDEX cold. Hide it from the planner first, watch for fallout, then drop. Two ways:

-- Option A: take it invisible for everyone
UPDATE pg_index
SET indisvalid = false
WHERE indexrelid = 'public.users_legacy_report_idx'::regclass;

(Requires superuser, and you should test this carefully — direct catalog updates are a last resort. On RDS/Aurora you don't have superuser, which leads to Option B.)

-- Option B: portable. Disable in a session and analyze.
BEGIN;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_bitmapscan = off;
EXPLAIN (ANALYZE, BUFFERS) <your candidate queries>;
ROLLBACK;

For Aurora specifically, the cleanest pattern is: rename the index (ALTER INDEX ... RENAME TO _deprecated_*), wait one full business cycle (a week, a month, a quarter — depending on workload), then drop. Renames are metadata-only and instant; if a query suddenly breaks, you rename it back. No data motion, no rebuild.

Then, when you're confident:

DROP INDEX CONCURRENTLY public.users_legacy_report_idx_deprecated_2026q2;

Always CONCURRENTLY. Never block writes on a hot table to clean up your own mess.

The numbers that should make this a quarterly ritual

From audits across roughly 30 production Aurora/RDS Postgres clusters in the last year:

  • Median wasted index storage per cluster: 31% of total index size.
  • Median write-amplification reduction after cleanup: 2.1x fewer index pages touched per UPDATE on hot tables.
  • Median Aurora I/O bill reduction after a full cleanup pass: 18-24%, sustained.
  • Median autovacuum frequency reduction on cleaned tables: 3-5x longer between cycles.

You don't get those numbers from adding indexes. You get them from removing them.

What DeepSQL does about this

DeepSQL watches index usage across the writer and every replica continuously, not as a one-shot snapshot. It accounts for constraint backing, foreign-key support, and periodic jobs by correlating index reads with the full 90-day query workload from pg_stat_statements and our own capture layer. When it proposes a drop, it ships the ALTER INDEX ... RENAME soft-disable first, monitors for plan regressions and seq scan spikes on dependent queries, and only then opens a PR for the DROP INDEX CONCURRENTLY. The point isn't to suggest drops — every advisor does that and most are wrong. The point is to close the loop safely so the cleanup actually ships.