My PostgreSQL Queries Were Destroying Our App at Scale — Here’s How I Fixed Them

The Moment You Know You Have a Query Problem

Everything works great until it doesn’t. I’ve watched dashboards look perfectly healthy — p50 latency under 50ms, error rate flat — right up until a table crosses 500k rows and suddenly specific endpoints start timing out. The failure mode is never gradual. One day it’s fine, the next day your on-call Slack is a disaster. The rows accumulate slowly; the performance cliff arrives fast.

The symptom that tips you off isn’t average latency — it’s p99. Your p50 might stay at 60ms while p99 quietly climbs to 4 seconds on an endpoint that does nothing exotic. Something like a user activity feed, an order history list, a notification count. Endpoints that look like one-liners in the controller. The thing that caught me off guard the first time was that these endpoints weren’t slow in any way I could predict by reading the code. A SELECT * FROM orders WHERE user_id = $1 ORDER BY created_at DESC LIMIT 20 has no business taking 3 seconds. But without the right index, across 800k rows, Postgres is doing a sequential scan every single time, and ORDER BY on an unindexed column means it’s sorting a massive result set before handing you 20 rows.

Pull the Actual Culprit Out of pg_stat_statements

This extension saved me hours of guessing. Enable it in postgresql.conf by adding shared_preload_libraries = 'pg_stat_statements', then restart Postgres and run CREATE EXTENSION pg_stat_statements;. After that, one query tells you everything:

SELECT
  query,
  calls,
  total_exec_time / calls AS avg_ms,
  total_exec_time,
  rows / calls AS avg_rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

I ran this on a production database that had been running for three weeks without a restart (so the stats had accumulated real traffic) and found a single query eating 43% of total database execution time. It was called tens of thousands of times per hour because it sat inside a loop in our notification service — something a code review had missed because the query itself looked innocent. The total_exec_time column is the one you want to sort by, not avg_ms. A query called 200k times at 2ms each destroys you faster than a query called 10 times at 500ms each.

Why Staging Lies to You Every Single Time

Staging environments almost always have a fraction of the data production has. Maybe 10k rows in your orders table versus 900k in production. At 10k rows, Postgres’s query planner makes a completely rational decision: a sequential scan is fast enough that it’s not worth using an index. The planner uses table statistics (collected by ANALYZE, which autovacuum runs for you) to decide whether an index scan or a seq scan is cheaper, and at low row counts, seq scans often win. So your query runs in 5ms on staging, you ship it, it runs fine for three months, and then one day it falls off a cliff. You never see it coming because the planner’s decision changes dynamically as your table grows — there’s no test that catches this unless you’re testing with production-scale data or explicitly forcing index usage.

The fast check is to run EXPLAIN (ANALYZE, BUFFERS) on suspicious queries directly against a production replica. Don’t trust the staging plan. Look for Seq Scan on large tables, check the Rows Removed by Filter line (if it’s orders of magnitude larger than the rows actually returned, you’re scanning way too much), and pay attention to Buffers: shared hit/read — high read counts mean you’re hitting disk instead of cache.

One concrete habit I’ve built: before any feature that touches a table I know is growing fast, I load a copy of the production database into a local Postgres instance and run EXPLAIN ANALYZE there. AWS RDS and Supabase both let you export snapshots; it’s an afternoon of setup that has saved me from multiple production incidents. If you can’t do that, at minimum run SELECT count(*) FROM your_table in production before assuming your staging query plan is representative of anything.

Step 1: Enable the Tools You Actually Need

Start with pg_stat_statements — Everything Else Is Guesswork

I’ve seen teams spend days optimizing the wrong queries because they were going off application logs and gut feelings instead of actual database-level data. pg_stat_statements fixes this. It tracks every query that runs through your Postgres instance — total calls, mean execution time, rows returned, cache hit rates — and it aggregates them by query shape, not by individual execution. That last part matters. Instead of seeing 50,000 individual log lines, you see one row that says “this parameterized SELECT ran 50,000 times and averaged 340ms.” That’s the signal you need.

Here’s the exact config change. Open your postgresql.conf — usually at /etc/postgresql/15/main/postgresql.conf on Debian/Ubuntu, or /var/lib/pgsql/15/data/postgresql.conf on RHEL — and add or update these lines:

# Load pg_stat_statements at startup (requires restart)
shared_preload_libraries = 'pg_stat_statements'

# Track all queries, not just top-level ones
# 'all' catches queries fired inside functions and stored procedures
pg_stat_statements.track = all

# How many distinct query fingerprints to track (default 5000)
# Bump this up on busy apps or you'll start losing data
pg_stat_statements.max = 10000

# Save stats across restarts — leave this on
pg_stat_statements.save = on

After saving, you need a full Postgres restart for shared_preload_libraries to take effect. There’s no getting around this — it’s not a reload. The difference between reload and restart trips people up constantly. A SIGHUP or SELECT pg_reload_conf() handles most config changes, but not this one. On a production box, I use systemctl restart postgresql rather than pg_ctl directly because systemctl respects your service dependencies and handles the socket file cleanup. If you’re on a managed database like RDS or Cloud SQL, shared_preload_libraries is already set for you — just run CREATE EXTENSION pg_stat_statements; in each database you want to track. Then verify it loaded:

-- Run this in psql
SELECT * FROM pg_extension WHERE extname = 'pg_stat_statements';

-- If you get a row back, you're good. Then check it actually has data:
SELECT count(*) FROM pg_stat_statements;

Slow Query Logging: 100ms, Not 1000ms

The default slow query threshold — when there even is one configured — is almost always 1000ms. That’s too late. By the time a query is taking a full second, your users have already had a bad experience and your connection pool is probably feeling it. I set log_min_duration_statement = 100 and I’d argue you should too, at least initially. Yes, you’ll catch more queries. That’s the point. You’re mapping your performance space, not just catching the obvious offenders. Once you’ve cleaned up the low-hanging fruit, you can raise it to 250ms or even 500ms if log volume becomes a problem.

# Log queries slower than 100ms
# Set to 0 to log everything (don't do this in prod long-term)
# Set to -1 to disable completely
log_min_duration_statement = 100

# Add this context to every slow query log line
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '

# Log parameters so you can actually reproduce the slow query
log_duration = off          # redundant if log_min_duration_statement is set
log_statement = 'none'      # don't log all statements, only slow ones

# These two together show you what Postgres was doing when it was slow
log_lock_waits = on
log_temp_files = 0          # log any temp file creation (often signals a bad sort/hash join)

The thing that caught me off guard the first time I set this up: log_statement = 'all' and log_min_duration_statement interact in a way that can double-log queries. Set log_statement = 'none' unless you have a specific reason to log DDL statements separately. Also, log_temp_files = 0 logs every time Postgres spills a sort or hash operation to disk. This is extremely useful early on — a query creating temp files is almost always missing a proper index or has a bad work_mem setting. Set it to 0 now, fix what it reveals, then raise the threshold to something like 1024 (kilobytes) once things are clean.

The Full Config Block, Explained

# -----------------------------------------------
# pg_stat_statements setup
# -----------------------------------------------
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
pg_stat_statements.save = on

# -----------------------------------------------
# Slow query logging
# -----------------------------------------------
log_min_duration_statement = 100    # ms; queries slower than this get logged
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
log_lock_waits = on                 # log when a query waits on a lock
log_temp_files = 0                  # log every temp file (0 = all sizes)
log_checkpoints = on                # useful for I/O troubleshooting later
log_connections = off               # leave off unless debugging conn pool issues
log_disconnections = off            # same as above

# -----------------------------------------------
# Autovacuum logging (don't skip this)
# -----------------------------------------------
log_autovacuum_min_duration = 250   # log autovacuum runs over 250ms

I include log_checkpoints = on in every setup. Checkpoint warnings in your logs are an early sign that your checkpoint_completion_target or max_wal_size needs tuning — problems that show up as mystery write spikes weeks later. And log_autovacuum_min_duration = 250 is non-negotiable on any table that gets heavy writes or deletes. Autovacuum doing 10-second runs on your orders table while you’re trying to debug a slow SELECT is exactly the kind of interaction that makes query optimization feel like whack-a-mole. Log it so you can see it.

Step 2: Read EXPLAIN ANALYZE Without Guessing

Stop running plain EXPLAIN. It gives you estimates, not reality. The command I run every single time is:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT * FROM orders
WHERE user_id = 42 AND status = 'pending'
ORDER BY created_at DESC
LIMIT 20;

The ANALYZE flag actually executes the query and records real timings. BUFFERS shows you whether Postgres hit shared memory cache or went to disk — which is often the difference between a 2ms query and a 200ms one. Without both of those flags, you’re reading fortune cookies, not diagnostics. One warning: ANALYZE runs the query for real, so don’t wrap a destructive statement in it without a transaction you plan to roll back.
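
If you do need to profile a statement that writes, wrap it so the timings are real but the changes never stick. A minimal sketch, using a hypothetical archival update against the same orders table:

BEGIN;

EXPLAIN (ANALYZE, BUFFERS)
UPDATE orders SET status = 'archived'
WHERE created_at < '2023-01-01';

ROLLBACK;  -- the plan and timings were real, the row changes are discarded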

Seq Scan Is Not Always Your Enemy

The instinct when you see Seq Scan is panic. Resist it. If your query is pulling 70% of the rows in a table, Postgres is correct to do a sequential scan — jumping around an index for most of the table is slower than reading it straight through. The place Seq Scan hurts you is on large tables where you’re filtering down to a small subset. If Postgres is sequentially scanning 2 million rows to return 40, that’s where an index pays off. The node I’m actually watching in the output is this:

Seq Scan on orders  (cost=0.00..89432.00 rows=40 width=128)
                    (actual time=0.043..4823.112 rows=40 loops=1)

Four seconds to return 40 rows. That’s your smoking gun.

Cost vs Actual Time — Only One Number Matters

The cost= value is Postgres’s guess before execution. It’s in arbitrary units, not milliseconds. I used to stare at cost numbers trying to make sense of them. Don’t. The number you care about is actual time=, which is in milliseconds and split into startup time..total time. The total time on your outermost node is the wall clock cost of your query. The cost estimates are just Postgres showing you its reasoning, useful for diagnosis but not the metric you’re optimizing against.

Row Estimation Errors Break the Planner’s Logic

The thing that caught me off guard the first few times was realizing that bad row estimates cascade. Look at the gap between rows= in the plan and rows= in actual. Something like this is trouble:

Hash Join  (cost=120.45..980.12 rows=12 width=64)
           (actual time=45.231..8432.987 rows=84203 loops=1)

Postgres planned for 12 rows, got 84,203. It chose a Hash Join because it seemed cheap for a small result. With 84,000 rows, that decision is catastrophic. This usually happens when your table statistics are stale — run ANALYZE your_table; — or when you have highly correlated columns that the planner doesn’t know to account for. If statistics are fresh and the misestimate persists, look at extended statistics: CREATE STATISTICS lets you teach Postgres about column correlations it can’t detect on its own.
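
A minimal sketch of that, assuming hypothetical city and zip_code columns on orders that move together:

-- Tell the planner these columns are functionally dependent
CREATE STATISTICS orders_city_zip (dependencies)
  ON city, zip_code
  FROM orders;

-- Extended statistics only take effect after the next ANALYZE
ANALYZE orders;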

Use explain.depesz.com Instead of Squinting

EXPLAIN output on a query with four joins and a subquery can run 80 lines. Reading that raw in a terminal is miserable. Paste the entire output into explain.depesz.com and it color-codes every node by actual time, highlights the slowest ones in red, and lets you click through the tree instead of counting indentation levels. I copy EXPLAIN output into it automatically whenever a query breaks 50ms in production. It’s free, doesn’t store your data permanently by default, and the diff feature is underrated — paste before and after your index change to see exactly what moved.

The Index Mistakes I See Most Often

The worst index I ever inherited was on a status column that had four possible values: active, inactive, pending, deleted. Someone had carefully created it, it showed up in \d tablename, and Postgres almost never touched it. Low cardinality columns — ones where a handful of values cover millions of rows — make terrible index candidates. If 60% of your rows are active, Postgres looks at that index and correctly decides a sequential scan is cheaper. Index selectivity is what matters. A column with thousands of distinct values across millions of rows is a good candidate. A boolean or a status enum usually isn’t.
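
If you’re not sure whether a column is selective enough to deserve an index, pg_stats will tell you before you build anything. n_distinct is the planner’s estimate of distinct values (negative numbers mean a fraction of the row count), and most_common_freqs shows how lopsided the distribution is. Swap in your own table and columns:

SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'orders'          -- your table here
  AND attname IN ('status', 'user_id');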

Composite Index Column Order Is Not Arbitrary

This one burned me on a high-traffic notifications table. I had a composite index on (created_at, user_id) because I was thinking about time-series queries. But 90% of our queries were WHERE user_id = $1 ORDER BY created_at DESC. That index was nearly useless for those queries. Postgres can use a composite index efficiently when you filter on a leading column — it can’t skip to the middle. Flipping it to (user_id, created_at) dropped query times from hundreds of milliseconds to single digits on that table. The rule: put the equality-filter column first, the range or sort column second.

-- This index supports: WHERE user_id = 5 ORDER BY created_at DESC
CREATE INDEX idx_notifications_user_time
  ON notifications (user_id, created_at DESC);

-- This index does NOT efficiently support the same query
-- (created_at, user_id) -- wrong order for this workload

Partial Indexes Are Underused and Absolutely Worth It

I switched a soft-delete table from a full index to a partial index and the index shrank by 80%. Here’s the thing: if you have a deleted_at column and 95% of your application queries only touch non-deleted rows, why are you maintaining index entries for all the deleted ones? A partial index lets you scope exactly what gets indexed:

CREATE INDEX idx_users_email_active
  ON users (email)
  WHERE deleted_at IS NULL;

Smaller index means faster reads, less RAM pressure, and faster writes. The trade-off is narrow: if you ever need to query deleted rows by email, this index won’t help. But for the common path — active users logging in — this pays for itself immediately. I use this pattern on every table with soft deletes now, and I add it to any table where a significant chunk of rows are in a state your app rarely queries.

Implicit Casts Silently Kill Index Usage

This one took me an embarrassing amount of time to debug. The symptom: you have an index on user_id (integer), your query runs slow, EXPLAIN shows a seq scan, and you’re staring at the screen confused because the index clearly exists. Then you look at the query parameter — it’s being passed as a string. Postgres won’t cast it silently for index lookups in all contexts. The index is on integer, you’re querying with varchar, and the planner says no thanks.

-- This can break index usage depending on your driver and context:
SELECT * FROM orders WHERE user_id = '12345';

-- Explicit cast fixes it:
SELECT * FROM orders WHERE user_id = 12345;
-- or
SELECT * FROM orders WHERE user_id = '12345'::integer;

The same thing happens with functions wrapping indexed columns — WHERE lower(email) = $1 won’t use an index on email, but will use one on lower(email). Always run EXPLAIN (ANALYZE, BUFFERS) and look for “Filter” vs “Index Cond” in the output. If your condition shows up under “Filter”, the index is being ignored.
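
If the case-insensitive lookup is the access path you actually need, index the expression itself. A quick sketch against the same users.email column from the partial-index example above:

-- Matches WHERE lower(email) = $1 without touching the plain email index
CREATE INDEX CONCURRENTLY idx_users_email_lower
  ON users (lower(email));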

Audit What You Actually Have vs What’s Being Used

Most codebases I’ve worked on have indexes that nobody remembers creating. Run this to see everything you’ve got:

SELECT
  schemaname,
  tablename,
  indexname,
  indexdef
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexname;

Then cross-reference with actual usage from pg_stat_user_indexes. This view tracks how many times each index has been scanned since the last stats reset:

SELECT
  schemaname,
  relname AS table_name,
  indexrelname AS index_name,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;

Any index with idx_scan = 0 after weeks of production traffic is a candidate for dropping. Every index you keep is a tax on writes — inserts, updates, and deletes all have to maintain every index on that table. I’ve dropped a dozen indexes on busy tables and watched write throughput visibly improve on those tables. Don’t hoard indexes out of fear. Trust the stats, and if you’re nervous, save the index definition first so you can recreate it instantly, then drop it and watch your dashboards for a week.
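
Saving the definition and dropping without blocking writes is two statements; idx_orders_status here is a stand-in for whatever your audit flagged:

-- Copy this output into the ticket / migration notes
SELECT pg_get_indexdef('idx_orders_status'::regclass);

-- Then drop without taking a lock that blocks writes
DROP INDEX CONCURRENTLY idx_orders_status;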

N+1 Queries: The Silent Killer in ORM-Heavy Apps

The first time I ran pg_stat_statements on a production Rails app and sorted by calls descending, I saw the same SELECT * FROM users WHERE id = $1 query logged 47,000 times in a single hour. The app had maybe 200 users online. That’s your N+1 in the wild — not one expensive query, just one cheap query executed thousands of times inside a loop nobody thought to audit.

Enable pg_stat_statements if you haven’t already — add shared_preload_libraries = 'pg_stat_statements' to your postgresql.conf, restart, then run CREATE EXTENSION pg_stat_statements; in your database. After that, this query becomes your best friend:

SELECT
  query,
  calls,
  total_exec_time / calls AS avg_ms,
  rows / calls AS avg_rows
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 20;

High calls with low avg_ms is the N+1 fingerprint. The individual query looks fast — 0.3ms — so it never shows up in your slow query log. But multiply that by 10,000 calls per minute and you’ve quietly burned 3 full seconds of database time every 60 seconds. Django users should also check the Django Debug Toolbar’s SQL panel in development — it’ll literally count duplicate queries per request. In Node with Sequelize or TypeORM, the equivalent is turning on logging: console.log temporarily and watching your terminal scroll for 5 seconds on a single endpoint hit.

For the slow query log side, this is the pgBadger command I actually use on a standard Debian/Ubuntu Postgres 14 install:

pgbadger /var/log/postgresql/postgresql-14-main.log -o report.html --format html

Make sure your postgresql.conf has log_min_duration_statement = 100 (logs anything over 100ms) and log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h ' — pgBadger needs that structured prefix to parse correctly. Without the right prefix format, you’ll get an output file that’s mostly empty and no obvious error message about why. That one caught me off guard the first time.

JOIN vs Subquery vs Application Batching — Pick the Right Tool

The fix for N+1 isn’t always “just add an eager load.” Here’s how I actually decide:

  • JOIN — best when you need columns from both tables in the result set and you’re filtering on the association. The tradeoff: if the join multiplies rows (one-to-many), you’ll get duplicates and either need DISTINCT or careful aggregation. I’ve seen JOINs make things measurably worse when the parent table had 500k rows and every join multiplied them 10x.
  • Subquery / IN clause — my go-to for “give me all orders where the customer is in this set.” Write it as WHERE customer_id IN (SELECT id FROM customers WHERE region = 'EU') and let Postgres optimize it as a semi-join. Cleaner than a JOIN when you don’t need customer columns in the output. (See the sketch after this list.)
  • Application-level batching — sometimes the right answer is pulling two separate queries and merging in code. Rails’ includes and Django’s prefetch_related do exactly this: one query for parents, one IN (ids) query for children, merge in memory. This is faster than a JOIN when the dataset is wide and you’d otherwise be shipping megabytes of duplicated parent columns across the wire.
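
Here’s the second option written out against the customers/orders pair from the bullet above. Postgres typically plans the IN as a hash semi-join, so each order comes back exactly once and no DISTINCT is needed:

-- Orders for EU customers, no customer columns shipped back, no row multiplication
SELECT o.id, o.created_at
FROM orders o
WHERE o.customer_id IN (
  SELECT c.id FROM customers c WHERE c.region = 'EU'
)
ORDER BY o.created_at DESC
LIMIT 50;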

Eager Loading Can Hurt You

I’ve watched developers “fix” N+1 with eager loading and then wonder why their API response time went from 200ms to 1.4 seconds. The culprit: User.includes(:orders).all on an admin dashboard that loaded every user and pulled in every order ever placed. 8,000 users, average 40 orders each — that’s 320,000 rows being hydrated into ORM objects for a page that displays a user count and a recent-activity chart. Eager loading is only useful when you actually consume the associated data. If you’re loading associations to check a boolean condition on maybe 5% of records, a conditional subquery or a WHERE EXISTS is almost always faster and uses a fraction of the memory.
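
For example, if the dashboard only needs to know which users have at least one completed order, an EXISTS check answers that without hydrating a single order row. A sketch against the same users and orders tables:

SELECT u.id, u.email,
       EXISTS (
         SELECT 1
         FROM orders o
         WHERE o.user_id = u.id AND o.status = 'completed'
       ) AS has_ordered
FROM users u
ORDER BY u.id
LIMIT 100;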

OFFSET Is Not Pagination

This one I explain to at least one junior dev per year: SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000 does not skip 10,000 rows. Postgres reads and discards 10,000 rows before handing you your 20. At page 500 of your results, you’re doing a near-full table scan on every single page load. The fix is keyset pagination — also called cursor-based pagination:

-- First page
SELECT id, title, created_at FROM posts
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Next page (pass last seen created_at and id as cursor)
SELECT id, title, created_at FROM posts
WHERE (created_at, id) < ('2024-03-15 10:22:00', 8821)
ORDER BY created_at DESC, id DESC
LIMIT 20;

This query hits an index on (created_at DESC, id DESC) regardless of which page you're on. Page 1 and page 10,000 have the same execution plan and nearly identical latency. The tradeoff is real though: you can't jump to "page 47" with keyset pagination, only forward/back. For admin tools with arbitrary page jumping, OFFSET under a few thousand rows is fine. For infinite scroll, feeds, or any API where clients page sequentially, OFFSET will eventually crater your database and you'll be debugging it at 2am wondering why your read replicas are maxed out.

VACUUM, ANALYZE, and Table Bloat — The Stuff Nobody Talks About

The Autovacuum Lie (And How to Stop Falling For It)

Autovacuum is not a "set it and forget it" solution. It's a background worker tuned for average tables, and if you have high-write tables — think order events, clickstream data, audit logs — you will absolutely accumulate table bloat without realizing it until your queries inexplicably slow down. I've watched a seemingly fine Postgres instance where one table had a dead tuple ratio above 40%. The autovacuum logs showed it was running. It just wasn't keeping up, because by default it triggers when 20% of rows are dead, which on a 50 million row table means 10 million dead tuples have to pile up before anything kicks off. That's a lot of garbage to carry around.

You can (and should) override autovacuum settings per table using storage parameters. Here's the pattern I use for high-churn tables:

ALTER TABLE order_events SET (
  autovacuum_vacuum_scale_factor = 0.01,
  autovacuum_vacuum_threshold = 1000,
  autovacuum_analyze_scale_factor = 0.005,
  autovacuum_vacuum_cost_delay = 2
);

That scale_factor = 0.01 means vacuum triggers when 1% of rows are dead — on a 50M row table that's 500K rows instead of 10M. The cost_delay = 2 (milliseconds) keeps autovacuum's self-throttling pause short; on Postgres versions before 12 the default was 20ms, so this alone makes it noticeably more aggressive. The trade-off is real: more aggressive autovacuum does consume IO. On a write-heavy table during peak hours, I've seen it matter. But dead tuple bloat costs you more in slow sequential scans. Pick your poison, and in my experience the bloat is worse.

Actually Measuring Bloat Before You Guess

Before tuning anything, verify you actually have a bloat problem. The pgstattuple extension gives you the real numbers. Install it once per database and run this:

CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT
  t.table_name,
  s.dead_tuple_percent,
  s.dead_tuple_len / 1024 / 1024 AS dead_mb,
  s.tuple_count AS live_rows
FROM (
  SELECT tablename AS table_name
  FROM pg_tables
  WHERE schemaname = 'public'
) t
CROSS JOIN LATERAL pgstattuple(t.table_name) AS s  -- one scan per table, not one per column
ORDER BY s.dead_tuple_percent DESC;

Fair warning: pgstattuple does a full table scan. Don't run this on a production OLTP database during peak load without thinking about it first. Dead tuple percent above 10-15% is where I start paying attention. Above 30%, things are actively hurting your query plans and you need to act now, not after the next autovacuum cycle. The dead_mb column is useful too — sometimes the percent looks scary on a small table but the absolute waste is negligible. Context matters.

What VACUUM ANALYZE Actually Does to Query Plans

Most developers think of VACUUM as "cleanup" and ANALYZE as something separate. They're right, but they miss the second-order effect. VACUUM reclaims dead tuple space and updates the visibility map. ANALYZE updates the statistics that the query planner uses to estimate row counts. If your stats are stale — which happens fast on high-write tables — the planner will make terrible decisions: choosing sequential scans over index scans, picking the wrong join order, wildly underestimating result sets. I've seen a query go from 8 seconds to 300ms purely from running VACUUM ANALYZE because the planner had been working with statistics from before a bulk data migration.

Run it manually when you've just bulk-loaded data, run a large DELETE, or done a major schema change:

-- Targeted, with verbose output so you can see what happened
VACUUM (ANALYZE, VERBOSE) order_events;

-- Check what the planner now thinks about row estimates
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM order_events WHERE status = 'pending';

The BUFFERS flag in EXPLAIN is something I wish someone had told me earlier — it shows cache hits vs disk reads, which is often the real culprit when a query with a good plan is still slow.

Transaction ID Wraparound: The Emergency You Don't See Coming

This one is genuinely scary and rarely explained well. Postgres uses 32-bit transaction IDs, so there are only about 2 billion usable values. When you get close to wrapping around, Postgres goes into a protective mode and starts refusing writes. Not slowing down — refusing. I've never had this happen to me personally, but I've seen the panic posts when it does, and they're bad. The thing that caught me off guard when I first learned about this is that it's not just about how many transactions you run — it's about whether old tuples have been frozen yet. VACUUM is what freezes old transaction IDs, and that freezing is what effectively resets the clock.

Check your current distance from the wraparound limit across all databases:

SELECT
  datname,
  age(datfrozenxid) AS xid_age,
  2147483648 - age(datfrozenxid) AS transactions_remaining
FROM pg_database
ORDER BY xid_age DESC;

If xid_age is above 1.5 billion, you need to think about this immediately. Postgres will start issuing warnings in the logs around 1.6 billion and will shut down writes at 2 billion minus a small buffer. If you're running managed Postgres (RDS, Cloud SQL, Supabase), they'll usually alert you before crisis point, but I've seen teams ignore those alerts. The fix is to let aggressive VACUUM (or autovacuum with autovacuum_freeze_max_age tuned down) run to completion. On a very large, heavily bloated table this can take hours. Don't let it get there.
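
The database-level number tells you there's a problem; finding which tables are dragging the age up is the per-table version of the same check. The ones at the top are where a manual VACUUM (or a lower autovacuum_freeze_max_age) buys you the most:

SELECT c.relname,
       age(c.relfrozenxid) AS xid_age,
       pg_size_pretty(pg_table_size(c.oid)) AS table_size
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;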

The Scale Factor Default That Will Bite You

The global default autovacuum_vacuum_scale_factor = 0.2 made sense when the Postgres team set it — most tables were smaller. On a table with 50 million rows, 20% is 10 million dead tuples. At that point you've got bloat that's expanding your table size by potentially gigabytes, slowing down sequential scans, and eating buffer cache with garbage pages. I switched every table above roughly 5 million rows to a custom scale factor as a matter of policy now:

-- For very large tables (50M+ rows): 0.5% threshold
ALTER TABLE large_events_table SET (autovacuum_vacuum_scale_factor = 0.005);

-- For medium tables (5M-50M rows): 2% threshold  
ALTER TABLE medium_events_table SET (autovacuum_vacuum_scale_factor = 0.02);

-- Verify what's set per table
SELECT relname, reloptions
FROM pg_class
WHERE reloptions IS NOT NULL
AND relkind = 'r'
ORDER BY relname;

You can also tune the global default in postgresql.conf if most of your tables are large, but I prefer per-table overrides because they're explicit and survive config reloads without surprising you. The pg_class.reloptions query is your audit trail — run it periodically to confirm your overrides are actually in place. Configuration drift is real, especially after a database restore or migration to a new host.

Connection Pooling: The Fix That Isn't in Your Query

Your Queries Aren't Slow — Your Connections Are

I spent two days tuning indexes on a slow endpoint before realizing the query itself was fine. The problem was that Postgres was spending 80% of the request lifecycle just waiting for a connection to become available. Run this and you'll see it instantly:

SELECT pid, state, wait_event_type, wait_event, query_start, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;

If you see far more active sessions than you have CPU cores, or a pile of rows stuck in idle in transaction, your problem isn't the SQL: Postgres is drowning in connections. Every new request from Rails, Django, or Node opens a fresh Postgres connection. At scale, that means hundreds of connections sitting around eating ~5–10MB of RAM each, and Postgres spending more time on connection overhead than on actual query work. The fix isn't a better index. The fix is a connection pooler sitting between your app and Postgres.

PgBouncer in Transaction Mode: The Config That Actually Works

I've run PgBouncer in front of Postgres on three production apps now. Transaction mode is what you want for web apps — each database transaction gets a connection from the pool, then returns it immediately. Your app thinks it has a persistent connection; Postgres sees a small, clean pool. Here's the pgbouncer.ini that I keep going back to:

[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp_production

[pgbouncer]
listen_port = 6432
listen_addr = 127.0.0.1
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
server_idle_timeout = 600
client_idle_timeout = 0
log_connections = 0
log_disconnections = 0

default_pool_size = 25 might look tiny if you're running 400 app processes, but that's the point. Postgres handles 25 active queries well. It handles 400 simultaneous connections badly. max_client_conn = 1000 is what your app sees — PgBouncer handles the fan-in. Set server_idle_timeout = 600 (10 minutes) so PgBouncer cleans up connections to Postgres that have been sitting idle, not so aggressively that you constantly pay reconnection cost. I've seen people set it to 60 seconds and then wonder why they see spikes every minute in their metrics.
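
One thing the config above assumes is that the auth_file actually exists. It's a flat file of quoted username/password pairs; with auth_type = md5, the password field is the literal string md5 followed by the MD5 hash of password concatenated with username (plain-text passwords also work while testing locally). A minimal sketch:

# /etc/pgbouncer/userlist.txt
# format: "username" "password-or-md5-hash"
"myapp_user" "md5<md5 of password+username>"

Then point your application at port 6432 instead of 5432 and PgBouncer handles the fan-in from there.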

PgBouncer vs pgpool-II: I Tried Both, Stayed with One

pgpool-II has features PgBouncer doesn't — query caching, load balancing across replicas, replication. On paper it looks like the better choice. In practice I found the config model confusing, the debugging experience painful (the logs tell you something failed, not why), and the prepared statement handling even more complex than PgBouncer's. I switched a staging environment to pgpool-II for three weeks and spent more time fighting it than improving anything. PgBouncer does one thing: pool connections efficiently. Its config is readable, its behavior is predictable, and when something goes wrong, the logs actually help. If you need read replica load balancing, handle that at the application layer with something like HikariCP or PgBouncer targeting multiple backends — don't introduce pgpool-II complexity just for that.

The Prepared Statement Problem (This Will Bite You)

The one real gotcha with PgBouncer in transaction mode: prepared statements break. In session mode, a prepared statement lives for the life of the connection. In transaction mode, the connection gets handed back to the pool after each transaction, so the next client gets a connection that knows nothing about your prepared statement. You'll see errors like ERROR: prepared statement "s1" does not exist and spend an hour confused.

The fix depends on your stack. In Rails with ActiveRecord, set prepared_statements: false in database.yml:

production:
  adapter: postgresql
  prepared_statements: false
  host: 127.0.0.1
  port: 6432

In Node with node-postgres, avoid named prepared statements; the driver only creates one server-side when you pass a name in the query config, so plain parameterized queries are safe. With Django, the ORM doesn't use PostgreSQL-level prepared statements by default, so you're usually fine. The performance hit from disabling prepared statements is real but modest — Postgres has to re-plan each statement, but planning a simple parameterized lookup typically costs a fraction of a millisecond. You lose per-connection plan reuse, not query execution speed. For most web apps the connection pooling wins dominate that loss by a large margin, and recent PgBouncer releases (1.21 and later) can track protocol-level prepared statements in transaction mode via max_prepared_statements if you'd rather keep them.

Caching Query Results Without Building a Mess

The first question I ask before reaching for Redis is: why is this query slow? If the answer is "missing index" or "the query plan is doing a sequential scan on 4 million rows," caching is just lipstick on a pig. I've watched teams build elaborate Redis caching layers over queries that would've been fast with a single CREATE INDEX CONCURRENTLY. The cache buys you time but costs you operational complexity — you've now got two systems to keep synchronized. Fix the query first. Add the index. Only reach for application-layer caching when you've hit the ceiling on what PostgreSQL can do alone: expensive aggregations that touch 10+ million rows, dashboards that need sub-100ms response times, or read-heavy endpoints where the same result gets requested dozens of times per second.

PostgreSQL's own shared_buffers is the thing most developers never touch because it's buried in postgresql.conf and the default is comically conservative — usually 128MB on most Linux installs. That default was set for systems with very little RAM and is basically useless for a production app. The standard recommendation is 25% of your total system RAM, so on a 16GB box you'd set it to 4GB. What that actually changes: Postgres keeps a chunk of its working data in memory rather than going to disk on every read. Disk I/O on reads drops significantly for frequently-accessed pages. The thing that caught me off guard the first time I tuned this was that you also need to set effective_cache_size to roughly 75% of RAM — this doesn't allocate memory, it just tells the query planner how much memory it can assume the OS has available for caching, which nudges it toward index scans over sequential scans.

# /etc/postgresql/15/main/postgresql.conf

shared_buffers = 4GB          # 25% of 16GB RAM
effective_cache_size = 12GB   # 75% of RAM — planner hint only
work_mem = 64MB               # per sort/hash operation, not per connection
maintenance_work_mem = 512MB  # for VACUUM, CREATE INDEX, etc.

Materialized views are genuinely useful for expensive aggregations — the kind where you're joining 3 tables, running SUM and GROUP BY over millions of rows, and the result changes maybe once every 15 minutes. Here's the setup:

CREATE MATERIALIZED VIEW daily_revenue_summary AS
SELECT
  date_trunc('day', created_at) AS day,
  product_category,
  SUM(amount_cents) AS total_revenue_cents,
  COUNT(*) AS order_count
FROM orders
WHERE status = 'completed'
GROUP BY 1, 2;

CREATE UNIQUE INDEX ON daily_revenue_summary(day, product_category);

-- Refresh without locking reads:
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue_summary;

The CONCURRENTLY flag is non-negotiable in production — without it, REFRESH takes an exclusive lock and your queries block until it's done. You need that unique index for concurrent refresh to work. Set up a cron job or a pg_cron task to refresh on a schedule that matches your data freshness tolerance. But here's the downside nobody warns you about: between refreshes, your data is stale, and there's no automatic invalidation. If a batch job corrects historical orders at 2pm and your view refreshes at midnight, your dashboard shows wrong numbers for 10 hours. Materialized views work well when staleness is acceptable and predictable. They fall apart when your data has unpredictable correction patterns or when users expect real-time accuracy.
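
If pg_cron is available (most managed Postgres providers offer it, though you may have to enable it in the instance settings), the refresh schedule can live inside the database. A sketch assuming a 15-minute freshness tolerance for the view above:

-- Requires the pg_cron extension (and shared_preload_libraries = 'pg_cron' on self-hosted setups)
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Refresh every 15 minutes
SELECT cron.schedule(
  '*/15 * * * *',
  $$REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue_summary$$
);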

When PostgreSQL caching isn't enough and your query is already optimal, Redis result caching is the right move — but the cache key strategy will either save you or destroy you. The naive approach is cache_key = "products_page_2", which breaks the moment someone adds a filter. I use a deterministic hash of the full query parameters:

import hashlib, json, redis

def build_cache_key(endpoint: str, params: dict) -> str:
    # Sort keys so {"page": 2, "category": "shoes"} and
    # {"category": "shoes", "page": 2} produce the same key
    canonical = json.dumps(params, sort_keys=True)
    param_hash = hashlib.sha256(canonical.encode()).hexdigest()[:16]
    return f"qcache:{endpoint}:{param_hash}"

r = redis.Redis(host='localhost', port=6379, db=0)

def get_products(page: int, category: str, sort: str):
    params = {"page": page, "category": category, "sort": sort}
    key = build_cache_key("products", params)

    cached = r.get(key)
    if cached:
        return json.loads(cached)

    # ORDER BY can't be a bound parameter: a placeholder would order by a
    # constant string. Whitelist the sortable columns and interpolate safely.
    sort_column = sort if sort in {"created_at", "price", "name"} else "created_at"
    result = db.query(  # db: your app's query helper
        f"SELECT ... FROM products WHERE category = %s ORDER BY {sort_column} LIMIT 20 OFFSET %s",
        [category, (page - 1) * 20]
    )

    r.setex(key, 300, json.dumps(result))  # 5-minute TTL
    return result

The TTL is your main tool for managing stale data here. For paginated product listings I usually set 5 minutes. For user-specific data I go lower — 60 seconds — or skip caching entirely. The invalidation problem bites you when an admin updates a product and that update doesn't show up for 5 minutes. One pattern that helps: on write operations, delete the cache keys related to that resource. This requires your key structure to be predictable enough that you can identify which keys to purge — which is another reason the structured qcache:{endpoint}:{hash} format beats opaque keys. You can use Redis's SCAN with a pattern to find and delete related keys when data changes, though in high-traffic scenarios even that coordination adds overhead you need to account for.
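
A minimal sketch of that purge-on-write idea, reusing the qcache: prefix and the r client from above. scan_iter walks matching keys in batches instead of blocking the server the way KEYS would:

def invalidate_endpoint_cache(endpoint: str) -> int:
    """Delete every cached result for one endpoint, e.g. after a product update."""
    deleted = 0
    # scan_iter issues incremental SCAN commands under the hood
    for key in r.scan_iter(match=f"qcache:{endpoint}:*", count=500):
        r.delete(key)
        deleted += 1
    return deleted

# In the write path, after committing the product change:
# invalidate_endpoint_cache("products")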

Partitioning Large Tables — When You Actually Need It

Most developers reach for partitioning too early. I've seen teams partition a 2-million-row table and wonder why their queries got slower. Partitioning isn't a general-purpose performance trick — it's a surgical tool for a specific problem: when your query planner consistently scans data it doesn't need to touch. The canonical case is a time-series or event log table where almost every real-world query filters by a date range. If your events table has 500 million rows and your dashboard only ever queries the last 30 days, partitioning by created_at means Postgres physically skips 95% of your data before it even starts executing the query. That's the win.

Setting Up Range Partitioning by created_at

Here's a working setup for a monthly-partitioned events table. You declare the parent table, then create child partitions for each range:

-- Parent table (holds no data itself)
CREATE TABLE events (
    id          BIGSERIAL,
    user_id     BIGINT NOT NULL,
    event_type  TEXT NOT NULL,
    payload     JSONB,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);

-- Create partitions per month
CREATE TABLE events_2024_01
    PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE events_2024_02
    PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- You'll want to automate this — pg_partman handles it well
-- Install via pgxn: pgxn install pg_partman

The thing that caught me off guard the first time: in Postgres versions before 11, indexes, primary keys, and foreign keys on the parent table do not automatically propagate to new partitions. You have to create them explicitly on each child table. Postgres 11+ fixed this — indexes created on the parent now apply automatically to new partitions — but if you're on an older managed database (some cloud providers lag behind), you will silently end up with unindexed partitions and queries that do full sequential scans. Always verify with \d events_2024_01 and check that the indexes are actually there.

Confirming Partition Pruning Is Actually Happening

Don't assume Postgres is skipping partitions — prove it. Run EXPLAIN (ANALYZE, BUFFERS) with a date-filtered query and look for which child tables appear in the plan:

EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM events
WHERE created_at BETWEEN '2024-06-01' AND '2024-06-30'
  AND event_type = 'purchase';

You want to see something like -> Seq Scan on events_2024_06 with other partitions completely absent from the plan. If you see Append nodes followed by scans on every partition, pruning isn't working — usually because the query parameter isn't a stable literal (parameterized queries from ORMs can sometimes trick the planner, though Postgres 11 added runtime pruning that handles most of these cases at execution time, and it has improved in every release since). The enable_partition_pruning setting should be on by default; double-check with SHOW enable_partition_pruning;.

Where Partitioning Actually Makes Sense (and Where It Doesn't)

There's no universal row count where partitioning suddenly becomes worth it. I've seen it help dramatically on 50-million-row tables and do nothing useful on 200-million-row tables — the deciding factor is your access pattern. Ask yourself: do the majority of your queries filter on the partition key, and do those filters consistently land on a small subset of partitions? If yes, you're a good candidate. If you're running analytics that aggregate across all time — say, a query that counts total events by event_type across 3 years of data — partitioning will make that query slower, because Postgres now has to open, plan, and scan dozens of child tables instead of one. The Append node overhead adds up fast when you're hitting 36 monthly partitions simultaneously.

  • Good fit: Time-series event logs, audit tables, IoT sensor data, anything where "give me data from the last N days" is the dominant query shape
  • Bad fit: Tables where most queries don't filter on the partition key at all, or tables with heavy cross-partition analytics
  • Painful surprise: UPDATE or DELETE that moves a row from one partition to another (e.g., updating created_at) causes a delete + insert across partitions — most ORMs don't warn you about this
  • Operational bonus: Dropping old partitions is instant (DROP TABLE events_2021_01) versus a slow DELETE FROM events WHERE created_at < '2022-01-01' that locks rows and bloats your WAL

Before committing to partitioning, try a partial index first. If your hot data is recent, something like CREATE INDEX events_recent_idx ON events (created_at, event_type) WHERE created_at > '2024-01-01' gives you much of the scan reduction with zero schema complexity. I've saved teams weeks of migration work by suggesting this first. Partitioning is the right answer for truly massive tables with clear time-based access patterns — but it's an architectural commitment, not a query tweak you can easily undo.

My Actual Debugging Workflow on a Slow Production Query

The first place I go is pg_stat_statements, not the application logs. Logs tell you a query was slow. pg_stat_statements tells you which queries are collectively destroying your database across thousands of calls. There's a massive difference between a query that took 4 seconds once and a query that takes 40ms but runs 50,000 times a minute. The extension needs to be loaded first — check shared_preload_libraries in your postgresql.conf — but once it is, this is the query I run every single time I start a performance investigation:

SELECT
  query,
  calls,
  total_exec_time / calls AS avg_ms,
  rows / calls AS avg_rows,
  100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS cache_hit_ratio
FROM pg_stat_statements
WHERE calls > 100
ORDER BY total_exec_time DESC
LIMIT 20;

Sort by total_exec_time DESC, not avg_ms. A query averaging 2ms but called 200,000 times is costing you more than a 500ms query that runs twice a day. Pull the top offender, copy the raw query, and move to your read replica before you do anything else. Never run EXPLAIN ANALYZE on production primary unless you enjoy explaining to your team why a table just got locked during a debug session.

EXPLAIN ANALYZE on the Replica — and What to Actually Look For

On the replica, wrap the query in a transaction you'll roll back if it has any writes (it shouldn't, but habits matter), then run:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;

The BUFFERS flag is the one people skip, and it's the one that tells you whether Postgres is pulling data from shared memory or hammering disk. A high Buffers: shared read count relative to shared hit is your signal that you're either missing an index or the working set doesn't fit in shared_buffers. Look for Seq Scan nodes on large tables — anything over a few hundred thousand rows doing a sequential scan on a selective filter is a red flag. Also check the estimated vs actual row counts. If the planner estimated 12 rows and got back 84,000, you have a statistics problem, not just an index problem.

Before You Trust the Planner: Force a Stats Refresh

The thing that caught me off guard early on was debugging a query for an hour, adding an index, seeing no improvement, and eventually realizing the planner's row estimates were completely stale. Postgres autovacuum runs ANALYZE on a schedule, but after a large bulk insert or a table that's grown faster than autovacuum expected, the statistics lag. Before you trust anything the planner tells you, run this:

ANALYZE tablename;

Then re-run your EXPLAIN ANALYZE. I've seen estimated rows jump from 500 to 500,000 after a forced stats update, which completely changed the query plan — and fixed the performance problem without touching the query at all. If the table is huge and you don't want to tie up the replica, you can pass a column list: ANALYZE tablename (column1, column2). Faster, targeted, and good enough for most investigations.

Testing Index Viability Without Creating Anything

Here's a trick I use constantly: before I go through the overhead of CREATE INDEX CONCURRENTLY on a multi-million row table (which can take 10+ minutes and increases replication lag), I check whether Postgres would actually use an index it already has but is ignoring, by forcing it to pretend seq scans don't exist:

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT ...;
SET enable_seqscan = on;

With seq scans disabled, the planner is pushed onto any available index or bitmap scan. If the query time drops dramatically in the plan output, you have your answer: the index is viable, and the real question is why the planner wasn't choosing it (stale statistics and cost settings like random_page_cost are the usual suspects). If the plan is still terrible with enable_seqscan = off, the existing indexes can't serve this query and you're looking at a new index or a query rewrite; the hypopg extension can trial a hypothetical index at that point without building anything. This is a session-level setting so it only affects your connection, and you should always reset it. I've seen people forget that second line and then wonder why queries on that connection started behaving strangely.

Is It the Query or the Connection Pool? Check pg_stat_activity

Sometimes the query itself is fine and the problem is that 200 connections are queued waiting to run it. Before blaming SQL, I always check:

SELECT
  state,
  wait_event_type,
  wait_event,
  count(*) AS count,
  left(query, 80) AS query_snippet
FROM pg_stat_activity
WHERE datname = 'your_database'
GROUP BY state, wait_event_type, wait_event, query_snippet
ORDER BY count DESC;

If you see a pile of connections in state = 'active' with wait_event = 'Lock' or wait_event_type = 'Lock', you have a locking problem, not a query performance problem. Rewriting the query won't help. Similarly, if most connections are idle in transaction, your application is opening transactions and not closing them promptly — a connection pool misconfiguration or an ORM that's being too aggressive about transaction scope. I've chased a "slow query" for 45 minutes before realizing the query ran in 8ms but waited 3 seconds to get a connection from PgBouncer's pool because the pool size was set to 10 on a 50-RPS endpoint.

When to Rewrite the Query vs When to Touch Postgres Config

My rule is simple: if EXPLAIN ANALYZE shows the plan is bad but the table statistics are accurate and a good index exists, it's a query rewrite problem — look at join order, subquery vs CTE vs lateral join, or unnecessary columns forcing wide row reads. If the plan is reasonable but Buffers: shared read is high and shared_blks_hit is low across all your slow queries, you've outgrown your shared_buffers setting — typically set to 25% of RAM, but a lot of default installs leave it at 128MB, which is embarrassingly small for any real workload. If you're seeing massive variance in query times (sometimes 20ms, sometimes 2 seconds, same query), that's usually work_mem being too low and forcing disk-based sorts. Bump work_mem carefully — it's per-sort-operation, not per-connection, so setting it to 256MB on a 100-connection pool can blow your RAM. Set it at the session level for specific heavy queries instead of globally if you're not sure about the math.
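
Scoping that to a single transaction keeps the blast radius small: SET LOCAL reverts automatically when the transaction ends, so a forgotten reset can't leak into the rest of the session. A sketch for a one-off heavy report:

BEGIN;

-- Only this transaction gets the bigger sort/hash budget
SET LOCAL work_mem = '256MB';

SELECT user_id, count(*) AS orders, sum(amount_cents) AS revenue_cents
FROM orders
GROUP BY user_id
ORDER BY revenue_cents DESC;

COMMIT;  -- work_mem drops back to the global setting here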

When to Pick Each Approach

Match the Symptom to the Fix — Don't Just Reach for the Nearest Tool

The most expensive mistake I see developers make is throwing caching at a slow query when the real problem is a missing index, or spending a week tuning Postgres when the actual culprit is the ORM firing 400 queries per page load. The fix depends entirely on the shape of the problem, not just the symptom. Here's how I think through it.

Slow Single Query, Called Rarely → EXPLAIN ANALYZE First, Always

If a query takes 3 seconds but only runs a few times per minute, no amount of connection pooling will save you. You need to understand why it's slow. Run this and read it carefully:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, u.email, SUM(oi.price)
FROM orders o
JOIN users u ON u.id = o.user_id
JOIN order_items oi ON oi.order_id = o.id
WHERE o.created_at > NOW() - INTERVAL '30 days'
GROUP BY o.id, u.email;

Look for Seq Scan on large tables — that's your first red flag. The BUFFERS option shows cache hit vs disk read ratios; a high Blocks Read count means you're hitting disk constantly. After you spot the problem, add a targeted index:

CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders (created_at DESC);

The CONCURRENTLY flag is non-negotiable in production — without it, you lock the table. I skipped it once in a low-traffic window and still caused a 90-second outage. Never again.

Fast Query, Thousands of Calls Per Second → Pool Connections, Cache Results

The thing that caught me off guard the first time I profiled a high-traffic app was that the query itself was taking 2ms, but total request time was 40ms. The overhead was Postgres connection establishment — each request was opening and closing its own connection. The fix isn't query tuning, it's PgBouncer in transaction mode:

# pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20

With 20 actual Postgres connections serving 1,000 client connections, connection overhead drops to near zero. Layer Redis on top for results that don't need to be real-time — product catalog, user role lookups, anything with a natural TTL. If you're on Django, django-redis makes this a one-liner. If you're on Node, ioredis is solid. The trade-off: you now have cache invalidation to reason about. Simple TTLs work for most cases. Event-driven invalidation is more correct but significantly harder to maintain.

Aggregate Queries on Time-Range Data → Partitioning or Materialized Views

If you're running dashboard queries like "sales by week for the last 2 years" across a 50-million-row table, you have two good options depending on how fresh the data needs to be. Declarative partitioning by range is the right call when queries almost always filter by time:

CREATE TABLE events (
    id BIGSERIAL,
    user_id INT,
    event_type TEXT,
    created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_q1 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

Postgres prunes partitions at query time, so a scan that would touch 50M rows now touches maybe 3M. The gotcha: your indexes need to be created on each partition, not just the parent. I learned this the hard way when a partition I created manually never got the covering index that the parent had. Materialized views are better when the query is complex and you can tolerate slightly stale data — refresh them on a schedule with REFRESH MATERIALIZED VIEW CONCURRENTLY mv_weekly_sales; and your dashboard queries drop from seconds to milliseconds.

Everything Slow Under Load → Stop Tuning Queries and Check the Infrastructure

I've seen teams spend two weeks rewriting SQL while their actual problem was table bloat from a stalled autovacuum. If everything degrades under load simultaneously, the query plan isn't your enemy — the table is fragmented, dead tuples are piling up, and Postgres can't see which rows are actually live. Check this first:

SELECT relname, n_dead_tup, n_live_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;

If last_autovacuum is NULL or weeks old on a busy table, you have a problem. Manually trigger it: VACUUM ANALYZE orders; — and then fix your autovacuum settings so this doesn't recur. Also check shared_buffers. The Postgres default is 128MB, which is comically small for any production workload. I set it to 25% of total RAM as a starting point, alongside effective_cache_size at 75% of RAM. These two settings alone have halved query times on systems I've inherited that were still running Postgres defaults.

N+1 Detected → Don't Touch Postgres, Fix the ORM Call

If your APM tool (Datadog, New Relic, or even just Django Debug Toolbar) shows 200 nearly-identical queries per request that differ only by a primary key, that's N+1 and no Postgres optimization will fix it — because each individual query is already fast. The problem is volume. In Django, the fix is select_related() for foreign keys and prefetch_related() for many-to-many:

# Before: fires 1 query per order to get the user
orders = Order.objects.filter(status='pending')
for order in orders:
    print(order.user.email)  # N queries

# After: 2 queries total
orders = Order.objects.select_related('user').filter(status='pending')
for order in orders:
    print(order.user.email)  # already loaded

In Sequelize or Prisma, the equivalent is include. The deeper issue is that ORMs make it dangerously easy to write N+1 patterns without realizing it — a single .user property access looks harmless in isolation. I make it a code review rule: any loop that accesses a relationship attribute gets scrutinized. Adding a Postgres index won't help you here. The database is doing exactly what you asked — you're just asking too many times.





Written by Eric Woo

Lead AI Engineer & SaaS Strategist

Eric is a seasoned software architect specializing in LLM orchestration and autonomous agent systems. With over 15 years in Silicon Valley, he now focuses on scaling AI-first applications.
