Reading PostgreSQL Execution Plans with EXPLAIN ANALYZE

A practical walkthrough of EXPLAIN ANALYZE in PostgreSQL — how to read node types, spot bad row estimates, and turn the plan into an actual fix.

#PostgreSQL#Databases#Performance

EXPLAIN ANALYZE is the single most useful tool PostgreSQL gives us for understanding why a query is slow. This post is a practical walkthrough — what the plan actually says, which numbers matter, and how to turn the output into a concrete fix.

EXPLAIN vs EXPLAIN ANALYZE

EXPLAIN shows the planner's intended plan and its estimates. EXPLAIN ANALYZE actually runs the query and reports real timings and row counts alongside the estimates. The difference between estimated and actual rows is usually where the interesting bugs live.

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

A word of warning: EXPLAIN ANALYZE executes the query for real. For a SELECT that is fine. For an UPDATE, DELETE, or INSERT, wrap it in a transaction we roll back:

BEGIN;
EXPLAIN ANALYZE UPDATE orders SET status = 'shipped' WHERE id = 7;
ROLLBACK;
The options worth knowing

The defaults are spartan. I'd recommend using these almost every time:

EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS, FORMAT TEXT)
SELECT ...;
  • BUFFERS — shows shared/local buffer hits and reads. This is how we tell whether a node is slow because of CPU work or because it is hammering disk.
  • VERBOSE — adds the output column list and schema-qualified names. Helpful when the plan has multiple subqueries.
  • SETTINGS — prints any planner-related GUCs that differ from defaults. Saves us from chasing ghosts when a session has enable_nestloop = off.
  • FORMAT JSON — useful for tooling. For reading by eye, stick with TEXT.

In PostgreSQL 13+, BUFFERS works without ANALYZE too, but the actual I/O numbers only appear when the query runs.

Anatomy of a plan node

A single plan node looks like this:

Index Scan using orders_customer_id_idx on orders  (cost=0.43..8.45 rows=1 width=64) (actual time=0.024..0.031 rows=3 loops=1)
  Index Cond: (customer_id = 42)
  Buffers: shared hit=4

The numbers, in order:

  • cost=0.43..8.45 — planner's estimate in arbitrary units. The first is the startup cost (time before the first row is produced), the second is the total cost. Useful for comparing plans, meaningless as an absolute.
  • rows=1 — planner's estimated row count.
  • width=64 — average estimated row size in bytes.
  • actual time=0.024..0.031 — milliseconds to first row, milliseconds to last row, per loop.
  • rows=3 — actual rows returned, per loop, averaged.
  • loops=1 — how many times this node ran.

The loops multiplier matters a lot. If a node shows actual time=0.5..0.5 rows=1 loops=10000, the total time spent in it is 5 seconds, not 0.5 ms.

Node types we'll see most often

Seq Scan — read the whole table. Fine for small tables or when we need most of the rows anyway. Suspicious on a large table with a selective filter.

Index Scan — walk an index, then fetch each matching heap tuple. Good for selective predicates.

Index Only Scan — answer the query from the index alone, no heap visit. Requires a covering index and recently-vacuumed pages (the visibility map decides). Fastest of the three when it applies.

Bitmap Heap Scan / Bitmap Index Scan — build a bitmap of matching tuple IDs, then fetch them in physical order. Pays off when the index returns many rows but not enough for a Seq Scan to win.

Nested Loop — for each row on the outer side, probe the inner side. Cheap when the outer side is tiny. Catastrophic when the planner thought it was tiny and it is not.

Hash Join — build a hash table on one side, probe with the other. Works well for medium-to-large unsorted inputs.

Merge Join — both sides sorted, walk them together. Wins when inputs are already sorted (e.g. by an index).

What to actually look for

I read plans in a fairly consistent order:

  1. Compare estimated vs actual rows on every node. A 10x or 100x mismatch on a node low in the tree usually propagates upward and is the root cause. The planner picked a Nested Loop because it thought the outer side had 5 rows, but it has 50,000.
  2. Find the node with the most time. "Most time" means actual time * loops, not just the last-row time on the node itself. A child that runs 10,000 times is often the culprit, not the parent.
  3. Check Buffers. shared read=N means N pages came from disk (or the OS cache — Postgres can't tell). High read with low hit on a hot query suggests we're under-cached or the access pattern is scattered.
  4. Look at filters. A Filter: line under a Seq Scan with Rows Removed by Filter: 9,999,000 means we scanned ten million rows to return a thousand. That is an index waiting to be created.
A worked example

Say we have a slow query:

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.country = 'UA' AND o.created_at > now() - interval '7 days';

And the plan comes back like this (abbreviated):

Nested Loop  (actual time=0.412..2841.003 rows=1842 loops=1)
  ->  Seq Scan on customers c  (actual time=0.011..18.402 rows=4120 loops=1)
        Filter: (country = 'UA'::text)
        Rows Removed by Filter: 195880
  ->  Index Scan using orders_customer_id_idx on orders o  (actual time=0.681..0.684 rows=0 loops=4120)
        Index Cond: (customer_id = c.id)
        Filter: (created_at > (now() - '7 days'::interval))
        Rows Removed by Filter: 47

Two things jump out. First, the customers Seq Scan throws away 195k rows to keep 4k — an index on customers(country) would help. Second, the inner Index Scan runs 4,120 times and filters out almost everything it fetches. A composite index on orders(customer_id, created_at) lets the index condition do both jobs:

CREATE INDEX ON customers (country);
CREATE INDEX ON orders (customer_id, created_at);

After that, the planner might switch to a Hash Join with two Index Scans, or keep the Nested Loop but with the inner side now doing zero useless heap fetches. Either way, the Rows Removed by Filter numbers should collapse.

When the estimates are wrong

Bad estimates usually trace back to one of a few causes:

  • Stale statistics. ANALYZE table_name; refreshes them. Autovacuum normally handles this, but after a bulk load we want to do it explicitly.
  • Correlated columns. The planner assumes columns are independent. If country and city correlate strongly, estimates for WHERE country='UA' AND city='Kyiv' will be off. CREATE STATISTICS (extended statistics) fixes this:
    CREATE STATISTICS orders_country_city (dependencies)
      ON country, city FROM addresses;
    ANALYZE addresses;
  • Low default_statistics_target. The default of 100 is fine for most columns. For a column with a highly skewed distribution, bumping it to 1000 on that specific column with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS 1000 gives the planner a better histogram.
Tooling

The raw text output is fine for short plans. For anything bigger, explain.depesz.com and explain.dalibo.com render the plan as a tree with the slowest nodes highlighted. I paste plans into Depesz reflexively now — the colour-coding for "row estimate off by 100x" saves a lot of eyeball time.

Things to keep in mind
  • EXPLAIN ANALYZE adds measurement overhead. A query that runs in 50 ms might report 80 ms under ANALYZE because of the per-row timing calls. The shape of the plan is reliable; the absolute numbers drift a bit.
  • The first run of a query often reads from disk; the second run hits cache. Run it twice and compare Buffers before drawing conclusions about I/O.
  • Parallel plans report per-worker timings. The Gather node shows the wall time; the children show the average across workers.
What I'd Recommend

Make EXPLAIN (ANALYZE, BUFFERS) the default reflex when a query is slow, before reaching for indexes, query rewrites, or pg_stat_statements. Most slow queries I've debugged were one of three things: a missing index, a bad row estimate, or a plan that looked fine but ran 10,000 inner loops. The plan tells us which one, and the fix follows from the diagnosis.