PostgreSQL Scales Further Than Most Think — Here is How to Push It
Practical strategies for scaling PostgreSQL before reaching for distributed databases or microservice rewrites
PostgreSQL can handle far more than most teams ask of it. Before we reach for something exotic, there is a lot we can do with the database we already have.
Connection pooling
PostgreSQL forks a process per connection. This works at low counts, but does not scale well to hundreds or thousands of connections.
PgBouncer sits in front of Postgres and multiplexes connections. Our app can open many connections while Postgres only sees a few.
```ini
; pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
```

The key setting is `pool_mode = transaction`. Connections are returned to the pool after each transaction, not after each session. This is far more efficient, but it means server-side session state (prepared statements, SET, advisory locks) does not survive across transactions, unless we enable PgBouncer's protocol-level prepared statement support (`max_prepared_statements`, available since 1.21) or switch to pgcat.
If we are running on Kubernetes with per-pod connection pools, we can end up with thousands of connections even with app-level pooling. Put PgBouncer between our pods and Postgres, not inside each pod.
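How big should `default_pool_size` be? A common starting point is the heuristic from the PostgreSQL wiki: connections = (cores × 2) + effective spindle count. A minimal sketch of that arithmetic; `suggested_pool_size` is a made-up helper name, and the formula is a rough starting point to benchmark from, not a rule:

```python
def suggested_pool_size(cpu_cores: int, spindle_count: int = 1) -> int:
    """Starting-point heuristic: (cores * 2) + effective spindle count.

    On SSD or cloud storage the spindle count is usually treated as 1.
    Benchmark and adjust from here rather than trusting the number blindly.
    """
    return cpu_cores * 2 + spindle_count

# An 8-core primary on SSDs:
print(suggested_pool_size(8))  # 17
```

Note how small this is compared to `max_client_conn = 1000` above: the whole point of the pooler is that a handful of server connections can serve a thousand clients.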
Indexing
Before anything else, look at the queries. Most "scaling problems" are actually "nobody added an index" problems.
```sql
-- Find the slowest queries
SELECT
    calls,
    round(total_exec_time::numeric, 2) AS total_ms,
    round(mean_exec_time::numeric, 2) AS avg_ms,
    query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```

Make sure `pg_stat_statements` is enabled: add it to `shared_preload_libraries`, restart, and run `CREATE EXTENSION pg_stat_statements`. It is the most valuable extension for performance work.
Once we know what is slow, check if indexes exist:
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';
```

If we see a sequential scan on a large table:
```sql
CREATE INDEX CONCURRENTLY idx_orders_customer_status
    ON orders (customer_id, status);
```

`CONCURRENTLY` builds the index without blocking writes to the table. It takes longer and cannot run inside a transaction block, but production traffic keeps flowing.
Partial indexes are underused. If 90% of queries only care about active records:
```sql
CREATE INDEX idx_orders_active ON orders (customer_id)
WHERE status != 'archived';
```

Smaller index, faster lookups, less disk I/O.
Partitioning
Once a table crosses tens of millions of rows, even good indexes start to slow down. PostgreSQL's declarative partitioning (available since v10, mature since v12) splits one logical table into smaller physical ones:

```sql
CREATE TABLE events (
    id bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    event_type text NOT NULL,
    payload jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_01 PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
CREATE TABLE events_2026_03 PARTITION OF events
    FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
```

The query planner automatically prunes partitions it does not need.
Automate partition creation with pg_partman or a cron job:
```bash
#!/bin/bash
# Run monthly to create next month's partition
NEXT_MONTH=$(date -d "+1 month" +%Y-%m-01)
MONTH_AFTER=$(date -d "+2 months" +%Y-%m-01)
TABLE="events_$(date -d '+1 month' +%Y_%m)"

psql -d myapp -c "
  CREATE TABLE IF NOT EXISTS $TABLE PARTITION OF events
  FOR VALUES FROM ('$NEXT_MONTH') TO ('$MONTH_AFTER');
"
```

Partitioning also makes DELETE obsolete for old data. Instead of deleting millions of rows (slow, generates a lot of WAL), drop the old partition:
```sql
DROP TABLE events_2025_01;
```

Instant, and no vacuum needed. (On a busy system, running `ALTER TABLE events DETACH PARTITION events_2025_01 CONCURRENTLY` first, available since v14, avoids blocking concurrent queries on the parent table.)
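The retention side is as easy to automate as the creation side. A sketch of the date arithmetic in Python, assuming the `events_YYYY_MM` naming scheme used above; `partitions_to_drop` is a hypothetical helper, and issuing the actual DROP statements (via psql or a driver) is left out:

```python
from datetime import date

def partitions_to_drop(existing: list[str], today: date, keep_months: int) -> list[str]:
    """Given partition names like 'events_2026_01', return the ones whose
    month is more than keep_months before the current month."""
    # Count months on an absolute scale (year * 12 + month_index) so that
    # comparisons work across year boundaries.
    cutoff = today.year * 12 + (today.month - 1) - keep_months
    drop = []
    for name in existing:
        y, m = name.rsplit("_", 2)[-2:]
        if int(y) * 12 + (int(m) - 1) < cutoff:
            drop.append(name)
    return sorted(drop)

parts = ["events_2025_01", "events_2025_11", "events_2026_01"]
print(partitions_to_drop(parts, date(2026, 1, 15), keep_months=3))
# ['events_2025_01']
```

In practice pg_partman handles both creation and retention; a script like this is only worth maintaining if we want zero extensions.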
Read replicas
Most applications read more than they write. Streaming replication in PostgreSQL is reliable and simple to set up.
The architecture: one primary handles writes, one or more replicas handle reads. Our app routes queries accordingly.
```python
import psycopg2

write_conn = psycopg2.connect("host=primary dbname=myapp")
read_conn = psycopg2.connect("host=replica1 dbname=myapp")

def get_orders(customer_id):
    # Reads go to the replica
    with read_conn.cursor() as cur:
        cur.execute("SELECT * FROM orders WHERE customer_id = %s", (customer_id,))
        return cur.fetchall()

def place_order(customer_id, items):
    # Writes go to the primary
    with write_conn.cursor() as cur:
        cur.execute(
            "INSERT INTO orders (customer_id, items) VALUES (%s, %s)",
            (customer_id, items),
        )
    write_conn.commit()
```

The caveat is replication lag. A write to the primary usually appears on replicas within milliseconds, but lag can stretch to seconds under heavy load. Options:
- Read-your-own-writes: route reads to the primary for a short window after a write
- Synchronous replication: eliminates lag but adds latency to writes
- Accept the lag: for dashboards, reports, and search results, slightly stale data rarely matters
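The read-your-own-writes option can be as simple as remembering when each user last wrote. A minimal in-memory sketch; `ReplicaRouter` is a made-up name, and a real multi-process app would carry this state in the session or a cookie rather than a process-local dict:

```python
import time

class ReplicaRouter:
    """Route a user's reads to the primary for `window` seconds after a write."""

    def __init__(self, window: float = 1.0, clock=time.monotonic):
        self.window = window
        self.clock = clock          # injectable for testing
        self._last_write: dict[str, float] = {}

    def record_write(self, user_id: str) -> None:
        self._last_write[user_id] = self.clock()

    def target_for_read(self, user_id: str) -> str:
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.window:
            return "primary"        # replica may not have the write yet
        return "replica"

# A fake clock makes the behavior easy to verify:
t = [0.0]
router = ReplicaRouter(window=1.0, clock=lambda: t[0])
router.record_write("alice")
print(router.target_for_read("alice"))  # primary
t[0] = 2.0
print(router.target_for_read("alice"))  # replica
```

The window should comfortably exceed the lag we actually observe (`pg_stat_replication` on the primary shows it), not the lag we hope for.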
TOAST and large values
If we store large JSON blobs or text, PostgreSQL automatically compresses and stores them out-of-line using TOAST. This is mostly transparent, but means wide rows do not slow down scans of narrow columns.
If we are doing SELECT * on tables with large JSONB columns, we decompress data we might not need. Be explicit:
```sql
-- Avoid this
SELECT * FROM events WHERE created_at > now() - interval '1 hour';

-- Prefer this
SELECT id, event_type, created_at FROM events WHERE created_at > now() - interval '1 hour';
```

Materialized views
Some queries are inherently expensive — monthly revenue reports, leaderboard calculations, aggregations across millions of rows. Do not run them on every request.
```sql
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
    date_trunc('month', created_at) AS month,
    sum(amount) AS total_revenue,
    count(*) AS order_count
FROM orders
WHERE status = 'completed'
GROUP BY 1;

-- Refresh periodically (e.g., every hour via cron)
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;
```

`CONCURRENTLY` keeps the view readable while it refreshes. This requires a unique index on the materialized view:
```sql
CREATE UNIQUE INDEX idx_monthly_revenue_month ON monthly_revenue (month);
```

Vacuum and autovacuum tuning
PostgreSQL's MVCC means dead rows accumulate. Autovacuum cleans them up, but the defaults are conservative. On write-heavy tables, we may need to be more aggressive:
```sql
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.01,    -- default 0.2
    autovacuum_analyze_scale_factor = 0.005,  -- default 0.1
    autovacuum_vacuum_cost_delay = 2          -- default 2ms since v12 (20ms before)
);
```

For large tables, the default "vacuum when 20% of rows are dead" means millions of dead rows can accumulate before vacuum starts. Lowering the scale factor fixes this.
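Autovacuum fires when dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples`, which makes the "millions of dead rows" claim easy to sanity-check. A quick sketch (`autovacuum_trigger` is a made-up helper; the defaults shown are PostgreSQL's shipped values):

```python
def autovacuum_trigger(reltuples: int, scale_factor: float = 0.2,
                       threshold: int = 50) -> int:
    """Dead tuples that must accumulate before autovacuum runs on a table.

    Mirrors PostgreSQL's trigger condition:
    threshold + scale_factor * reltuples (defaults: 50 + 0.2 * n).
    """
    return int(threshold + scale_factor * reltuples)

# With the defaults, a 10M-row table waits for ~2M dead rows:
print(autovacuum_trigger(10_000_000))        # 2000050
# With the 0.01 scale factor set above, vacuum kicks in ~20x sooner:
print(autovacuum_trigger(10_000_000, 0.01))  # 100050
```

The same math explains why tiny tables are fine with the defaults: the fixed threshold of 50 dominates, and only large tables suffer.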
Monitor dead tuples:
```sql
SELECT
    relname,
    n_dead_tup,
    n_live_tup,
    round(n_dead_tup::numeric / greatest(n_live_tup, 1) * 100, 2) AS dead_pct,
    last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

Things to keep in mind
- Application-level sharding adds enormous complexity: cross-shard queries, migrations, rebalancing. Unless we are at a scale that genuinely requires it, this is premature.
- Switching to NoSQL for performance rarely pays off. I've seen teams move from PostgreSQL to MongoDB because "it's faster," then spend months rebuilding the query capabilities they lost. PostgreSQL with proper indexing is fast enough for most workloads.
- Citus and other distributed PostgreSQL flavors are useful when we genuinely need horizontal write scaling, but they change how we model data (everything needs a distribution key) and some queries become impossible or slow. Try everything else first.
The scaling ladder
Here is roughly the order I'd recommend:
1. Enable pg_stat_statements and fix the queries. This alone solves most problems.
2. Add proper indexes (including partial and covering indexes).
3. Set up PgBouncer for connection pooling.
4. Tune work_mem, shared_buffers, and effective_cache_size for our hardware. Use PGTune as a starting point.
5. Add read replicas to offload read traffic.
6. Partition large tables that grow unboundedly.
7. Use materialized views for expensive aggregations.
8. Only then consider Citus, sharding, or alternative databases.
Most applications will never get past step 5. And that is fine — it means PostgreSQL is doing its job.