How to Upgrade PostgreSQL Without Losing Data
The three ways to upgrade a PostgreSQL major version: pg_dump, pg_upgrade, and logical replication, and the problems you can hit with each one.
How do you move a PostgreSQL database to a new major version without taking it down for hours? There are three ways to do it. This is what tends to break with each, and which one to reach for before the others.
Why major version upgrades are different
Minor version upgrades (14.1 to 14.2) are easy. The on-disk format stays the same, so you swap binaries and restart. Major version upgrades (14 to 16) change the internal storage format. This means you can't just point the new binary at the old data directory.
Three tools actually do this. pg_dump / pg_restore is the old standard: slow, very reliable. pg_upgrade is the faster in-place upgrade, and it's the one most people end up using once a database gets big enough that a dump-and-restore window stops being acceptable. Logical replication is the heavy option for when you need near-zero downtime.
Dumping and restoring with pg_dump
Dump everything out, bring up the new version, load it back in.
# Dump the entire cluster
pg_dumpall -U postgres -h localhost -p 5432 > full_dump.sql
# Stop old server, start new one on the same port
sudo systemctl stop postgresql@14-main
sudo systemctl start postgresql@16-main
# Restore
psql -U postgres -f full_dump.sqlFor parallelism and compression, use the custom format per-database:
pg_dump -U postgres -Fc -j 4 -d myapp > myapp.dump
pg_restore -U postgres -Fc -j 4 -d myapp myapp.dumpSimple, works every time, and you come out the other side with a clean, freshly-organized database. The catch is downtime: the database is fully offline for the whole dump and restore, which on a 500 GB database can run into hours. Fine for small databases (under 50 GB or so), dev and staging, or whenever there's a long maintenance window.
pg_upgrade, in place
pg_upgrade upgrades in place. It moves the data files directly, either by copying them or by hard-linking them.
# Install the new version alongside the old one
sudo apt install postgresql-16
# Stop both clusters
sudo systemctl stop postgresql@14-main
sudo systemctl stop postgresql@16-main
# Run pg_upgrade in check mode first (always do this)
sudo -u postgres /usr/lib/postgresql/16/bin/pg_upgrade \
--old-datadir=/var/lib/postgresql/14/main \
--new-datadir=/var/lib/postgresql/16/main \
--old-bindir=/usr/lib/postgresql/14/bin \
--new-bindir=/usr/lib/postgresql/16/bin \
--check
# If checks pass, run with --link for speed
sudo -u postgres /usr/lib/postgresql/16/bin/pg_upgrade \
--old-datadir=/var/lib/postgresql/14/main \
--new-datadir=/var/lib/postgresql/16/main \
--old-bindir=/usr/lib/postgresql/14/bin \
--new-bindir=/usr/lib/postgresql/16/bin \
--linkThe --link flag creates hard links instead of copying data files. This turns a two-hour copy into a few seconds. The tradeoff is that once you start the new cluster you can't go back to the old one, because both share the same files. Take a filesystem snapshot or backup before you start.
After the upgrade:
# Start the new cluster
sudo systemctl start postgresql@16-main
# Run the recommended analyze
sudo -u postgres /usr/lib/postgresql/16/bin/vacuumdb \
--all --analyze-in-stagesDon't skip the --analyze-in-stages step. Right after the upgrade the planner has no statistics, so your queries get bad plans until it builds them. It runs three passes with increasing accuracy. The first pass is rough but quick, so the database is usable again sooner.
This is the fast one. Minutes instead of hours, well tested, and it ships with PostgreSQL. You do pay in downtime, since both clusters have to be stopped while it runs. Usually that's 5 to 15 minutes, but I plan for 30+ so I don't have to watch the clock.
A few things can cause problems. Extensions that aren't available for the new version will block the upgrade, but the --check flag catches that ahead of time. Custom postgresql.conf settings don't carry over, so diff the configs by hand. And if you use tablespaces, the paths have to be set up correctly on both sides.
Logical replication, for near-zero downtime
Run old and new versions side by side, replicate data in real time, then cut over.
On the old server (publisher):
-- Make sure wal_level is set to 'logical' (requires restart if changed)
ALTER SYSTEM SET wal_level = 'logical';
-- Create a publication for all tables
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;On the new server (subscriber):
# First, copy the schema (logical replication does not handle DDL)
pg_dump -U postgres -s -d myapp | psql -U postgres -d myapp -h new-server-- Create a subscription pointing to the old server
CREATE SUBSCRIPTION upgrade_sub
CONNECTION 'host=old-server dbname=myapp user=replicator password=secret'
PUBLICATION upgrade_pub;Once the initial data sync completes and replication lag drops to zero, cut over:
# Verify replication is caught up
psql -U postgres -c "SELECT * FROM pg_stat_subscription;"
# Point your application to the new server
# (update connection strings, DNS, pgbouncer config, etc.)
# On the new server, drop the subscription
psql -U postgres -c "DROP SUBSCRIPTION upgrade_sub;"The application stays up the whole time, and you get to poke at the new server before you switch to it. That's the whole reason to do it. Everything else about logical replication is a list of things that don't come for free: sequences don't replicate, so you sync them by hand before cutover; large objects aren't supported at all; any DDL during the migration has to be applied to both sides by you; and schema and extension versions have to stay matched across the two servers.
Syncing sequences before cutover:
-- On the old server, grab current sequence values
SELECT sequencename, last_value
FROM pg_sequences
WHERE schemaname = 'public';
-- On the new server, set them (with some headroom)
SELECT setval('users_id_seq', 158432 + 1000);
SELECT setval('orders_id_seq', 891204 + 1000);Adding a buffer prevents ID collisions if writes happen during cutover.
Doing this with containers
The ideas are the same. You just swap image tags instead of installing packages side by side.
A common Docker Compose pattern:
services:
postgres-old:
image: postgres:14
volumes:
- pgdata:/var/lib/postgresql/data
ports:
- "5432:5432"
postgres-new:
image: postgres:16
volumes:
- pgdata-new:/var/lib/postgresql/data
ports:
- "5433:5432"Use pg_dump from one container to the other, or set up logical replication between them. pg_upgrade is harder here, because it needs both sets of binaries in the same container. So you end up building a custom image.
In Kubernetes with operators like CloudNativePG or Zalando's postgres-operator, a major upgrade usually means starting a new cluster on the target version and moving the data over with pg_dump or logical replication.
What I reach for
For most teams pg_upgrade --link is the right call. It's fast, well tested, and the downtime fits inside a normal maintenance window. I've run it on a fairly large database and finished well inside half an hour.
Logical replication can do more, but it's also where I've watched the most things go wrong. A forgotten sequence. Schema drift. An extension that's a version behind on the new side. Save it for when you genuinely need near-zero downtime and have time to test the whole process properly.
Whatever you pick, test it on a copy first. Restore a recent backup to a spare server and run the thing end to end. The step that breaks is rarely the one you were worried about. Then take a fresh backup right before the real run.
And read more release notes than you think you need to. If you're jumping several major versions at once, read the notes for every version in between, not just the target. A breaking change can sit in a version you skipped right over.