Avoiding the WAL Archives Retention Trap in pgBackRest
While answering support issues on pgBackRest, I regularly see users falling into the infinite archive retention trap and asking the same question: why are my old WAL archives not being expired?
This is pretty much always linked to a bad configuration of the repo-retention-archive setting. For example, using --repo1-retention-archive=7 --repo1-retention-diff=7 --repo1-retention-full=1.
Let’s see what this setting means and how it should be used.
Definition and example
According to its documentation, the repo-retention-archive option specifies the number of backups worth of continuous WAL to retain.
If this value is not set and repo-retention-full-type is count (default), then the archive to expire will default to the repo-retention-full (or repo-retention-diff) value corresponding to the repo-retention-archive-type if set to full (or diff). This will ensure that WAL is only expired for backups that are already expired.
Let’s use a very basic configuration for this example:
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
start-fast=y
log-level-console=info
log-level-file=detail
compress-type=zst

[my_stanza]
pg1-path=/var/lib/pgsql/17/data
The repository is stored locally for the purpose of this demo, with the configuration set to keep two full backups.
Let’s generate some data using pgbench:
$ pgbench -i -s 50 test
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 5.36 s...

$ pgbackrest info
stanza: my_stanza
    status: error (no valid backups)
    cipher: none

    db (current)
        wal archive min/max (17): 000000010000000000000008/00000001000000000000002E

$ ./walNb.pl --min=000000010000000000000008 --max=00000001000000000000002E
Number of WAL segments between min and max WAL: 39
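The walNb.pl helper used above is not shown in this post. To give a rough idea of what such a count involves, here is a minimal Python sketch. The function names are mine, and it assumes the default 16MB WAL segment size (256 segments per log id) and a single timeline:

```python
# Sketch only: count WAL segments between two segment names (inclusive).
# Assumption: default 16MB wal_segment_size, i.e. 0x100 segments per log id.
SEGMENTS_PER_LOG = 0x100


def wal_segment_number(name: str) -> int:
    """Turn a 24-character WAL file name into an absolute segment number."""
    if len(name) != 24:
        raise ValueError(f"unexpected WAL segment name: {name!r}")
    log_id = int(name[8:16], 16)    # middle 8 hex digits
    segment = int(name[16:24], 16)  # last 8 hex digits
    return log_id * SEGMENTS_PER_LOG + segment


def wal_count(min_name: str, max_name: str) -> int:
    """Number of segments from min_name to max_name, both included."""
    if min_name[:8] != max_name[:8]:
        raise ValueError("segments are on different timelines")
    return wal_segment_number(max_name) - wal_segment_number(min_name) + 1


print(wal_count("000000010000000000000008", "00000001000000000000002E"))  # 39
```

0x2E is 46 and 0x08 is 8, so the inclusive count is 46 - 8 + 1 = 39, which matches the number reported above.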
Right now, we have 39 WAL archives in the repository but no backup. So, let’s take a backup:
$ pgbackrest --stanza=my_stanza backup --type=full
P00 INFO: backup command begin 2.54.2: ...
P00 INFO: execute non-exclusive backup start: backup begins after the requested immediate checkpoint completes
P00 INFO: backup start archive = 000000010000000000000031, lsn = 0/31000028
P00 INFO: check archive for segment 000000010000000000000031
P00 INFO: execute non-exclusive backup stop and wait for all WAL segments to archive
P00 INFO: backup stop archive = 000000010000000000000032, lsn = 0/32000088
P00 INFO: check archive for segment(s) 000000010000000000000031:000000010000000000000032
P00 INFO: new backup label = 20250124-092814F
P00 INFO: full backup size = 777.3MB, file total = 1279
P00 INFO: backup command end: completed successfully
P00 INFO: expire command begin 2.54.2: ...
P00 INFO: expire command end: completed successfully

$ pgbackrest info
stanza: my_stanza
    status: ok
    cipher: none

    db (current)
        wal archive min/max (17): 000000010000000000000008/000000010000000000000034

        full backup: 20250124-092753F
            timestamp start/stop: 2025-01-24 09:27:53+01 / 2025-01-24 09:27:56+01
            wal start/stop: 000000010000000000000030 / 000000010000000000000030
            database size: 777.3MB, database backup size: 777.3MB
            repo1: backup set size: 37.3MB, backup size: 37.3MB
The expire command was triggered automatically after the successful backup but did not remove the WAL archives generated before this backup. This is because the repo1-retention-full=2 limit has not been reached yet.
Would those archives have been removed if we had set the limit to only one full backup? The answer is yes:
$ pgbackrest --stanza=my_stanza expire --dry-run --repo1-retention-full=1
P00 INFO: [DRY-RUN] expire command begin 2.54.2: ...
P00 INFO: [DRY-RUN] repo1: 17-1 remove archive, start = 000000010000000000000008, stop = 00000001000000000000002F
P00 INFO: [DRY-RUN] expire command end: completed successfully
000000010000000000000008 is the oldest (min) archive in the repository. But what is 00000001000000000000002F?
postgres=# SELECT x'2F'::integer, x'30'::integer;
 int4 | int4
------+------
   47 |   48
(1 row)
Remember that the parts of a WAL segment filename are hexadecimal: 00000001000000000000002F is the segment right before 000000010000000000000030, which is the first WAL segment needed for the consistency of our 20250124-092753F backup.
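The same "segment right before" arithmetic can be sketched in Python. This is only an illustration: the function name is hypothetical, and it assumes the default 16MB segment size, where the last part of the name runs from 00 to FF before the middle part is incremented:

```python
# Sketch only: compute the name of the WAL segment preceding a given one.
# Assumption: default 16MB wal_segment_size, i.e. 0x100 segments per log id.
SEGMENTS_PER_LOG = 0x100


def previous_wal_segment(name: str) -> str:
    """Return the name of the segment right before `name` on the same timeline."""
    timeline = name[:8]
    number = int(name[8:16], 16) * SEGMENTS_PER_LOG + int(name[16:24], 16)
    if number == 0:
        raise ValueError("there is no segment before the very first one")
    # divmod handles the rollover from segment 00 of one log id
    # back to segment FF of the previous log id.
    log_id, segment = divmod(number - 1, SEGMENTS_PER_LOG)
    return f"{timeline}{log_id:08X}{segment:08X}"


print(previous_wal_segment("000000010000000000000030"))
# 00000001000000000000002F
```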
This behaviour is a common source of confusion for users. Expiring older archives as soon as the first backup is taken could cause problems for standby replicas that still require older WAL, so this behaviour is intentional! What about the problem with repo-retention-archive then?
$ pgbackrest --stanza=my_stanza expire --dry-run --repo1-retention-full=1 --repo1-retention-archive=2
P00 INFO: [DRY-RUN] expire command begin 2.54.2: ...
P00 INFO: [DRY-RUN] expire command end: completed successfully
If repo-retention-archive isn’t set up properly (that is, larger than the number of backups to retain), there will never be enough backups in the repository to meet the archive requirement, and we end up with infinite WAL archive retention. For example, with --repo1-retention-full=1 --repo1-retention-archive=2, we ask to keep WAL archives for 2 full backups while keeping only 1 full backup.
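The rule above can be turned into a small sanity check to run against your own settings. This is a minimal sketch with a hypothetical function name, not something pgBackRest provides. It only covers count-based full retention with the default archive type (full), which is the case described in this section:

```python
# Sketch only: detect the "infinite WAL retention" misconfiguration described above.
# Assumptions: repo-retention-full-type=count and repo-retention-archive-type=full.
from typing import Optional


def archive_retention_trap(retention_full: int,
                           retention_archive: Optional[int] = None,
                           archive_type: str = "full") -> bool:
    """Return True when WAL archives could never be expired."""
    if retention_archive is None:
        # Not set: the value is derived from repo-retention-full, so no trap.
        return False
    if archive_type != "full":
        # Other archive types count more backups; not assessed by this sketch.
        raise NotImplementedError("only archive_type='full' is covered here")
    # More backups worth of WAL requested than full backups ever kept.
    return retention_archive > retention_full


print(archive_retention_trap(retention_full=1, retention_archive=2))  # True
print(archive_retention_trap(retention_full=2))                       # False
```

With the settings from the introduction (retention-full=1 and retention-archive=7, archive type left at its default), the check also returns True.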
Our usual recommendation is this: unless you want to aggressively expire archives, you shouldn’t modify repo-retention-archive. It will be handled and set automatically based on repo-retention-full.
What is repo-retention-archive used for?
According to the documentation:
If disk space is limited, this setting, in conjunction with repo-retention-archive-type, can be used to aggressively expire WAL segments. However, doing so negates the ability to perform PITR from the backups with expired WAL and is therefore not recommended.
and repo-retention-archive-type is the backup type to consider for WAL retention:
If set to full pgBackRest will keep archive logs for the number of full backups defined by repo-retention-archive. If set to diff (differential) pgBackRest will keep archive logs for the number of full and differential backups defined by repo-retention-archive, meaning if the last backup taken was a full backup, it will be counted as a differential for the purpose of repo-retention. If set to incr (incremental) pgBackRest will keep archive logs for the number of full, differential, and incremental backups defined by repo-retention-archive. It is recommended that this setting not be changed from the default which will only expire WAL in conjunction with expiring full backups.
Let’s take a second full backup and see how the WAL archives are cleaned up accordingly:
$ pgbackrest --stanza=my_stanza backup --type=full
P00 INFO: backup command begin 2.54.2: ...
P00 INFO: execute non-exclusive backup start: backup begins after the requested immediate checkpoint completes
P00 INFO: backup start archive = 000000010000000000000036, lsn = 0/36000028
P00 INFO: check archive for prior segment 000000010000000000000035
P00 INFO: execute non-exclusive backup stop and wait for all WAL segments to archive
P00 INFO: backup stop archive = 000000010000000000000036, lsn = 0/36000120
P00 INFO: check archive for segment(s) 000000010000000000000036:000000010000000000000036
P00 INFO: new backup label = 20250124-104114F
P00 INFO: full backup size = 777.3MB, file total = 1279
P00 INFO: backup command end: completed successfully
P00 INFO: expire command begin 2.54.2: ...
P00 INFO: repo1: 17-1 remove archive, start = 000000010000000000000008, stop = 00000001000000000000002F
P00 INFO: expire command end: completed successfully

$ pgbackrest info
stanza: my_stanza
    status: ok
    cipher: none

    db (current)
        wal archive min/max (17): 000000010000000000000030/000000010000000000000036

        full backup: 20250124-092753F
            timestamp start/stop: 2025-01-24 09:27:53+01 / 2025-01-24 09:27:56+01
            wal start/stop: 000000010000000000000030 / 000000010000000000000030
            database size: 777.3MB, database backup size: 777.3MB
            repo1: backup set size: 37.3MB, backup size: 37.3MB

        full backup: 20250124-104114F
            timestamp start/stop: 2025-01-24 10:41:14+01 / 2025-01-24 10:41:18+01
            wal start/stop: 000000010000000000000036 / 000000010000000000000036
            database size: 777.3MB, database backup size: 777.3MB
            repo1: backup set size: 37.3MB, backup size: 37.3MB
We now have two full backups, and the older WAL archives have been removed. According to the info command output, we know that 000000010000000000000030 and 000000010000000000000036 are required for backup consistency. Everything in between is only needed if we want to perform Point-in-Time Recovery to a moment between those two backups.
Sometimes, in production, you might want to keep older full backups just in case, but retaining all the WAL segments might not be possible due to limited disk space.
In our example, reducing repo1-retention-archive to 1 will remove those in-between WAL archives:
$ pgbackrest --stanza=my_stanza expire --dry-run --repo1-retention-archive=1
P00 INFO: [DRY-RUN] expire command begin 2.54.2: ...
P00 INFO: [DRY-RUN] repo1: 17-1 remove archive, start = 000000010000000000000031, stop = 000000010000000000000035
P00 INFO: [DRY-RUN] expire command end: completed successfully
Let’s start making things more complex with differential backups:
$ pgbackrest info
stanza: my_stanza
    status: ok
    cipher: none

    db (current)
        wal archive min/max (17): 000000010000000000000036/00000001000000000000009B

        full backup: 20250124-104114F
            timestamp start/stop: 2025-01-24 10:41:14+01 / 2025-01-24 10:41:18+01
            wal start/stop: 000000010000000000000036 / 000000010000000000000036
            database size: 777.3MB, database backup size: 777.3MB
            repo1: backup set size: 37.3MB, backup size: 37.3MB

        diff backup: 20250124-104114F_20250124-105414D
            timestamp start/stop: 2025-01-24 10:54:14+01 / 2025-01-24 10:54:17+01
            wal start/stop: 000000010000000000000050 / 000000010000000000000050
            database size: 784MB, database backup size: 755.5MB
            repo1: backup set size: 41.3MB, backup size: 37.9MB
            backup reference total: 1 full

        diff backup: 20250124-104114F_20250124-105441D
            timestamp start/stop: 2025-01-24 10:54:41+01 / 2025-01-24 10:54:45+01
            wal start/stop: 00000001000000000000005B / 00000001000000000000005B
            database size: 783.4MB, database backup size: 754.9MB
            repo1: backup set size: 41.2MB, backup size: 37.8MB
            backup reference total: 1 full

        diff backup: 20250124-104114F_20250124-105516D
            timestamp start/stop: 2025-01-24 10:55:16+01 / 2025-01-24 10:55:21+01
            wal start/stop: 00000001000000000000006C / 00000001000000000000006C
            database size: 785.5MB, database backup size: 757MB
            repo1: backup set size: 42.5MB, backup size: 39MB
            backup reference total: 1 full

        full backup: 20250124-110801F
            timestamp start/stop: 2025-01-24 11:08:01+01 / 2025-01-24 11:08:06+01
            wal start/stop: 000000010000000000000089 / 000000010000000000000089
            database size: 789.2MB, database backup size: 789.2MB
            repo1: backup set size: 45.0MB, backup size: 45.0MB
We now have two full backups and three differential backups. What settings should we use to expire WAL archives between backups and only keep those generated following the last backup?
First, we need to adjust repo-retention-diff. Your instinct would tell you that using --repo1-retention-diff=3 would be the thing to do, right?
$ pgbackrest --stanza=my_stanza expire --dry-run --repo1-retention-diff=3
P00 INFO: [DRY-RUN] expire command begin 2.54.2: ...
P00 INFO: [DRY-RUN] repo1: expire diff backup 20250124-104114F_20250124-105414D
P00 INFO: [DRY-RUN] repo1: remove expired backup 20250124-104114F_20250124-105414D
P00 INFO: [DRY-RUN] repo1: 17-1 no archive to remove
P00 INFO: [DRY-RUN] expire command end: completed successfully
Not quite: the oldest differential backup would be expired. Another tip here: full backups are counted as differentials for the purpose of retention. For example, with F1, D1, D2, F2 and repo-retention-diff=2, the backups F1, D2 and F2 will be retained, not D1 and D2 as might be expected.
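To make that counting rule concrete, here is a simplified Python model of it. It is a sketch under my own assumptions, not pgBackRest's actual code: the function name is hypothetical, incremental backups are ignored, and full backups are assumed to be kept by repo-retention-full:

```python
# Sketch only: a simplified model of differential retention, where the most
# recent `retention_diff` backups are counted among full AND diff backups.
# Assumptions: no incremental backups; full backups are never expired here.
from typing import List, Tuple


def retained_after_diff_expiry(backups: List[Tuple[str, str]],
                               retention_diff: int) -> List[str]:
    """backups: (label, kind) pairs, oldest first, kind is 'full' or 'diff'."""
    if retention_diff < 1:
        raise ValueError("retention_diff must be at least 1")
    # Index of the oldest backup still inside the retention window.
    cutoff = max(len(backups) - retention_diff, 0)
    return [label
            for index, (label, kind) in enumerate(backups)
            if kind == "full" or index >= cutoff]


example = [("F1", "full"), ("D1", "diff"), ("D2", "diff"), ("F2", "full")]
print(retained_after_diff_expiry(example, 2))  # ['F1', 'D2', 'F2']
```

Applied to the demo repository (one full, three differentials, then another full), a value of 3 drops the oldest differential in this model, as in the dry run above, while a value of 4 keeps all of them.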
Back to our demo: to keep all three differential backups, we will use --repo1-retention-full=2 --repo1-retention-diff=4.
First of all, if we set --repo1-retention-archive=4, we might end up in the infinite WAL archives retention trap described earlier.
Let’s try with the minimal value allowed for this setting:
$ pgbackrest --stanza=my_stanza expire --dry-run --repo1-retention-diff=4 --repo1-retention-archive=1
P00 INFO: [DRY-RUN] expire command begin 2.54.2: ...
P00 INFO: [DRY-RUN] repo1: 17-1 remove archive, start = 000000010000000000000037, stop = 00000001000000000000004F
P00 INFO: [DRY-RUN] repo1: 17-1 remove archive, start = 000000010000000000000051, stop = 00000001000000000000005A
P00 INFO: [DRY-RUN] repo1: 17-1 remove archive, start = 00000001000000000000005C, stop = 00000001000000000000006B
P00 INFO: [DRY-RUN] repo1: 17-1 remove archive, start = 00000001000000000000006D, stop = 000000010000000000000088
P00 INFO: [DRY-RUN] expire command end: completed successfully
Cool, every WAL archive older than 000000010000000000000089 (the start WAL of our latest full backup) that is not needed for backup consistency would be expired!
But what if we had a differential backup after the full one?
$ pgbackrest info
stanza: my_stanza
    status: ok
    cipher: none

    db (current)
        wal archive min/max (17): 000000010000000000000036/0000000100000000000000A5

        full backup: 20250124-104114F
            timestamp start/stop: 2025-01-24 10:41:14+01 / 2025-01-24 10:41:18+01
            wal start/stop: 000000010000000000000036 / 000000010000000000000036
            database size: 777.3MB, database backup size: 777.3MB
            repo1: backup set size: 37.3MB, backup size: 37.3MB

        diff backup: 20250124-104114F_20250124-105441D
            timestamp start/stop: 2025-01-24 10:54:41+01 / 2025-01-24 10:54:45+01
            wal start/stop: 00000001000000000000005B / 00000001000000000000005B
            database size: 783.4MB, database backup size: 754.9MB
            repo1: backup set size: 41.2MB, backup size: 37.8MB
            backup reference total: 1 full

        diff backup: 20250124-104114F_20250124-105516D
            timestamp start/stop: 2025-01-24 10:55:16+01 / 2025-01-24 10:55:21+01
            wal start/stop: 00000001000000000000006C / 00000001000000000000006C
            database size: 785.5MB, database backup size: 757MB
            repo1: backup set size: 42.5MB, backup size: 39MB
            backup reference total: 1 full

        full backup: 20250124-110801F
            timestamp start/stop: 2025-01-24 11:08:01+01 / 2025-01-24 11:08:06+01
            wal start/stop: 000000010000000000000089 / 000000010000000000000089
            database size: 789.2MB, database backup size: 789.2MB
            repo1: backup set size: 45.0MB, backup size: 45.0MB

        diff backup: 20250124-110801F_20250124-112408D
            timestamp start/stop: 2025-01-24 11:24:08+01 / 2025-01-24 11:24:10+01
            wal start/stop: 00000001000000000000009D / 00000001000000000000009D
            database size: 788.8MB, database backup size: 759.8MB
            repo1: backup set size: 45.6MB, backup size: 42.1MB
            backup reference total: 1 full

$ pgbackrest --stanza=my_stanza expire --dry-run --repo1-retention-diff=4 --repo1-retention-archive=1
P00 INFO: [DRY-RUN] expire command begin 2.54.2: ...
P00 INFO: [DRY-RUN] repo1: 17-1 remove archive, start = 000000010000000000000037, stop = 00000001000000000000005A
P00 INFO: [DRY-RUN] repo1: 17-1 remove archive, start = 00000001000000000000005C, stop = 00000001000000000000006B
P00 INFO: [DRY-RUN] repo1: 17-1 remove archive, start = 00000001000000000000006D, stop = 000000010000000000000088
P00 INFO: [DRY-RUN] expire command end: completed successfully
The WAL archives generated after our last full backup are all retained, including those between that full backup and the latest differential. To expire those as well, we need to add --repo1-retention-archive-type=diff:
$ pgbackrest --stanza=my_stanza expire --dry-run --repo1-retention-diff=4 --repo1-retention-archive=1 --repo1-retention-archive-type=diff
P00 INFO: [DRY-RUN] expire command begin 2.54.2: ...
P00 INFO: [DRY-RUN] repo1: 17-1 remove archive, start = 000000010000000000000037, stop = 00000001000000000000005A
P00 INFO: [DRY-RUN] repo1: 17-1 remove archive, start = 00000001000000000000005C, stop = 00000001000000000000006B
P00 INFO: [DRY-RUN] repo1: 17-1 remove archive, start = 00000001000000000000006D, stop = 000000010000000000000088
P00 INFO: [DRY-RUN] repo1: 17-1 remove archive, start = 00000001000000000000008A, stop = 00000001000000000000009C
P00 INFO: [DRY-RUN] expire command end: completed successfully
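To build some intuition for these dry-run results, here is a simplified Python model of the archive expiration seen in this post. It is a sketch under my own assumptions (hypothetical function names, a single timeline, the default 16MB segment size) and not pgBackRest's real implementation: continuous WAL is kept from the start of the Nth most recent backup of the counted type, and older backups only keep the segments between their own start and stop WAL.

```python
# Sketch only: a simplified model of WAL archive expiration, NOT pgBackRest code.
# Assumptions: single timeline, default 16MB segments (0x100 per log id).
from typing import List, Tuple

SEGMENTS_PER_LOG = 0x100
COUNTED = {"full": {"full"},
           "diff": {"full", "diff"},
           "incr": {"full", "diff", "incr"}}


def seg_num(name: str) -> int:
    return int(name[8:16], 16) * SEGMENTS_PER_LOG + int(name[16:24], 16)


def seg_name(timeline: str, number: int) -> str:
    log_id, segment = divmod(number, SEGMENTS_PER_LOG)
    return f"{timeline}{log_id:08X}{segment:08X}"


def wal_ranges_to_remove(archive_min: str,
                         backups: List[Tuple[str, str, str]],
                         retention_archive: int,
                         archive_type: str = "full") -> List[Tuple[str, str]]:
    """backups: (kind, start_wal, stop_wal) tuples, oldest first."""
    if retention_archive < 1:
        raise ValueError("retention_archive must be at least 1")
    counted = [b for b in backups if b[0] in COUNTED[archive_type]]
    if len(counted) < retention_archive:
        return []  # not enough backups yet: nothing is expired (the trap)
    timeline = archive_min[:8]
    # Continuous WAL is kept from the start of this backup onwards.
    keep_from = seg_num(counted[-retention_archive][1])
    ranges = []
    cursor = seg_num(archive_min)
    for _kind, start, stop in backups:
        first, last = seg_num(start), seg_num(stop)
        if first >= keep_from:
            break
        if cursor < first:
            ranges.append((seg_name(timeline, cursor), seg_name(timeline, first - 1)))
        cursor = max(cursor, last + 1)  # keep start..stop for consistency
    if cursor < keep_from:
        ranges.append((seg_name(timeline, cursor), seg_name(timeline, keep_from - 1)))
    return ranges


def wal(hex_suffix: str) -> str:
    """Helper for the demo: timeline 1, log id 0, given segment suffix."""
    return "0000000100000000" + hex_suffix.rjust(8, "0")


demo = [("full", wal("36"), wal("36")), ("diff", wal("5B"), wal("5B")),
        ("diff", wal("6C"), wal("6C")), ("full", wal("89"), wal("89")),
        ("diff", wal("9D"), wal("9D"))]
for start, stop in wal_ranges_to_remove(wal("36"), demo, 1, "diff"):
    print(f"remove archive, start = {start}, stop = {stop}")
```

In this model, switching archive_type from "full" to "diff" moves the continuous-WAL starting point from the last full backup (…89) to the last differential (…9D), which is why the extra 8A to 9C range only shows up in the last dry run.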
Conclusion
As you’ve seen in the examples above, defining proper retention settings is not always straightforward. Furthermore, they should be carefully planned and tested in relation to your backup schedule!
I generally recommend keeping things as simple as possible and avoiding modifications to the WAL archives retention settings unless it is absolutely necessary and you have enough experience dealing with them.
Finally, the last tip for today is to use the expire --dry-run option to understand and preview the impact of changing your retention settings. Additionally, you could disable the expire-auto option if you prefer to run the expire command manually after your next backup.