Avoiding the WAL Archives Retention Trap in pgBackRest

While answering support issues on pgBackRest, I regularly see users falling into the infinite archive retention trap and asking the same question: why are my old WAL archives not being expired?

This is almost always caused by a misconfigured repo-retention-archive setting, for example --repo1-retention-archive=7 --repo1-retention-diff=7 --repo1-retention-full=1.

Let’s see what this setting means and how it should be used.


Definition and example

According to its documentation, the repo-retention-archive option specifies the number of backups worth of continuous WAL to retain.

If this value is not set and repo-retention-full-type is count (default), then the archive to expire will default to the repo-retention-full (or repo-retention-diff) value corresponding to the repo-retention-archive-type if set to full (or diff). This will ensure that WAL is only expired for backups that are already expired.

Let’s use a very basic configuration for this example:

[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
start-fast=y
log-level-console=info
log-level-file=detail
compress-type=zst

[my_stanza]
pg1-path=/var/lib/pgsql/17/data

The repository is stored locally for the purpose of this demo, with the configuration set to keep two full backups.

Let’s generate some data using pgbench:

$ pgbench -i -s 50 test
dropping old tables...
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
vacuuming...                                                                                
creating primary keys...
done in 5.36 s...

$ pgbackrest info
stanza: my_stanza
    status: error (no valid backups)
    cipher: none

    db (current)
        wal archive min/max (17): 000000010000000000000008/00000001000000000000002E

$ ./walNb.pl --min=000000010000000000000008 --max=00000001000000000000002E
Number of WAL segments between min and max WAL: 39
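
The walNb.pl helper used above is not shown in this post. As a rough equivalent, here is a minimal sketch of the same arithmetic in POSIX shell. The function name wal_count is hypothetical, and the sketch assumes the default 16MB WAL segment size (256 segments per log id) and that both names are on the same timeline:

```shell
# Hypothetical helper: count WAL segments between two segment names, inclusive.
# Assumes 16MB segments (256 per log id) and a single timeline.
wal_count() (
    min=$1
    max=$2
    # Characters 9-16 are the log id, 17-24 the segment number (both hexadecimal).
    min_log=$(printf '%s' "$min" | cut -c9-16)
    min_seg=$(printf '%s' "$min" | cut -c17-24)
    max_log=$(printf '%s' "$max" | cut -c9-16)
    max_seg=$(printf '%s' "$max" | cut -c17-24)
    lo=$(( 0x$min_log * 256 + 0x$min_seg ))
    hi=$(( 0x$max_log * 256 + 0x$max_seg ))
    echo $(( hi - lo + 1 ))
)

wal_count 000000010000000000000008 00000001000000000000002E   # prints 39
```

With the min/max values reported by the info command, this gives the same count of 39 segments.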

Right now, we have 39 WAL archives in the repository but no backup. So, let’s take a backup:

$ pgbackrest --stanza=my_stanza backup --type=full
P00   INFO: backup command begin 2.54.2: ...
P00   INFO: execute non-exclusive backup start: 
backup begins after the requested immediate checkpoint completes
P00   INFO: backup start archive = 000000010000000000000031, lsn = 0/31000028
P00   INFO: check archive for segment 000000010000000000000031
P00   INFO: execute non-exclusive backup stop and wait for all WAL segments to archive
P00   INFO: backup stop archive = 000000010000000000000032, lsn = 0/32000088
P00   INFO: check archive for segment(s) 000000010000000000000031:000000010000000000000032
P00   INFO: new backup label = 20250124-092814F
P00   INFO: full backup size = 777.3MB, file total = 1279
P00   INFO: backup command end: completed successfully
P00   INFO: expire command begin 2.54.2: ...
P00   INFO: expire command end: completed successfully

$ pgbackrest info
stanza: my_stanza
    status: ok
    cipher: none

    db (current)
        wal archive min/max (17): 000000010000000000000008/000000010000000000000034

        full backup: 20250124-092753F
            timestamp start/stop: 2025-01-24 09:27:53+01 / 2025-01-24 09:27:56+01
            wal start/stop: 000000010000000000000030 / 000000010000000000000030
            database size: 777.3MB, database backup size: 777.3MB
            repo1: backup set size: 37.3MB, backup size: 37.3MB

The expire command was triggered automatically after the successful backup but did not remove the WAL archives generated before this backup. This is because the repo1-retention-full=2 limit has not been reached yet.

Would those archives have been removed if we had set the limit to only one full backup? The answer is yes:

$ pgbackrest --stanza=my_stanza expire --dry-run --repo1-retention-full=1
P00   INFO: [DRY-RUN] expire command begin 2.54.2: ...
P00   INFO: [DRY-RUN] repo1: 17-1 remove archive, 
			start = 000000010000000000000008, stop = 00000001000000000000002F
P00   INFO: [DRY-RUN] expire command end: completed successfully

000000010000000000000008 is the oldest (min) archive in the repository. But what is 00000001000000000000002F?

postgres=# SELECT x'2F'::integer, x'30'::integer;
 int4 | int4 
------+------
   47 |   48
(1 row)

Remember that the parts of a WAL segment file name are hexadecimal: 00000001000000000000002F is the segment right before 000000010000000000000030, which is the first WAL segment needed to make our 20250124-092753F backup consistent.

This behaviour is a common source of confusion for users. Expiring older archives at the first backup can cause problems for standby replicas that require older WAL, so this behaviour is intentional! What about the problem with repo-retention-archive then?

$ pgbackrest --stanza=my_stanza expire --dry-run --repo1-retention-full=1 --repo1-retention-archive=2
P00   INFO: [DRY-RUN] expire command begin 2.54.2: ...
P00   INFO: [DRY-RUN] expire command end: completed successfully

If repo-retention-archive is set higher than the number of backups to retain, the repository will never hold enough backups to meet the archive retention requirement, and we end up with infinite WAL archive retention. For example, with --repo1-retention-full=1 --repo1-retention-archive=2, we ask to keep WAL archives for 2 full backups while only ever keeping 1 full backup.

Our usual recommendation is this: unless you want to aggressively expire archives, you shouldn’t modify repo-retention-archive. It will be handled and set automatically based on repo-retention-full.
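
A quick sanity check along these lines can catch the trap before a configuration is deployed. The helper below is hypothetical (it is not a pgBackRest command) and only models the simple case discussed so far: count-based retention with the default repo-retention-archive-type of full.

```shell
# Hypothetical helper, not part of pgBackRest.
# Usage: check_archive_retention <repo-retention-full> <repo-retention-archive>
check_archive_retention() (
    full=$1
    archive=$2
    if [ "$archive" -gt "$full" ]; then
        # More backups' worth of WAL requested than backups ever kept.
        echo "trap: repo-retention-archive=$archive > repo-retention-full=$full"
    else
        echo "ok"
    fi
)

check_archive_retention 1 2   # the dry-run example above: reports the trap
check_archive_retention 2 1   # aggressive WAL expiration, but not the trap
```

Leaving repo-retention-archive unset avoids the question entirely, since it then follows repo-retention-full.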

What is repo-retention-archive used for?

According to the documentation:

If disk space is limited, this setting, in conjunction with repo-retention-archive-type, can be used to aggressively expire WAL segments. However, doing so negates the ability to perform PITR from the backups with expired WAL and is therefore not recommended.

and repo-retention-archive-type is the backup type to consider for WAL retention:

If set to full pgBackRest will keep archive logs for the number of full backups defined by repo-retention-archive. If set to diff (differential) pgBackRest will keep archive logs for the number of full and differential backups defined by repo-retention-archive, meaning if the last backup taken was a full backup, it will be counted as a differential for the purpose of repo-retention. If set to incr (incremental) pgBackRest will keep archive logs for the number of full, differential, and incremental backups defined by repo-retention-archive. It is recommended that this setting not be changed from the default which will only expire WAL in conjunction with expiring full backups.

Let’s take a second full backup and clean up the WAL archives repository accordingly:

$ pgbackrest --stanza=my_stanza backup --type=full
P00   INFO: backup command begin 2.54.2: ...
P00   INFO: execute non-exclusive backup start: 
backup begins after the requested immediate checkpoint completes
P00   INFO: backup start archive = 000000010000000000000036, lsn = 0/36000028
P00   INFO: check archive for prior segment 000000010000000000000035
P00   INFO: execute non-exclusive backup stop and wait for all WAL segments to archive
P00   INFO: backup stop archive = 000000010000000000000036, lsn = 0/36000120
P00   INFO: check archive for segment(s) 000000010000000000000036:000000010000000000000036
P00   INFO: new backup label = 20250124-104114F
P00   INFO: full backup size = 777.3MB, file total = 1279
P00   INFO: backup command end: completed successfully
P00   INFO: expire command begin 2.54.2: ...
P00   INFO: repo1: 17-1 remove archive, 
		start = 000000010000000000000008, stop = 00000001000000000000002F
P00   INFO: expire command end: completed successfully

$ pgbackrest info
stanza: my_stanza
    status: ok
    cipher: none

    db (current)
        wal archive min/max (17): 000000010000000000000030/000000010000000000000036

        full backup: 20250124-092753F
            timestamp start/stop: 2025-01-24 09:27:53+01 / 2025-01-24 09:27:56+01
            wal start/stop: 000000010000000000000030 / 000000010000000000000030
            database size: 777.3MB, database backup size: 777.3MB
            repo1: backup set size: 37.3MB, backup size: 37.3MB

        full backup: 20250124-104114F
            timestamp start/stop: 2025-01-24 10:41:14+01 / 2025-01-24 10:41:18+01
            wal start/stop: 000000010000000000000036 / 000000010000000000000036
            database size: 777.3MB, database backup size: 777.3MB
            repo1: backup set size: 37.3MB, backup size: 37.3MB

We now have two full backups, and the older WAL archives have been removed. According to the info command output, we know that 000000010000000000000030 and 000000010000000000000036 are required for backup consistency, while everything in between is only needed to perform Point-in-Time Recovery between those two backups.

In production, you might sometimes want to keep older full backups just in case, but retaining all the WAL segments might not be possible due to limited disk space.

In our example, reducing repo1-retention-archive to 1 will remove those in-between WAL archives:

$ pgbackrest --stanza=my_stanza expire --dry-run --repo1-retention-archive=1
P00   INFO: [DRY-RUN] expire command begin 2.54.2: ...
P00   INFO: [DRY-RUN] repo1: 17-1 remove archive, 
			start = 000000010000000000000031, stop = 000000010000000000000035
P00   INFO: [DRY-RUN] expire command end: completed successfully

Let’s start making things more complex with differential backups:

$ pgbackrest info
stanza: my_stanza
    status: ok
    cipher: none

    db (current)
        wal archive min/max (17): 000000010000000000000036/00000001000000000000009B

        full backup: 20250124-104114F
            timestamp start/stop: 2025-01-24 10:41:14+01 / 2025-01-24 10:41:18+01
            wal start/stop: 000000010000000000000036 / 000000010000000000000036
            database size: 777.3MB, database backup size: 777.3MB
            repo1: backup set size: 37.3MB, backup size: 37.3MB

        diff backup: 20250124-104114F_20250124-105414D
            timestamp start/stop: 2025-01-24 10:54:14+01 / 2025-01-24 10:54:17+01
            wal start/stop: 000000010000000000000050 / 000000010000000000000050
            database size: 784MB, database backup size: 755.5MB
            repo1: backup set size: 41.3MB, backup size: 37.9MB
            backup reference total: 1 full

        diff backup: 20250124-104114F_20250124-105441D
            timestamp start/stop: 2025-01-24 10:54:41+01 / 2025-01-24 10:54:45+01
            wal start/stop: 00000001000000000000005B / 00000001000000000000005B
            database size: 783.4MB, database backup size: 754.9MB
            repo1: backup set size: 41.2MB, backup size: 37.8MB
            backup reference total: 1 full

        diff backup: 20250124-104114F_20250124-105516D
            timestamp start/stop: 2025-01-24 10:55:16+01 / 2025-01-24 10:55:21+01
            wal start/stop: 00000001000000000000006C / 00000001000000000000006C
            database size: 785.5MB, database backup size: 757MB
            repo1: backup set size: 42.5MB, backup size: 39MB
            backup reference total: 1 full

        full backup: 20250124-110801F
            timestamp start/stop: 2025-01-24 11:08:01+01 / 2025-01-24 11:08:06+01
            wal start/stop: 000000010000000000000089 / 000000010000000000000089
            database size: 789.2MB, database backup size: 789.2MB
            repo1: backup set size: 45.0MB, backup size: 45.0MB

We now have two full backups and three differential backups. What settings should we use to expire WAL archives between backups and only keep those generated following the last backup?

First, we need to adjust repo-retention-diff. Instinct might suggest that --repo1-retention-diff=3 is the right value since we have three differential backups, right?

$ pgbackrest --stanza=my_stanza expire --dry-run --repo1-retention-diff=3
P00   INFO: [DRY-RUN] expire command begin 2.54.2: ...
P00   INFO: [DRY-RUN] repo1: expire diff backup 20250124-104114F_20250124-105414D
P00   INFO: [DRY-RUN] repo1: remove expired backup 20250124-104114F_20250124-105414D
P00   INFO: [DRY-RUN] repo1: 17-1 no archive to remove
P00   INFO: [DRY-RUN] expire command end: completed successfully

Not quite: the oldest differential backup would be expired. Another tip here: full backups are counted as differential backups for the purpose of differential retention. Example: with F1, D1, D2, F2 and repo-retention-diff=2, then F1, D2, F2 will be retained, not D2 and D1 as might be expected.
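
The counting rule from that example can be modelled in a few lines of POSIX shell. This is a simplified sketch under stated assumptions, not pgBackRest's actual implementation: the function name diff_retention is hypothetical, labels starting with F are full backups and labels starting with D are differentials, and incremental backups are ignored.

```shell
# Simplified model of repo-retention-diff (an assumption, not pgBackRest code).
# Usage: diff_retention <repo-retention-diff> <labels, oldest to newest>
diff_retention() (
    keep=$1
    shift
    # Reverse the list so that we walk from newest to oldest.
    newest_first=""
    for b in "$@"; do
        newest_first="$b $newest_first"
    done
    seen=0
    retained=""
    for b in $newest_first; do
        # Full and differential backups both count towards the limit.
        seen=$((seen + 1))
        case $b in
            D*)
                # A differential beyond the limit is expired.
                if [ "$seen" -gt "$keep" ]; then
                    continue
                fi
                ;;
        esac
        # Full backups are never expired by this setting.
        retained="$b $retained"
    done
    # Unquoted on purpose: drops the trailing space.
    echo $retained
)

diff_retention 2 F1 D1 D2 F2      # prints: F1 D2 F2
diff_retention 3 F1 D1 D2 D3 F2   # prints: F1 D2 D3 F2
```

The second call mirrors the demo repository: with a limit of 3, the newest full backup takes one slot, so the oldest differential is the one that gets expired.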

Back to our demo, let’s use --repo1-retention-full=2 --repo1-retention-diff=4 so that all three differential backups are kept.
First of all, if we also set --repo1-retention-archive=4, we might end up in the infinite WAL archives retention trap described earlier, since only two full backups are retained.

Let’s try with the minimal value allowed for this setting:

$ pgbackrest --stanza=my_stanza expire --dry-run --repo1-retention-diff=4 --repo1-retention-archive=1
P00   INFO: [DRY-RUN] expire command begin 2.54.2: ...
P00   INFO: [DRY-RUN] repo1: 17-1 remove archive, 
			start = 000000010000000000000037, stop = 00000001000000000000004F
P00   INFO: [DRY-RUN] repo1: 17-1 remove archive, 
			start = 000000010000000000000051, stop = 00000001000000000000005A
P00   INFO: [DRY-RUN] repo1: 17-1 remove archive, 
			start = 00000001000000000000005C, stop = 00000001000000000000006B
P00   INFO: [DRY-RUN] repo1: 17-1 remove archive, 
			start = 00000001000000000000006D, stop = 000000010000000000000088
P00   INFO: [DRY-RUN] expire command end: completed successfully

Cool, every WAL archive older than 000000010000000000000089 (the start WAL of our latest full backup) that is not needed for backup consistency would be expired!
But what if we had a differential backup after the full one?

$ pgbackrest info
stanza: my_stanza
    status: ok
    cipher: none

    db (current)
        wal archive min/max (17): 000000010000000000000036/0000000100000000000000A5

        full backup: 20250124-104114F
            timestamp start/stop: 2025-01-24 10:41:14+01 / 2025-01-24 10:41:18+01
            wal start/stop: 000000010000000000000036 / 000000010000000000000036
            database size: 777.3MB, database backup size: 777.3MB
            repo1: backup set size: 37.3MB, backup size: 37.3MB

        diff backup: 20250124-104114F_20250124-105441D
            timestamp start/stop: 2025-01-24 10:54:41+01 / 2025-01-24 10:54:45+01
            wal start/stop: 00000001000000000000005B / 00000001000000000000005B
            database size: 783.4MB, database backup size: 754.9MB
            repo1: backup set size: 41.2MB, backup size: 37.8MB
            backup reference total: 1 full

        diff backup: 20250124-104114F_20250124-105516D
            timestamp start/stop: 2025-01-24 10:55:16+01 / 2025-01-24 10:55:21+01
            wal start/stop: 00000001000000000000006C / 00000001000000000000006C
            database size: 785.5MB, database backup size: 757MB
            repo1: backup set size: 42.5MB, backup size: 39MB
            backup reference total: 1 full

        full backup: 20250124-110801F
            timestamp start/stop: 2025-01-24 11:08:01+01 / 2025-01-24 11:08:06+01
            wal start/stop: 000000010000000000000089 / 000000010000000000000089
            database size: 789.2MB, database backup size: 789.2MB
            repo1: backup set size: 45.0MB, backup size: 45.0MB

        diff backup: 20250124-110801F_20250124-112408D
            timestamp start/stop: 2025-01-24 11:24:08+01 / 2025-01-24 11:24:10+01
            wal start/stop: 00000001000000000000009D / 00000001000000000000009D
            database size: 788.8MB, database backup size: 759.8MB
            repo1: backup set size: 45.6MB, backup size: 42.1MB
            backup reference total: 1 full

$ pgbackrest --stanza=my_stanza expire --dry-run --repo1-retention-diff=4 --repo1-retention-archive=1
P00   INFO: [DRY-RUN] expire command begin 2.54.2: ...
P00   INFO: [DRY-RUN] repo1: 17-1 remove archive, 
			start = 000000010000000000000037, stop = 00000001000000000000005A
P00   INFO: [DRY-RUN] repo1: 17-1 remove archive, 
			start = 00000001000000000000005C, stop = 00000001000000000000006B
P00   INFO: [DRY-RUN] repo1: 17-1 remove archive, 
			start = 00000001000000000000006D, stop = 000000010000000000000088
P00   INFO: [DRY-RUN] expire command end: completed successfully

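The WAL archives generated after our last full backup are still retained, including those between that full backup and the differential backup that follows it. To expire those as well, we need to add --repo1-retention-archive-type=diff: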

$ pgbackrest --stanza=my_stanza expire --dry-run --repo1-retention-diff=4 --repo1-retention-archive=1 --repo1-retention-archive-type=diff
P00   INFO: [DRY-RUN] expire command begin 2.54.2: ...
P00   INFO: [DRY-RUN] repo1: 17-1 remove archive, 
			start = 000000010000000000000037, stop = 00000001000000000000005A
P00   INFO: [DRY-RUN] repo1: 17-1 remove archive, 
			start = 00000001000000000000005C, stop = 00000001000000000000006B
P00   INFO: [DRY-RUN] repo1: 17-1 remove archive, 
			start = 00000001000000000000006D, stop = 000000010000000000000088
P00   INFO: [DRY-RUN] repo1: 17-1 remove archive, 
			start = 00000001000000000000008A, stop = 00000001000000000000009C
P00   INFO: [DRY-RUN] expire command end: completed successfully

 


Conclusion

As you’ve seen in the examples above, defining proper retention settings is not always straightforward. They should be carefully planned and tested against your backup schedule!

I generally recommend keeping things as simple as possible and not modifying the WAL archives retention settings unless it is absolutely necessary and you have enough experience dealing with them.

Finally, the last tip for today is to use the --dry-run option of the expire command to preview and understand the impact of changing your retention settings. Additionally, you could disable the expire-auto option if you prefer to run the expire command manually after your next backup.
