PITR and Streaming Replication environments

Recently, I had to address a support case where a user was struggling to perform Point-in-Time Recovery (PITR) in their PostgreSQL cluster using streaming replication. We have already discussed how to achieve a successful recovery in our past webinars, but one question remains: what is the impact on the standby servers? In this specific case, the user was unsure whether the Point-in-Time Recovery steps needed to be applied on the standby as well, and after doing so, the standby refused to connect to the primary.

In this post, we will examine this specific use case, perform a recovery on the primary, and explore the best approach to re-synchronize the standby servers.


For the purpose of this post, we will use two nodes called pg1 and pg2. Both are running on Rocky Linux 9.

Installation

On both the pg1 and pg2 servers, first configure the PGDG yum repositories:

$ sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/\
    EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm
$ sudo dnf -qy module disable postgresql

Then, install PostgreSQL and create a basic PostgreSQL cluster on pg1:

$ sudo dnf install -y postgresql16-server
$ sudo PGSETUP_INITDB_OPTIONS="--data-checksums"\
    /usr/pgsql-16/bin/postgresql-16-setup initdb
$ sudo systemctl enable postgresql-16 --now
$ sudo -iu postgres ps -o pid,cmd fx

As we are talking about Point-in-Time Recovery, we need to store the WAL archives in a location shared across the nodes. In this example, we will create an archives directory inside a /shared NFS mount:

$ sudo -iu postgres mkdir /shared/archives

Configure the listening addresses and enable archiving in the postgresql(.auto).conf file:

listen_addresses = '*'
archive_mode = on
archive_command = 'test ! -f /shared/archives/%f && cp %p /shared/archives/%f'

Finally, restart the PostgreSQL service using:

$ sudo systemctl restart postgresql-16.service
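
A quick sanity check is to force a WAL switch and make sure the segment actually lands in the archives:

$ sudo -iu postgres psql
postgres=# SELECT pg_switch_wal();
postgres=# SELECT archived_count, last_archived_wal, last_failed_wal
           FROM pg_stat_archiver;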

Set up streaming replication

On pg1, create a specific user for the replication:

$ sudo -iu postgres psql
postgres=# CREATE ROLE replic_user WITH LOGIN REPLICATION PASSWORD 'mypwd';

Configure pg_hba.conf:

host replication replic_user pg2 scram-sha-256

Reload configuration and allow the service in the firewall (if needed):

$ sudo systemctl reload postgresql-16.service
$ sudo firewall-cmd --permanent --add-service=postgresql
$ sudo firewall-cmd --reload

Configure ~postgres/.pgpass on pg2:

$ sudo su - postgres
$ echo "*:*:replication:replic_user:mypwd" >> ~postgres/.pgpass
$ chown postgres: ~postgres/.pgpass
$ chmod 0600 ~postgres/.pgpass
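
Before copying any data, you can verify that pg2 is allowed to open a replication connection. IDENTIFY_SYSTEM is a replication-protocol command, so a successful reply proves that both the pg_hba.conf entry and the .pgpass file are correct:

$ psql "host=pg1 user=replic_user replication=1" -c "IDENTIFY_SYSTEM;"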

Initialize the standby server on pg2 by copying the data directory of pg1 using pg_basebackup:

$ sudo su - postgres
$ pg_basebackup -h pg1 -U replic_user -D /var/lib/pgsql/16/data\
    --write-recovery-conf --checkpoint=fast --slot='pg2' --create-slot --progress
23211/23211 kB (100%), 1/1 tablespace
$ cat /var/lib/pgsql/16/data/postgresql.auto.conf |grep primary_conninfo
primary_conninfo = '...'

The pg_basebackup command will create the pg2 physical replication slot, so the primary will keep all the WAL files the standby needs to catch up.

Enable and start the PostgreSQL service using:

$ sudo systemctl enable postgresql-16 --now
$ sudo -iu postgres ps -o pid,cmd fx

Check the streaming replication status using pg_stat_replication and pg_replication_slots on pg1, and pg_stat_wal_receiver on pg2:

postgres=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 2599
usesysid         | 16384
usename          | replic_user
application_name | walreceiver
client_addr      | 192.168.121.101
client_hostname  | pg2
client_port      | 46890
backend_start    | 2024-09-03 09:12:36.20348+00
backend_xmin     |
state            | streaming
sent_lsn         | 0/3000060
write_lsn        | 0/3000060
flush_lsn        | 0/3000060
replay_lsn       | 0/3000060
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
reply_time       | 2024-09-03 09:12:55.094425+00
postgres=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]-------+----------
slot_name           | pg2
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | t
active_pid          | 2599
xmin                |
catalog_xmin        |
restart_lsn         | 0/3000060
confirmed_flush_lsn |
wal_status          | reserved
safe_wal_size       |
two_phase           | f
conflicting         |
postgres=# SELECT * FROM pg_stat_wal_receiver;
-[ RECORD 1 ]---------+------------------------------
pid                   | 2289
status                | streaming
receive_start_lsn     | 0/3000000
receive_start_tli     | 1
written_lsn           | 0/3000060
flushed_lsn           | 0/3000000
received_tli          | 1
last_msg_send_time    | 2024-09-03 09:13:36.230958+00
last_msg_receipt_time | 2024-09-03 09:13:35.104889+00
latest_end_lsn        | 0/3000060
latest_end_time       | 2024-09-03 09:12:36.207722+00
slot_name             | pg2
sender_host           | pg1
sender_port           | 5432
conninfo              | ...

Database content initialization

Let’s now create some test data using pgbench:

$ sudo -iu postgres createdb test
$ sudo -iu postgres /usr/pgsql-16/bin/pgbench -i -s 600 test

Since we want to experiment with Point-in-Time Recovery, we also need to take an initial backup of pg1:

$ sudo su - postgres
$ mkdir /shared/backups
$ pg_basebackup -D "/shared/backups/$(date +'%F_%T')" \
    --format=plain --wal-method=none --checkpoint=fast --progress
NOTICE: all required WAL segments have been archived
9218540/9218540 kB (100%), 1/1 tablespace

Generate some activity using pgbench:

$ /usr/pgsql-16/bin/pgbench -c 4 -j 2 -T 300 test

Let’s dig into our database content to find a suitable restore time using the pgbench_history.mtime column:

test=# SELECT min(mtime),max(mtime) FROM pgbench_history;
           min              |            max
----------------------------+----------------------------
 2024-09-03 09:25:30.474732 | 2024-09-03 09:30:30.469797
(1 row)
test=# SELECT * FROM pgbench_history WHERE mtime
between '2024-09-03 09:27:00' AND '2024-09-03 09:28:00'
ORDER BY mtime DESC LIMIT 10;
tid   | bid |   aid    | delta |           mtime            | filler
------+-----+----------+-------+----------------------------+--------
614   | 206 |  2388179 |  1489 | 2024-09-03 09:27:59.999669 |
5278  | 569 | 38093646 |   235 | 2024-09-03 09:27:59.999645 |
2470  | 160 |  6196059 | -1997 | 2024-09-03 09:27:59.99962  |
598   | 466 | 27063999 |  4034 | 2024-09-03 09:27:59.999108 |
542   | 438 | 50846207 |  2724 | 2024-09-03 09:27:59.997713 |
568   | 452 | 17938862 |  1501 | 2024-09-03 09:27:59.996939 |
1738  | 377 | 55042253 |  3608 | 2024-09-03 09:27:59.996903 |
4002  | 497 | 16634903 | -2938 | 2024-09-03 09:27:59.996889 |
5551  | 577 |   382773 |  4765 | 2024-09-03 09:27:59.995913 |
2398  |  51 |  8128754 |  2714 | 2024-09-03 09:27:59.995004 |
(10 rows)

mtime is a timestamp without time zone, which could potentially create some issues. For our example, we will try to restore up until 2024-09-03 09:28:00.
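
Note that recovery_target_time is interpreted using the server's TimeZone setting when no offset is given (UTC on this cluster, as the log timestamps show). To remove any ambiguity, you can make the offset explicit:

recovery_target_time = '2024-09-03 09:28:00+00'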

Perform Point-in-time Recovery on the primary

First, stop the PostgreSQL service and (if you have enough disk space) save the old data directory:

$ sudo systemctl stop postgresql-16
$ sudo mv /var/lib/pgsql/16/data /var/lib/pgsql/16/data.old

Restore the backup and create the recovery signal:

$ sudo su - postgres
$ cp -rfp /shared/backups/2024-09-03_09\:21\:46/ /var/lib/pgsql/16/data
$ touch /var/lib/pgsql/16/data/recovery.signal

Add the recovery settings to the postgresql(.auto).conf:

restore_command = 'cp /shared/archives/%f %p'
recovery_target_action = 'promote'
recovery_target_time = '2024-09-03 09:28:00'
recovery_target_inclusive = off

Start the PostgreSQL service and watch the recovery process in the logs:
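
For example, assuming the default PGDG logging configuration (log files written under the data directory):

$ sudo systemctl start postgresql-16
$ sudo -iu postgres tail -f /var/lib/pgsql/16/data/log/postgresql-*.log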

LOG: starting point-in-time recovery to 2024-09-03 09:28:00+00
LOG: starting backup recovery with redo LSN 1/D0000028,...
LOG: restored log file "0000000100000001000000D0" from archive
LOG: restored log file "0000000100000001000000D1" from archive
LOG: completed backup recovery with redo LSN 1/D0000028 and end LSN 1/D0000138
LOG: consistent recovery state reached at 1/D0000138
...
LOG: recovery stopping before commit of transaction 259244, time 2024-09-03 09:28:00.000048+00
LOG: redo done at 2/B1F61158
LOG: last completed transaction was at log time 2024-09-03 09:27:59.998357+00
LOG: selected new timeline ID: 2
LOG: archive recovery complete
LOG: database system is ready to accept connections

Check the database content:

test=# SELECT min(mtime),max(mtime) FROM pgbench_history;
           min              |            max
----------------------------+----------------------------
 2024-09-03 09:25:30.474732 | 2024-09-03 09:27:59.997713
(1 row)
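
You can also confirm that the recovered primary is now on the new timeline:

postgres=# SELECT timeline_id FROM pg_control_checkpoint();
 timeline_id
-------------
           2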

Obviously, since we left the standby server untouched, the streaming replication is broken. You should see messages like these in the PostgreSQL logs:

FATAL: could not start WAL streaming: ERROR: replication slot "pg2" does not exist
LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 3/760000A0
LOG: waiting for WAL to become available at 3/760000B8

Standby server re-synchronization

There are several possibilities for re-synchronizing the standby server. The first approach is to rebuild it with pg_basebackup, reusing the same command that was used to build it the first time.

After a successful recovery, a common best practice is to take a fresh full backup of the primary. Instead of using pg_basebackup to rebuild the standby server, we could use it to take a backup of the primary and then restore that backup on the standby!
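
A minimal sketch of that second option (the <fresh-backup> directory name is a placeholder for the new backup). On pg1:

$ sudo -iu postgres pg_basebackup -D "/shared/backups/$(date +'%F_%T')" \
    --format=plain --wal-method=none --checkpoint=fast --progress

Then, on pg2:

$ sudo systemctl stop postgresql-16
$ sudo -iu postgres cp -rfp /shared/backups/<fresh-backup>/ /var/lib/pgsql/16/data
$ sudo -iu postgres touch /var/lib/pgsql/16/data/standby.signal

Since the backup was taken with --wal-method=none, the standby would also need the restore_command (or a replication slot created before the backup) to fetch the WAL generated since the backup started, along with the primary_conninfo setting.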

Finally, the approach we will explore in this blog post is to rewind the standby server using the old backup. We will ask PostgreSQL to replay the WAL archives up to the current primary location, follow the new timeline, and then stop. Once PostgreSQL has been recovered to that location, we can remove the recovery settings and re-enable streaming replication.

To determine the current location of the primary, we will first create a physical replication slot on pg1. This ensures that the primary retains all the WAL files the standby will need to catch up.

postgres=# SELECT * FROM pg_create_physical_replication_slot(
    slot_name := 'pg2',
    immediately_reserve := TRUE);
 slot_name |    lsn
-----------+------------
 pg2       | 2/B2323178
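
The returned lsn is where the slot starts retaining WAL. As a sanity check, it should be at (or just behind) the current write position of the primary:

postgres=# SELECT pg_current_wal_lsn();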

On pg2, let’s now stop the PostgreSQL service and (if you have enough disk space) save the old data directory:

$ sudo systemctl stop postgresql-16
$ sudo mv /var/lib/pgsql/16/data /var/lib/pgsql/16/data.old

Restore the old backup and create the recovery signal:

$ sudo su - postgres
$ cp -rfp /shared/backups/2024-09-03_09\:21\:46/ /var/lib/pgsql/16/data
$ touch /var/lib/pgsql/16/data/recovery.signal

Add the recovery settings to the postgresql(.auto).conf. Note that archive_mode is turned off, since the restored configuration comes from the primary and this node should not push its own WAL files to the shared archives:

archive_mode = off
restore_command = 'cp /shared/archives/%f %p'
recovery_target_action = 'shutdown'
recovery_target_lsn = '2/B2323178'
recovery_target_timeline = 'latest'

Start the PostgreSQL service and watch the recovery process in the logs:

LOG: starting point-in-time recovery to WAL location (LSN) "2/B2323178"
LOG: starting backup recovery with redo LSN 1/D0000028,...
LOG: restored log file "00000002.history" from archive
LOG: restored log file "0000000100000001000000D0" from archive
LOG: redo starts at 1/D0000028
LOG: restored log file "0000000100000001000000D1" from archive
LOG: completed backup recovery with redo LSN 1/D0000028 and end LSN 1/D0000138
LOG: consistent recovery state reached at 1/D0000138
LOG: database system is ready to accept read-only connections
...
LOG: restored log file "0000000100000002000000B0" from archive
LOG: restored log file "0000000200000002000000B1" from archive
LOG: restored log file "0000000200000002000000B2" from archive
LOG: recovery stopping after WAL location (LSN) "2/B2323178"
LOG: shutdown at recovery target

The standby server should now be stopped, so we can replace the recovery settings with the streaming replication settings in postgresql(.auto).conf:

primary_conninfo = 'user=replic_user host=pg1'
primary_slot_name = 'pg2'
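
Also make sure the recovery_target_* settings are gone. If they were added to postgresql.auto.conf, one way to strip them while the instance is stopped (hand-editing postgresql.auto.conf is only safe at that point) is:

$ sudo -iu postgres sed -i '/^recovery_target/d' \
    /var/lib/pgsql/16/data/postgresql.auto.conf

Keeping the restore_command on the standby is harmless, and even useful if it ever falls behind the WAL retained by the primary.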

Create the standby signal:

$ sudo -iu postgres mv 16/data/recovery.signal 16/data/standby.signal

Start the PostgreSQL service again and watch the logs:

LOG: database system was shut down in recovery
LOG: entering standby mode
LOG: redo starts at 2/B1F611D0
LOG: consistent recovery state reached at 2/B23231B0
LOG: database system is ready to accept read-only connections
LOG: started streaming WAL from primary at 2/B3000000 on timeline 2

Check the streaming replication status using pg_stat_replication and pg_replication_slots on pg1, and pg_stat_wal_receiver on pg2:

postgres=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 4955
usesysid         | 16384
usename          | replic_user
application_name | walreceiver
client_addr      | 192.168.121.101
client_hostname  | pg2
client_port      | 35622
backend_start    | 2024-09-03 10:14:48.468542+00
backend_xmin     |
state            | streaming
sent_lsn         | 2/B3000148
write_lsn        | 2/B3000148
flush_lsn        | 2/B3000148
replay_lsn       | 2/B3000148
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async
reply_time       | 2024-09-03 10:15:47.430308+00
postgres=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]-------+-----------
slot_name           | pg2
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | t
active_pid          | 4955
xmin                |
catalog_xmin        |
restart_lsn         | 2/B3000148
confirmed_flush_lsn |
wal_status          | reserved
safe_wal_size       |
two_phase           | f
conflicting         |
postgres=# SELECT * FROM pg_stat_wal_receiver;
-[ RECORD 1 ]---------+------------------------------
pid                   | 4419
status                | streaming
receive_start_lsn     | 2/B3000000
receive_start_tli     | 2
written_lsn           | 2/B3000148
flushed_lsn           | 2/B3000148
received_tli          | 2
last_msg_send_time    | 2024-09-03 10:16:18.569324+00
last_msg_receipt_time | 2024-09-03 10:16:17.443183+00
latest_end_lsn        | 2/B3000148
latest_end_time       | 2024-09-03 10:14:48.533651+00
slot_name             | pg2
sender_host           | pg1
sender_port           | 5432
conninfo              | ...

And that’s it: the standby server is back online!


Conclusion

The fastest way to re-sync a standby server after PITR is usually to create a new base backup and set up the standby again. Rebuilding from a fresh backup is generally quicker than replaying WAL archives from the point of recovery.

And if you know me, you probably already know that I’d recommend using pgBackRest over pg_basebackup. The pgBackRest --delta restore would save a lot of time when restoring a backup to refresh a standby server.
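
For illustration only, assuming a pgBackRest stanza named main were already configured for this cluster, the standby rewind performed above would boil down to a single command:

$ sudo -iu postgres pgbackrest --stanza=main --delta \
    --type=lsn --target='2/B2323178' --target-action=shutdown restore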
