PostgreSQL hot standby feedback. How it works.

From time to time, people come to us with a problem, “Our queries cancelled on standby servers, and we enabled hot_standby_feedback, but queries on standby get cancelled. Why?”. Indeed, after reading the documentation, it may seem that the feedback solves query cancelling problems, and if it is enabled the conflicts will disappear. However, this is not true. In this article we take a look at the feedback, how it works and which problems it is intended to solve.
To begin with let us turn to the official documentation:
hot_standby_feedback – Specifies whether or not a hot standby will send feedback to the primary or upstream standby about queries currently executing on the standby. This parameter can be used to eliminate query cancels caused by cleanup records, but can cause database bloat on the primary for some workloads. Feedback messages will not be sent more frequently than once per wal_receiver_status_interval. The default value is off. This parameter can only be set in the postgresql.conf file or on the server command line.

As seen from the description, standby server sends information about queries which are currently running on it. What kind of information? By default the feedback is disabled, in which case wal receiver process sends only information about which parts of the transaction log are written, flushed and replayed (see pg_stat_replication view for details). To send feedback, enable the option and reload postgres.
Now we take a look into postgresql source, particularly on the code of the wal receiver process. This process runs on standby server and receives WALs from master server. We search src/backend/replication/walreceiver.c for the function XLogWalRcvSendHSFeedback. WAL receiver process use this function for sending feedback. It sends feedback only when following conditions are met:

wal_receiver_status_interval parameter is greater than zero;
hot_standby_feedback parameter enabled.

Feedback send interval specified in wal_receiver_status_interval. Before sending feedback the oldest xmin is to be determined: it is the ID of the oldest transaction which executes longer than others transactions (see GetOldestXmin function in src/backend/storage/ipc/procarray.c for details). Note, while searching oldest xmin, PostgreSQL checks all currently running transactions and all logical replication slots. Slots have their own values of xmin and catalog_xmin. Xmin is the oldest transaction ID which retention for this database is needed by this slot. Catalog_xmin is the oldest transaction which affects the system catalogs that should retain in the database for this slot. VACUUM cannot remove tuples deleted by any later transaction.

When the id of the oldest transaction is determined, the vacuum_defer_cleanup_age is subtracted from this value. vacuum_defer_cleanup_age parameter is a means to postpone the VACUUM, it is a parameter in the earlier versions of postgresql that is now replaced by feedback:
vacuum_defer_cleanup_age – Specifies the number of transactions by which VACUUM and HOT updates will defer cleanup of dead row versions. The default is zero transactions, meaning that dead row versions can be removed as soon as possible, that is, as soon as they are no longer visible to any open transaction. You may wish to set this to a non-zero value on a primary server that is supporting hot standby servers, as described in Section 25.5. This allows more time for queries on the standby to complete without incurring conflicts due to early cleanup of rows. However, since the value is measured in terms of number of write transactions occurring on the primary server, it is difficult to predict just how much additional grace time will be made available to standby queries. This parameter can only be set in the postgresql.conf file or on the server command line.

Thus the oldest xmin is equal to the oldest transaction id running on the standby minus vacuum_defer_cleanup_age, or the oldest transaction id that is needed for logical replication slot minus vacuum_defer_cleanup_age.

But that’s not all, now we need to take into account the epoch, because it may happen that the epoch has changed, and transactions counter was restarted. To do this, we take the value of the next transaction and compare it to the oldest xmin. If the oldest xmin is greater than the next transaction, then the epoch value is decreased.

Now we have to construct our feedback message and send it to the master. To do this, initialize a blank message, add the label which says that the this message is feedback, then add current timestamp, oldest xmin and epoch. Now send the message to the master, using walrcv_send (actually libpqrcv_send) function. At this point, the XLogWalRcvSendHSFeedback is done.

Next, the message will be accepted by wal sender process that is running on the master or an upstream server. Oldest xmin value will be recorded in the WAL sender’s internal structure MyPgXact->xmin. Every postgres backend has it’s own structure. In the future, when the vacuum occurs, and before the vacuuming relation, the oldest transaction will also be determined (with the function GetOldestXmin that we already know) and the WAL senders’s MyPgXact->xmin will be also taken into account. The calculated oldest xmin will be used to check tuple visibility in any open transactions. If the tuple is still seeing at least one transaction, it cannot be vacuumed.

Thus wal sender process acts as an intermediary for standby and if the vacuum has to process tuples which are visible on standby transactions, it will be forced to skip these tuples.

Summary. The parameter hot_standby_feedback does nothing with conflict resolution of streaming replication and its only purpose is to avoid queries cancellation on standby in the case the vacuum on master server cleans something up.

Thanks!

Comments: 8

8 responses to “PostgreSQL hot standby feedback. How it works.”

dif says:

September 23, 2015 at 12:03 am

> "Our queries cancelled on standby servers, and we enabled hot_standby_feedback, but queries on standby get cancelled. Why?"

Ok, what should i do to not get canceled?

Reply
dif says:

September 23, 2015 at 12:37 am

This comment has been removed by the author.

Reply
Alexey Lesovsky says:

October 5, 2015 at 7:11 am

Sorry for late answer, see max_standby_streaming_delay.

Reply
ioguix says:

October 13, 2015 at 10:02 pm

…or if your concurrency and workload allows it, put your replication on hold with pg_xlog_replay_pause(), send your query, then resume your replication with pg_xlog_replay_resume().

Reply
Ilya Kosmodemiansky says:

October 15, 2015 at 11:02 am

The key word is "if". The first issue in such case will be autovacuum performance on master if there is any heavy write workload

Reply
Anonymous says:

May 30, 2023 at 5:41 am

what is the role of hot standby feedback in logical replication

Reply
raj says:

May 30, 2023 at 5:42 am

role of hot_standby_feedback parameter in case of logical replication

Reply
- Sergei Kornilov says:
  
  June 4, 2023 at 8:22 pm
  
  Up to 16th version of PostgreSQL hot_standby_feedback doesn’t play any role in logical replication. Logical decoding can be started only on primary, and PostgreSQL tracks the required oldest xmin for logical decoding internally (pg_replication_slots.catalog_xmin) regardless whether or not hot_standby_feedback is activated on the logical replication side. I took a quick look at the code: hot_standby_feedback is actually only used in the wal receiver code, not in the logical replication worker. That is, it only makes sense for streaming (physical) replication.
  
  Starting Postgres 16 (release is planned for sometime in the fall) there will be a nuance here. The process is the same if we set up logical replication from primary to another primary, however, it will be possible to start logical decoding from a physical replica: primary -> streaming replication -> standby, and from that standby to launch logical replication to a different Postgresql instance. In this case, that standby should have hot_standby_feedback turned on to work normally.
  
  Hope this helps,
  Sergei
  
  Reply

Package	Price*
PostgreSQL configuration analysis	from 500 €
Hardware and OS configuration analysis	from 600 €
Slow SQL queries analysis	from 500 €
Comprehensive performance Health Check*	from 3500 €
* For single PostgreSQL installation (1 master + 1 replica)

Basic	Premium	VIP
Incl. up to 5h/month DBA time	Incl. up to 15h/month DBA time	Incl. up to 30h/month DBA time
Slack, Mattermost, Telegram	Slack, Mattermost, Telegram	Slack, Mattermost, Telegram
Response time — 1 Hour	Response time — 1 Hour	Response time — 30 minutes
Up to 2 PostgreSQL instances	Up to 5 PostgreSQL instances	Up to 10 PostgreSQL instances

News & Blog back