News & Blog back

Subscribe

Automated index bloat management: How pg_index_watch keeps PostgreSQL indexes lean.

To follow up on my introductory blog post, here is a more extensive deep dive into ins and outs of pg_index_watch. In this post, I will take you through my thinking that led to the utility development and will explain how it works.

Pg_index_watch – a utility for prevention of bloat on frequently updated PostgreSQL tables.

Purpose

Uncontrollable index bloat on frequently updated tables is a known issue in PostgreSQL. The built-in autovacuum can not (in most cases) deal with index bloat regardless of its settings. pg_index_watch resolves this issue by automatically rebuilding indexes when needed. 

Concept

With the introduction of REINDEX CONCURRENTLY in PostgreSQL 12 there was finally a way to (almost)safely rebuild any indexes and as a result debloat them. Despite that, the question remained – based on which criterion do we determine that an index is bloated and whether there is a need to rebuild it?

There was a need for a simple statistical model that would allow us to assess the extent to which index is bloated without the need for a full index review. 

pg_index_watch offers following approach to this problem:

PostgreSQL allows you to access the following (and almost free of charge):

1) number of rows in the index (in pg_class.reltuples for the index) 

and 

2) index size.

Further on, assuming that the ratio of index size to the number of index entries in the index is constant (this is, in the majority of cases, a correct assumption), we can speculate that if, compared to its regular state, the ratio has doubled – it is most certain that the index is close to 100% bloat rate.

Now, we have a system similar to an autovacuum that automatically tracks level of index bloat and reindexes the indexes when they become bloated without any need for manual DBA work or additional analysis (except for the initial setup).

pg_index_watch utilises the ratio between index size and index tuples (that is kept up-to-date by autovacuum) to determine the extent of index bloat relative to the ideal situation when the index is not bloated.

Basic requirements for installation and usage

  1. PostgreSQL version 12.0 or higher
  2. Superuser access to the database and the ability to run cron with this access (psql remote access is sufficient), root or shell access to the database server isn’t required.
  3. Passwordless (or with help of ~/.pgpass) access to the database as superuser (i.e. you should be able to run psql -U postgres -d datname without entering the password).

Recommendations 

  1. If server resources allow, set up non-zero max_parallel_maintenance_workers to speed up the index rebuild.
  2. Significant wal_keep_segments (5000 is normally sufficient = 80GB) recommended unless the wal archive is in use in case of active streaming replication.

Installation

(as PostgreSQL user)

#get the git clone code

https://github.com/dataegret/pg_index_watch

cd pg_index_watch
#create tables’ structure (can be installed in any database not only Postgres, will run cluster-wide)
psql -1 -d postgres -f index_watch_tables.sql
#import the code (stored procedures)
psql -1 -d postgres -f index_watch_functions.sql
#initial launch

IMPORTANT: with the first launch ALL the indexes bigger than 10MB (default setting) will be rebuilt at once (but only once) in order to have up to date not bloated index size info.

! This process might take several hours (or even days) on the large databases, therefore I suggest performing the initial launch manually. 

nohup psql -d postgres -qt -c "CALL index_watch.periodic(TRUE);" >> index_watch.log

After that only new or bloated indexes will be processed.

Alternatively, index_watch can use current index sizes as baseline via index_watch.do_force_populate_index_stats()

! Before you go, here are a few steps that you need to set up following the installation:

Set up cron daily, for example at midnight (from superuser of the database = normally postgres) or hourly if there is a high number of writes to the database. 

! IMPORTANT: Ensure that the time doesn’t coincide with pg_dump and other long maintenance tasks.

00 * * * *   psql -d postgres -AtqXc "select not pg_is_in_recovery();" | grep -qx t || exit; psql -d postgres -qt -c "CALL index_watch.periodic(TRUE);"

That’s it!

Hope you will find it helpful. We had it installed for several of our clients and they been very happy with the way it works and makes their life easier

If you want to see what pg_index_watch been worked on historically after the installation you can use

psql -d postgres -c 'select * from index_watch.history limit 20'

As with any development project I have some future plans for this nifty utility like adding support for working with remote databases and DBAAS like Amazon RDS etc., but that’s something to be worked on.

For now, enjoy and let me know what you think!

You may also like: