Please welcome Pg_index_watch – a utility for dealing with index bloat on frequently updated tables.
One of the big topics in Postgres is bloat. It is something that every DBA comes across and in fact we have a few good posts on how to work with bloat in our blog.
Update-heavy tables is a special case of data that due to business requirements needs to be updated frequently to stay relevant. Autovacuum may be a good solution for table bloat (you can check this post describing how to do that), however it does not help with index bloat.
Pg_index_watch resolves this issue by automatically rebuilding indexes when needed.
How it works?
With the introduction of REINDEX CONCURRENTLY in PostgreSQL 12 there was finally a safe way to rebuild indexes without heavy locks. At the same time, it was still unclear how to identify a criteria based on which we can decide whether the index is bloated (and should be rebuilt) or not.
What was missing is a simple and cheap to run statistical model that would allow us to estimate index bloat ratio without the requirement of reading and analyzing the whole index.
Now, PostgreSQL allows you to access the following:
1) number of rows in the index (in pg_class.reltuples for the index)
and
2) index size.
Assuming that the ratio of index size to the number of entries is almost always constant (this is correct in 99.9% of cases), we can speculate that if, compared to its ideal state, the ratio has doubled it is most certain that the index was bloated x2.
Based on this assumption we developed a system, similar to an AUTOVACUUM, that automatically tracks level of bloated indices and rebuilds them when bloat goes over the threshold.
Read more on GitHub and try it out!
Let me know what you think.
Maxim