pg_utils: Low_used_indexes.sql, how it works and why it should be lower than 0.01?
If you work with PostgreSQL the chances that you are aware or used our pg_utils
are quite high. One of them is low_used_indexes.sql
that lately we received a few questions about, so I thought it might be a good idea to write a short blog post about it.
As you might gather from its name, low_used_indexes.sql
goal is to identify indexes that are rarely used. “Rarely” is the key here, since those indexes that are never used can be discarded and those that are used frequently should definitely be kept. With low_used_indexes.sql
we are searching for indexes that need closer investigation since while taking up the precious disk space their usage also can’t be justified from the business logic perspective.
How is this “rare usage” calculated? Let’s take a look at this example:
SELECT pg_stat_user_indexes.schemaname||'.'||pg_stat_user_indexes.relname, indexrelname, pg_stat_user_indexes.idx_scan, (coalesce(n_tup_ins,0)+coalesce(n_tup_upd,0)-coalesce(n_tup_hot_upd,0)+coalesce(n_tup_del,0)) as write_activity, pg_stat_user_tables.seq_scan, pg_stat_user_tables.n_live_tup, pg_size_pretty(pg_relation_size(pg_index.indexrelid::regclass)) as size from pg_stat_user_indexes join pg_stat_user_tables on pg_stat_user_indexes.relid=pg_stat_user_tables.relid join pg_index ON pg_index.indexrelid=pg_stat_user_indexes.indexrelid where pg_index.indisunique is false and pg_stat_user_indexes.idx_scan::float/(coalesce(n_tup_ins,0)+coalesce(n_tup_upd,0)-coalesce(n_tup_hot_upd,0)+coalesce(n_tup_del,0)+1)::float<0.01 and (coalesce(n_tup_ins,0)+coalesce(n_tup_upd,0)-coalesce(n_tup_hot_upd,0)+coalesce(n_tup_del,0))>10000 order by 4 desc,1,2
At first glance, it’s not clear why pg_stat_user_indexes.idx_scan
is divided by write_activity
and why it should be lower than 0.01? Afterall, we could choose indexes withidx_scan
= 0?
First, unfortunately, among indexes that are seldomly used idx_scan=0
will miss those that are used once in twenty-four hours or once a week or once a month.
Second, if there are not many writes in the table such rarely used indexes do not pose any threat, that is aside the fact that they still use your disk space. However, if you have rarely used indexes AND your table has many writes you will sooner or later be facing quite severe consequences in terms of disk space. So, in the latter case it is better to delete it and perform a regular sequential scan
, especially if pg_stat_user_tables.seq_scan
isn’t equal to 0.
<0.01 is a somewhat rule-of-thumb number that is mainly based on our own experience working with a variety of clients.
Normally indexes that fall into this category are worthwhile reviewing and based on what business logic stands behind them, in what queries they are used and based on that consider if there won’t be any faults in the process if you leave them as they are.
Usually, the smaller the table the less risky is to remove one of those rarely used indexes, but one should consider any deletion of that level carefully.
Hope this is useful and if you would like us to post more information like this about the rest of our utils let us know in the comments below.