Waiting for PostgreSQL 11: Covering + unique indexes.
Indexes with INCLUDE columns and their support in B-tree
Last weekend, a very useful patch was committed, that enhances an already extensive Postgres indexing functionality.
commit 8224de4f42ccf98e08db07b43d52fed72f962ebb
Author: Teodor Sigaev <[email protected]>
Date: Sat Apr 7 23:00:39 2018 +0300
Indexes with INCLUDE columns and their support in B-tree
This patch introduces INCLUDE clause to index definition. This clause
specifies a list of columns which will be included as a non-key part in
the index. The INCLUDE columns exist solely to allow more queries to
benefit from index-only scans. Also, such columns don’t need to have
appropriate operator classes. Expressions are not supported as INCLUDE
columns since they cannot be used in index-only scans.
Index access methods supporting INCLUDE are indicated by amcaninclude flag
in IndexAmRoutine. For now, only B-tree indexes support INCLUDE clause.
In B-tree indexes INCLUDE columns are truncated from pivot index tuples
(tuples located in non-leaf pages and high keys). Therefore, B-tree indexes
now might have variable number of attributes. This patch also provides
generic facility to support that: pivot tuples contain number of their
attributes in t_tid.ip_posid. Free 13th bit of t_info is used for indicating
that. This facility will simplify further support of index suffix truncation.
The changes of above are backward-compatible, pg_upgrade doesn’t need special
handling of B-tree indexes for that.
Bump catalog version
Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me
Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes,
David Rowley, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/flat/[email protected]
Commit message doesn’t quite explain why this feature is so amazing, so I decided to say a few words on this great patch.
The new patch allows to combine covering and unique functionality for btree indexes. What does it mean? It means, that it now became possible to create an index with key columns while adding further columns to it. All columns might be used as scan keys, but the unique constraint relates only to the key columns. Moreover, all columns can be returned from index by IndexOnlyScan.
Let’s consider a hypothetical situation with one table:
$ CREATE TABLE t (c1 INT, c2 INT, c3 INT, c4 INT);
$ INSERT INTO t (SELECT x, 2*x, 3*x, 4 FROM generate_series(1,10000000) AS x);
$ CREATE UNIQUE INDEX uniqueidx ON t USING btree (c1, c2);
Let’s assume we have two queries
- SELECT c1, c2 FROM t WHERE c1<10000;
- SELECT c1, c2 FROM t WHERE c1<10000 AND c3<20;
Everything is fine with the first query, it executes through IndexOnlyScan operation which is fast. The second query uses IndexScan operation, which is a bit slower since it has to read heap pages. If we want to make this query as fast as the first query, with IndexOnlyScan, we need to create covering index:
As a result, we have to create two separate indexes, which contain duplicated data and lead to increased size of the database. Another, less obvious drawback, is that the additional index incurs extra overhead over write operations such as INSERT/UPDATE/DELETEs – when you make changes to the table Postgres amends the table and its indexes.
With this patch you only need one index in the example above.
$ CREATE UNIQUE INDEX newidx ON t USING btree (c1, c2) INCLUDE (c3, c4);
That’s it, there is no need for additional covering index, and now IndexOnlyScan can be used on the queries above.
Big thanks to Anastasia Lubennikova for her great work and also to Alexander Korotkov, Teodor Sigaev, and all the reviewers involved.