Force Index Usage by Manipulating the Query Planner
by Christoph Schiessl on PostgreSQL
PostgreSQL plans queries to minimize their cost, which is proportional, but not equivalent, to the queries' execution time. The cost is an estimate and depends on many factors, such as the number of rows the query is predicted to return. Usually, this is no problem, but this behavior can be irritating during development. Specifically, it can be confusing if PostgreSQL chooses to use sequential scans instead of the indexes you have just created to optimize the performance of some query. In this article, I will show you how you can manipulate PostgreSQL's query planner to force it to use your indexes, even though this may not be the best strategy for your query.
postgres=# CREATE TABLE people (
postgres(# id SERIAL PRIMARY KEY,
postgres(# marital_status TEXT NOT NULL DEFAULT 'single'
postgres(# );
CREATE TABLE
postgres=# \d people
Table "public.people"
Column | Type | Collation | Nullable | Default
----------------+---------+-----------+----------+------------------------------------
id | integer | | not null | nextval('people_id_seq'::regclass)
marital_status | text | | not null | 'single'::text
Indexes:
"people_pkey" PRIMARY KEY, btree (id)
postgres=# INSERT INTO people (marital_status) VALUES ('single'), ('married'), ('divorced'), ('other');
INSERT 0 4
postgres=# EXPLAIN SELECT * FROM people ORDER BY marital_status;
QUERY PLAN
-----------------------------------------------------------------
Sort (cost=88.17..91.35 rows=1270 width=36)
Sort Key: marital_status
-> Seq Scan on people (cost=0.00..22.70 rows=1270 width=36)
(3 rows)
For this particular table and query, you can see in the EXPLAIN
statement output that PostgreSQL planned the query with a sequential scan. This is the best possible execution plan because no index applies to this SELECT
query. In other words, a sequential scan is the only possibility for this query. Now, if we create an index for the column marital_status
and EXPLAIN
the query again, then you'll see that PostgreSQL is still planning with a sequential scan.
postgres=# CREATE INDEX index_people_on_marital_status
postgres-# ON people (marital_status);
CREATE INDEX
postgres=# EXPLAIN SELECT * FROM people ORDER BY marital_status;
QUERY PLAN
-------------------------------------------------------------
Sort (cost=1.08..1.09 rows=4 width=36)
Sort Key: marital_status
-> Seq Scan on people (cost=0.00..1.04 rows=4 width=36)
(3 rows)
This is unexpected because the new index does apply to the SELECT
query. The reason for this is simple: PostgreSQL has evaluated all possible query plans and estimated that a sequential scan still has the lowest cost (i.e., the cost of the query with index scan would be higher).
There are two ways to get PostgreSQL to use the new index. Firstly, we can artificially force it to do so by setting the option enable_seqscan
to off
. The official documentation explains the option as follows:
Enables or disables the query planner's use of sequential scan plan types. It is impossible to suppress sequential scans entirely, but turning this variable off discourages the planner from using one if there are other methods available. The default is
on
.
You cannot completely turn off sequential scans (e.g., if a sequential scan is the only possibility), but you can encourage the query planner to avoid sequential scans as much as possible. Here, we use a simple SET
statement to set this option for the remainder of our SQL session:
postgres=# SET SESSION enable_seqscan = off;
SET
If we now EXPLAIN
the SELECT
query again, we get a query plan with an index scan!
postgres=# EXPLAIN SELECT * FROM people ORDER BY marital_status;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Index Scan using index_people_on_marital_status on people (cost=0.13..12.19 rows=4 width=36)
(1 row)
Adding more data is the second and more natural way to encourage index scans. Assuming that the index applies to your query in the first place, generally speaking, the cost of sequential scans increases faster than the cost of index scans for an increasing number of rows. This means that at some point if you add more and more data, you eventually breach the threshold, after which the cost of sequential scans is higher than the cost of index scans. Then, given that PostgreSQL always uses the query plan with the lowest cost, it must pick the plan with an index scan.
postgres=# SET SESSION enable_seqscan = DEFAULT; -- re-enable default behavior of query planner
SET
postgres=# INSERT INTO people (marital_status)
postgres-# SELECT (ARRAY['single', 'married', 'divorced', 'other'])[n % 4 + 1]
postgres-# FROM GENERATE_SERIES(1, 10000) AS n;
INSERT 0 10000
postgres=# EXPLAIN SELECT * FROM people ORDER BY marital_status;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Index Scan using index_people_on_marital_status on people (cost=0.29..410.35 rows=10004 width=36)
(1 row)
In this last example, I reset the option enable_seqscan
to its default value (i.e., on
) and inserted 10000 rows into the table. This number of rows is sufficient to make sequential scans more expensive than index scans, so we get a query plan with an index scan. That's everything for today. Thank you for reading, and see you next time!