Repairing Corrupted Indexes with REINDEXing

by Christoph Schiessl on PostgreSQL

I recently published an article explaining how to alphabetically order by columns using a custom ENUM type. For illustration, I defined the type marital_status and the table people.

postgres=# CREATE TYPE marital_status_type
postgres-#   AS ENUM ('single', 'married', 'divorced', 'other');
CREATE TYPE
postgres=# CREATE TABLE people (
postgres(#   id SERIAL PRIMARY KEY,
postgres(#   marital_status marital_status_type NOT NULL
postgres(# );
CREATE TABLE
postgres=# CREATE INDEX index_people_on_marital_status
postgres-#   ON people (marital_status);
CREATE INDEX

Furthermore, I also defined an IMMUTABLE function to typecast the ENUM values to TEXT so that I could create an index for the alphabetical ordering.

postgres=# CREATE FUNCTION marital_status_as_text(marital_status marital_status_type)
postgres-#   RETURNS TEXT LANGUAGE SQL IMMUTABLE AS $sql$ SELECT CAST(marital_status AS TEXT); $sql$;
CREATE FUNCTION
postgres=# CREATE INDEX index_people_on_marital_status_text
postgres-#   ON people (marital_status_as_text(marital_status));
CREATE INDEX

This function was needed because the typecast itself is mutable! You can think about this typecast as a function whose output depends not only on its parameter but also on external factors. Another way to say this is that the typecast to TEXT is not referentially transparent precisely because of its dependency on factors other than its parameter. You may get different results if you call the function multiple times, even though you use the same parameters. In PostgreSQL terminology, referentially transparent functions are called IMMUTABLE; certain operations work only with such functions. The creation of INDEXes is a prime example of this restriction.

postgres=# INSERT INTO people (marital_status)
postgres-#   VALUES ('single'), ('married'), ('divorced'), ('other');
INSERT 0 4
postgres=# SET SESSION enable_seqscan = off;
SET
postgres=# -- Query using the index `index_people_on_marital_status` ...
postgres=# SELECT * FROM people WHERE marital_status = 'other'::marital_status_type;
 id | marital_status
----+----------------
  4 | other
(1 row)

postgres=# -- Query using the index `index_people_on_marital_status_text` ...
postgres=# SELECT * FROM people WHERE marital_status_as_text(marital_status) = 'other'::TEXT;
 id | marital_status
----+----------------
  4 | other
(1 row)

So, we told PostgreSQL the function marital_status_as_text is IMMUTABLE, which was needed so that the function could be used to define an INDEX. However, given that the function is just a wrapper around the typecast, which is not referentially transparent, it cannot be IMMUTABLE. So, essentially, we have lied to PostgreSQL, but I have not yet explained the external factors that influence the typecast. Well, the typecast returns the name of the ENUM value as it was defined when we created the custom type. This is a problem because these values can be renamed later on. For instance, we may rename the value other to unknown.

postgres=# ALTER TYPE marital_status_type
postgres-#   RENAME VALUE 'other' TO 'unknown';
ALTER TYPE

The problem arises because we already had data in the people table before the rename. Now, we have a funny situation — note that I'm setting enable_seqscan=off to force the Query Planner to use our indexes:

postgres=# SET SESSION enable_seqscan = off;
SET
postgres=# SELECT * FROM people WHERE marital_status = 'unknown'::marital_status_type;
 id | marital_status
----+----------------
  4 | unknown
(1 row)

postgres=# SELECT * FROM people WHERE marital_status_as_text(marital_status) = 'unknown'::TEXT;
 id | marital_status
----+----------------
(0 rows)

So, we get different results for essentially the same query. If the query uses the ENUM value directly, we get a single row as a result, but if we search based on the corresponding TEXT value, we get an empty result set. This is clearly not the expected behavior! But it gets even worse because depending on the planned query (with or without index usage), we can also get different results for one-to-one the same query.

postgres=# -- turn off sequential scans to force index scans ...
postgres=# SET SESSION enable_seqscan = off;
SET
postgres=# SET SESSION enable_indexscan = DEFAULT;
SET
postgres=# SET SESSION enable_bitmapscan = DEFAULT;
SET
postgres=# SELECT * FROM people WHERE marital_status_as_text(marital_status) = 'other'::TEXT;
 id | marital_status
----+----------------
  4 | unknown
(1 row)

postgres=# -- turn off index scans to force sequential scans ...
postgres=# SET SESSION enable_seqscan = DEFAULT;
SET
postgres=# SET SESSION enable_indexscan = off;
SET
postgres=# SET SESSION enable_bitmapscan = off;
SET
postgres=# SELECT * FROM people WHERE marital_status_as_text(marital_status) = 'other'::TEXT;
 id | marital_status
----+----------------
(0 rows)

Crazy stuff, right? The data tracked by the index is out of sync with the data in the table — the index is now said to be corrupted. This is 100% our fault because we incorrectly declared our function as IMMUTABLE, and PostgreSQL took us at our word. So, there is no way it could have known that renaming the ENUM value would affect the index. Anyway, you should never do something like this in the first place, but what should you do if you nonetheless find yourself in a situation like this? There's only one thing that you can do, really, ...

REINDEXing

Your only option is to rebuild the corrupted indexes. In this case, we have only one corrupted index, and we can start a rebuild with a simple REINDEX statement.

postgres=# REINDEX INDEX index_people_on_marital_status_text;
REINDEX

We can rerun the same experiment to confirm that the index is back in sync with the table's data.

postgres=# SET SESSION enable_seqscan = off;
SET
postgres=# SET SESSION enable_indexscan = DEFAULT;
SET
postgres=# SET SESSION enable_bitmapscan = DEFAULT;
SET
postgres=# SELECT * FROM people WHERE marital_status_as_text(marital_status) = 'other'::text;
 id | marital_status
----+----------------
(0 rows)

postgres=# SET SESSION enable_seqscan = DEFAULT;
SET
postgres=# SET SESSION enable_indexscan = off;
SET
postgres=# SET SESSION enable_bitmapscan = off;
SET
postgres=# SELECT * FROM people WHERE marital_status_as_text(marital_status) = 'other'::text;
 id | marital_status
----+----------------
(0 rows)

VoilĂ ! The problem is fixed, but not its root cause. If we now renamed another ENUM value, we would run into the same problem again. The lesson here is that you must be careful when defining custom functions. Namely, you can run into issues if you set your functions' volatility class to be stricter than they really are.

That's everything for today! Thank you for reading, and see you soon!

Ready to Learn More Web Development?

Join my Mailing List to receive one article per week.


I send one email per week on building performant and resilient Web Applications with Python, JavaScript and PostgreSQL. No spam. Unscubscribe at any time.

Continue Reading?

Here are a few more Articles for you ...


Custom ENUM Type Columns and ORDER BY

Learn how custom ENUM types in PostgreSQL can affect SELECT queries and ORDER BY clauses. Be mindful of indexing and typecasting.

By Christoph Schiessl on PostgreSQL

Force Index Usage by Manipulating the Query Planner

Learn how to manipulate PostgreSQL's query planner to force it to use your indexes while working on optimizing the performance of your queries.

By Christoph Schiessl on PostgreSQL

Caching Expensive Queries with MATERIALIZED VIEWs

Learn how to use PostgreSQL's MATERIALIZED VIEWs to improve performance of complex queries. Persist query results and refresh them manually or automatically.

By Christoph Schiessl on PostgreSQL

Christoph Schiessl

Hi, I'm Christoph Schiessl.

I help you build robust and fast web applications.


I'm available for hire as a freelance web developer, so you can take advantage of the more than a decade of experience I have collected working on many projects across several industries. Most of my clients are building web-based SaaS applications in a B2B context and depend on my expertise in various capacities.

More often than not, my involvement includes hands-on development work using technologies like Python, JavaScript, and PostgreSQL. Furthermore, if you already have an established team, I can support you as a technical product manager with a passion for simplifying complex processes. Lastly, I'm also an avid writer and educator who takes pride in breaking down technical concepts into the simplest possible terms.