Caching Expensive Queries with Materialized Views in PostgreSQL

by Christoph Schiessl on [PostgreSQL]

PostgreSQL 9.3 introduced a new feature referred to as materialized views. This article attempts to explain when to use it (based on a contrived example).

Let's assume you are working on a small database to keep track of your customers and the invoices you are sending to them. Currently, your database consists of the following three tables:

CREATE TABLE customers (
    id INT PRIMARY KEY,
    legal_name TEXT NOT NULL
);

CREATE TABLE invoices (
    id INT PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customers (id),
    issued_on DATE NOT NULL
);

CREATE TABLE invoice_items (
    id INT PRIMARY KEY,
    invoice_id INT NOT NULL REFERENCES invoices (id),
    amount_in_cents INTEGER NOT NULL
);

As you can see, a customer has many invoices, which in turn have many invoice items. With that in place, you now want a convenient way to calculate the total revenue for each of your customers. After thinking about the requirement for a second or two, you come up with the following SELECT query:

SELECT customers.*,
       SUM(invoice_items.amount_in_cents) AS revenue
    FROM customers
        INNER JOIN invoices ON customers.id = invoices.customer_id
        INNER JOIN invoice_items ON invoices.id = invoice_items.invoice_id
    GROUP BY customers.id ORDER BY revenue DESC;

While this query certainly works, it is pretty tedious to type repeatedly and may also be too slow for your needs. The easiest way to navigate around both of these problems is a materialized view:

CREATE MATERIALIZED VIEW customers_with_revenue AS
    SELECT customers.*,
           SUM(invoice_items.amount_in_cents) AS revenue
    FROM customers
        INNER JOIN invoices ON customers.id = invoices.customer_id
        INNER JOIN invoice_items ON invoices.id = invoice_items.invoice_id
    GROUP BY customers.id ORDER BY revenue DESC;

Firstly, it's very convenient to use, because it hides the complex query from you — just like an ordinary view would. Secondly, it's also improving performance because it's caching the query's result.

How does it work? Well, when you create the materialized view, it's actually computing the given SELECT statement immediately and persisting its result set. Later on, when you select something from the view, it's using that pre-computed result set, instead of computing the defining SELECT query again and again.

There's only one drawback: If the underlying data changes, you have to manually refresh the materialized view. This can be accomplished as follows:

REFRESH MATERIALIZED VIEW customers_with_revenue;

If your data doesn't change very frequently, you might want to consider using a TRIGGER to execute the refresh-statement automatically.