Caching Expensive Queries with MATERIALIZED VIEW
s
by Christoph Schiessl on PostgreSQL
PostgreSQL 9.3 introduced a new feature called MATERIALIZED VIEW
s, and this article explains how to use it based on a contrived example. Let's assume you are working on a database to keep track of your customers and the invoices you send them. Currently, your database consists of three tables:
postgres=# CREATE TABLE customers (
postgres(# id INT PRIMARY KEY,
postgres(# legal_name TEXT NOT NULL
postgres(# );
CREATE TABLE
postgres=# CREATE TABLE invoices (
postgres(# id INT PRIMARY KEY,
postgres(# customer_id INT NOT NULL REFERENCES customers (id),
postgres(# issued_on DATE NOT NULL
postgres(# );
CREATE TABLE
postgres=# CREATE TABLE invoice_items (
postgres(# id INT PRIMARY KEY,
postgres(# invoice_id INT NOT NULL REFERENCES invoices (id),
postgres(# amount_in_cents INTEGER NOT NULL
postgres(# );
CREATE TABLE
As you can see, a customer has many invoices, which in turn have many invoice items. With that in place, you now want a convenient way to calculate your customers' total revenue. After thinking about the requirement for a second or two, you come up with the following SELECT
statement:
postgres=# SELECT customers.*,
postgres-# SUM(invoice_items.amount_in_cents) AS revenue
postgres-# FROM customers
postgres-# INNER JOIN invoices ON customers.id = invoices.customer_id
postgres-# INNER JOIN invoice_items ON invoices.id = invoice_items.invoice_id
postgres-# GROUP BY customers.id ORDER BY revenue DESC;
id | legal_name | revenue
----+------------+---------
(0 rows)
While this query certainly works, it is error-prone to type repeatedly and may also be too slow for your needs. A MATERIALIZED VIEW
is the easiest way to circumnavigate both problems.
postgres=# CREATE MATERIALIZED VIEW customers_with_revenue AS
postgres-# SELECT customers.*,
postgres-# SUM(invoice_items.amount_in_cents) AS revenue
postgres-# FROM customers
postgres-# INNER JOIN invoices ON customers.id = invoices.customer_id
postgres-# INNER JOIN invoice_items ON invoices.id = invoice_items.invoice_id
postgres-# GROUP BY customers.id ORDER BY revenue DESC;
SELECT 0
Firstly, it's very convenient because it hides complex queries from you — just like an ordinary VIEW
would. Secondly, it also improves performance because it caches the query's result.
How does it work? When you create the MATERIALIZED VIEW
, it immediately executes its defining SELECT
statement and persists its result set. Later, when you select something from the MATERIALIZED VIEW
, it uses that pre-computed result set instead of computing the defining SELECT
query again and again.
There's only one drawback: If the underlying data changes, you must manually refresh the MATERIALIZED VIEW
. This can be accomplished as follows ...
postgres=# REFRESH MATERIALIZED VIEW customers_with_revenue;
REFRESH MATERIALIZED VIEW
If your data doesn't change frequently, you might consider using a TRIGGER
to execute the REFRESH MATERIALIZED VIEW
statement automatically.