Using Transaction Chaining to Reduce Server Round-Trips
by Christoph Schiessl on PostgreSQL
Implementing business applications with PostgreSQL, or really any relational database, often amounts to executing sequences of transactions. It doesn't matter if a given transaction ends with a COMMIT
or ROLLBACK
because, in both cases, the next transaction starts as soon as the previous one has finished. Furthermore, for most applications, the previous and the next transactions share similar characteristics, such as their ISOLATION LEVEL
. Essentially, what you end up with is a pattern that looks as follows:
START TRANSACTION;
-- workload of 1st transaction
COMMIT;
START TRANSACTION;
-- workload of 2nd transaction
ROLLBACK;
START TRANSACTION;
-- workload of 3rd transaction
COMMIT;
In the SQL script above, you see three subsequent transactions and a total of six statements (two per transaction), but you can imagine much longer transaction sequences. Here is the thing: Each statement requires a separate server round-trip to execute. There is no way around this, and even the fact that three example transactions are empty — don't contain any statements — doesn't make a difference.
Monitoring Connection States
Rather than take my word for it, we can quite easily prove that the server round-trips are actually happening. To make this experiment, we need two separate connections to the same PostgreSQL server (e.g., two psql
sessions). The first connection uses the postgres
database, and the second uses the demo
database.
Alright, now we can use pg_stat_activity
, which is a built-in VIEW
that belongs to the pg_catalog
schema, on the first connection to ask PostgreSQL about the state of the second connection that is using the demo
database:
postgres=# SELECT state FROM pg_stat_activity WHERE datname='demo';
state
-------
idle
(1 row)
As you can see, the connection is currently idle
because it's not doing anything. However, if we start a transaction on the second connection, we can see that the connection's state immediately changes to idle in transaction
.
demo=# START TRANSACTION;
START TRANSACTION
postgres=# SELECT state FROM pg_stat_activity WHERE datname='demo';
state
---------------------
idle in transaction
(1 row)
This proves that the START TRANSACTION
statement caused a round-trip with the server because if it had not, the first connection would not have been able to see a state change of the second connection. The story is the same if we finish the transaction with a COMMIT
or ROLLBACK
, in which case the connection immediately reverts to the idle
state:
demo=*# COMMIT;
COMMIT
postgres=# SELECT state FROM pg_stat_activity WHERE datname='demo';
state
-------
idle
(1 row)
Overhead from Round-Trips
So, long story short, the starting and finishing of transactions can pose a significant overhead. Specifically, if any of the following conditions are met:
- The round-trip time to the server is slow. Usually, this is true if the distance in terms of networking between client and server is great.
- There are many transactions with short average runtimes. That is because the overhead as a percentage of the total runtime is higher for shorter transactions.
So, what can we do to reduce the overhead and improve performance?
Transaction Chaining
The SQL standard has a built-in solution that is implemented by PostgreSQL: the AND CHAIN
parameter. This parameter is available for the COMMIT
and the ROLLBACK
statements and has the following effect ...
If the AND CHAIN
parameter is provided, then the current transaction is committed (or rolled back), and additionally, a follow-up transaction with the same characteristics (e.g., ISOLATION LEVEL
) is started immediately.
Therefore, if we apply this to the original example, we can reduce the number of server round-trips by basically 50% (from n
to n/2+1
).
START TRANSACTION;
-- workload of 1st transaction
COMMIT AND CHAIN;
-- workload of 2nd transaction
ROLLBACK AND CHAIN;
-- workload of 3rd transaction
COMMIT;
We can run the same experiment to prove that this works as expected. I'm again using two connections, one using the postgres
database and the other using the demo
database. Initially, the demo
connection was idle, but as soon as we started a new transaction, its state changed to idle in transaction
.
demo=# START TRANSACTION;
START TRANSACTION
postgres=# SELECT state FROM pg_stat_activity WHERE datname='demo';
state
---------------------
idle in transaction
(1 row)
So, what happens if the demo
connection executes a COMMIT AND CHAIN
statement? As I said, PostgreSQL immediately starts a follow-up transaction, so we don't see a state change.
demo=# COMMIT AND CHAIN;
COMMIT AND CHAIN
postgres=# SELECT state FROM pg_stat_activity WHERE datname='demo';
state
---------------------
idle in transaction
(1 row)
When we do ROLLBACK AND CHAIN
, we get precisely the same behavior — no apparent state change.
demo=# ROLLBACK AND CHAIN;
ROLLBACK AND CHAIN
postgres=# SELECT state FROM pg_stat_activity WHERE datname='demo';
state
---------------------
idle in transaction
(1 row)
Finally, when we issue a normal COMMIT
or ROLLBACK
, the state changes back to the initial idle
state.
demo=# COMMIT;
COMMIT
postgres=# SELECT state FROM pg_stat_activity WHERE datname='demo';
state
-------
idle
(1 row)
I hope you found this article interesting and learned something new. Thank you very much for reading, and see you soon!