Project Inquiry
Published on

Smarter Synchronization of your PostgreSQL Development and Production DBs

I like to keep my Ruby on Rails projects’ development databases in sync with their production counterparts. Firstly, this allows me to reproduce almost all production-bugs locally. Secondly, having the complete production database available locally is very helpful for testing new migrations.

Synchronization of PostgreSQL databases usually consists of the following two steps:

  1. Create a dump of the production DB with pg_dump and download the generated dump file to your local machine.
  2. Use the dump file to restore your development DB with pg_restore.

The above workflow works great in principal, but takes quite a bit of time to complete. Especially the second step, gets slower and slower as the size of your production DB increases.

Recently, the dump-size of one of my production DBs exceeded 350 MB and continues to grow.

$ du -sh latest.dump
357M	latest.dump

For me, this means, that pg_restore takes about eleven minutes on my machine.

$ time pg_restore --no-acl --no-owner \
       -d core_development latest.dump
pg_restore --no-acl --no-owner \
           -d core_development latest.dump \
18.16s user 13.99s system 4% cpu 11:00.24 total

As it turns out, we can get a speed-boost by parallelizing pg_restore with its --jobs option. Almost all modern computers have multiple cores, but pg_restore doesn’t utilize these cores automatically. I was able to cut pg_restore’s runtime in half with --jobs=4 on my MacBook Air.

$ time pg_restore --no-acl --no-owner --jobs=4 \
                  -d core_development latest.dump
pg_restore --no-acl --no-owner --jobs=4 \
           -d core_development latest.dump \
21.80s user 16.34s system 11% cpu 5:37.57 total

Introducing PostgreSQL’s Template DBs

When PostgreSQL executes a CREATE DATABASE statement, it’s actually cloning a template database (i.e. copying the structure and all the data of the template DB). The default template is an empty database called template1. However, it’s trivial to specify a different DB to be used as the template.

The point here is that cloning an existing database is an order of magnitude faster than restoring the same DB with pg_restore.

$ time createdb core_development --template=core_production
createdb core_development --template=core_production \
  0.00s user 0.00s system 0% cpu 18.697 total

Taking that into consideration, we can vastly improve the synchronization workflow:

  1. Create a dump of the production DB with pg_dump and download the generated dump file to your local machine.
  2. Use the dump file to restore a copy of the production DB (not the development DB) on your local machine with pg_restore.
  3. Recreate your development DB using the restored production DB from step (2) as the template.

I’m executing steps (1) and (2) only occasionally, but I’m frequently executing step (3). With PostgreSQL’s database templates, the workflow is now optimized for the last step.

Template DB Support in Ruby on Rails

Even though, most developers don’t know about it, ActiveRecord has support for PostgreSQL’s template DBs built-in. To use the feature, you have to enable it in your database.yml file:

development:
  adapter: postgresql
  database: core_development
  # Name of the DB to use as the template:
  template: core_production

You can still create your development DB as usual with rake:

$ rake db:create

PostgreSQL is now using core_production as the template. Consequently, the newly created core_development DB, immediately contains the structure and data of your local core_production database.


Notes
Software: PostgreSQL v9.3.5, Ruby on Rails 4.2.0