I like to keep my Ruby on Rails projects’ development databases in sync with their production counterparts. Firstly, this allows me to reproduce almost all production-bugs locally. Secondly, having the complete production database available locally is very helpful for testing new migrations.
Synchronization of PostgreSQL databases usually consists of the following two steps:
pg_dump
and download the generated dump file to your local machine.pg_restore
.The above workflow works great in principal, but takes quite a bit of time to complete. Especially the second step, gets slower and slower as the size of your production DB increases.
Recently, the dump-size of one of my production DBs exceeded 350 MB and continues to grow.
$ du -sh latest.dump
357M latest.dump
For me, this means, that pg_restore
takes about eleven minutes on my machine.
$ time pg_restore --no-acl --no-owner \
-d core_development latest.dump
pg_restore --no-acl --no-owner \
-d core_development latest.dump \
18.16s user 13.99s system 4% cpu 11:00.24 total
As it turns out, we can get a speed-boost by parallelizing pg_restore
with its --jobs
option. Almost all modern computers have multiple cores, but pg_restore
doesn’t utilize these cores automatically. I was able to cut pg_restore
’s runtime in half with --jobs=4
on my MacBook Air.
$ time pg_restore --no-acl --no-owner --jobs=4 \
-d core_development latest.dump
pg_restore --no-acl --no-owner --jobs=4 \
-d core_development latest.dump \
21.80s user 16.34s system 11% cpu 5:37.57 total
When PostgreSQL executes a CREATE DATABASE
statement, it’s actually cloning a template database (i.e. copying the structure and all the data of the template DB). The default template is an empty database called template1
. However, it’s trivial to specify a different DB to be used as the template.
The point here is that cloning an existing database is an order of magnitude faster than restoring the same DB with pg_restore
.
$ time createdb core_development --template=core_production
createdb core_development --template=core_production \
0.00s user 0.00s system 0% cpu 18.697 total
Taking that into consideration, we can vastly improve the synchronization workflow:
pg_dump
and download the generated dump file to your local machine.pg_restore
.I’m executing steps (1) and (2) only occasionally, but I’m frequently executing step (3). With PostgreSQL’s database templates, the workflow is now optimized for the last step.
Even though, most developers don’t know about it, ActiveRecord has support for PostgreSQL’s template DBs built-in. To use the feature, you have to enable it in your database.yml
file:
development:
adapter: postgresql
database: core_development
# Name of the DB to use as the template:
template: core_production
You can still create your development DB as usual with rake
:
$ rake db:create
PostgreSQL is now using core_production
as the template. Consequently, the newly created core_development
DB, immediately contains the structure and data of your local core_production
database.
Notes
Software: PostgreSQL v9.3.5, Ruby on Rails 4.2.0