Someone recently asked me how to fork PostgreSQL databases the way you can on Heroku. Since I couldn’t find any good examples of how to do this, I decided to write a short article about it.
“Forking” a database means making a copy that you can modify without affecting the original. The simplest way to do this is to take a backup of the database and start a new instance from it. I won’t dwell on this point, since backup procedures are covered well enough elsewhere. The problem with the obvious solution is performance: copying a large database takes a long time, and if we use forks to refresh our staging or testing environments, we can’t afford to wait several hours every time.
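For completeness, the naive approach is just a base backup started as a new instance. A minimal sketch, assuming replication connections are allowed on the master; the directory name and port are arbitrary choices for illustration, not from the demo below:

# Physically copy the entire running database, then start the copy as a new server.
pg_basebackup -p 6000 -D naive_fork -X stream
sed -i 's/port =.*/port = 6001/' naive_fork/postgresql.conf
pg_ctl -D naive_fork -l naive_fork/postgres.log start

This works, but the pg_basebackup step copies every byte, which is exactly the cost we want to avoid.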
We need a way to copy everything without actually copying it. The good news is that the storage layer can help us here. The technical term for what we’re after is a copy-on-write snapshot. Specialized storage appliances can do this, and on Linux the LVM layer and the Btrfs and ZFS filesystems implement it in software.
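For readers on one of the other layers mentioned, the snapshot step looks roughly like the following. This is only a sketch; the pool, dataset, and volume group names are made up for illustration:

# ZFS: snapshot the dataset holding the data directory, then clone it into a writable copy
zfs snapshot tank/pgdata@fork1
zfs clone tank/pgdata@fork1 tank/pgdata-fork1

# LVM: create a writable copy-on-write snapshot of the logical volume
lvcreate --snapshot --size 10G --name pgdata_fork1 /dev/vg0/pgdata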
Btrfs is the easiest to set up for this demonstration, and it’s also what I happened to have on hand: a RAID1 Btrfs filesystem, 99% full, mounted at /mnt/data and backed by three Western Digital Green spinning disks (<6000 RPM). In other words, a storage system about as slow as you can make it.
First, I set up a database to run our tests on: create a fresh subvolume for the master database, initialize a new PostgreSQL cluster in it, configure it to listen on port 6000, and start it up.
/mnt/data/dbforking$ btrfs subvolume create master
Create subvolume './master'
/mnt/data/dbforking$ initdb master/
... snipped
/mnt/data/dbforking$ sed -i 's/#port =.*/port = 6000/' master/postgresql.conf
/mnt/data/dbforking$ pg_ctl -D master/ -l master/postgres.log start
server starting
/mnt/data/dbforking$ createdb -p 6000
/mnt/data/dbforking$ psql -p 6000 -c "SELECT 'hello world'"
  ?column?
-------------
 hello world
(1 row)
Now that the master server is up and running, let’s generate some test data on it using pgbench.
/mnt/data/dbforking$ time pgbench -p 6000 -i -s 1000
... a few minutes of progress reports go by ...
100000000 of 100000000 tuples (100%) done (elapsed 173.58 s, remaining 0.00 s).
vacuum...
set primary keys...
done.

real    8m35.011s
user    0m21.746s
sys     0m0.739s
/mnt/data/dbforking$ du -sh master/
15G     master/
I now have a master database with 15 GB of data in it. So that we can keep track of our forks, let’s also update some data on the master: we’ll set the filler column on one row of the pgbench_branches table.
/mnt/data/dbforking$ psql -p 6000 -c "UPDATE pgbench_branches" Set filler = "master before forking" WHERE bid = 1" UPDATE 1 /mnt/data/dbforking$ psql -p 6000 -c " SELECT * FROM pgbench_branches WHERE bid = "1" bid | bbalance | filler -----+----------+------------------------------------------------------------------------------------------ 1 | 0 | master before forking (1 row)
The normal PostgreSQL hot backup procedure is to start the backup with pg_start_backup(), copy the database contents over, end the backup with pg_stop_backup(), and then copy over the transaction logs generated in the meantime. However, Btrfs supports atomic snapshots: a snapshot has exactly the same contents a copy would have had if PostgreSQL had been SIGKILLed at that instant. PostgreSQL’s durability mechanisms ensure that recovering from such a snapshot gives a consistent state containing every transaction committed up to that point. In our situation, the backup management commands are therefore unnecessary.
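For comparison, here is a minimal sketch of the traditional low-level procedure we get to skip. The backup label and the rsync destination are placeholders of my own, not part of this setup:

-- in psql on the master: put the cluster into backup mode
SELECT pg_start_backup('fork-demo');

-- in a shell: copy the data directory while backup mode is active
-- rsync -a /mnt/data/dbforking/master/ backup-host:/backups/master/

-- back in psql: leave backup mode, then copy over the WAL generated during the backup
SELECT pg_stop_backup();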
To add some spice and show that we can do this on a live production database, I’ll start a workload against the master in a separate shell: 16 clients holding a steady rate of 50 transactions per second. That isn’t much, but it is a real challenge for the slow storage system we’re using here.
/mnt/data/dbforking$ pgbench -p 6000 --rate=50 --client=16 --progress=5 --time=1200
starting vacuum...end.
progress: 5.0 s, 44.6 tps, lat 129.484 ms stddev 30.302, lag 0.304 ms
Now that everything is in place, let’s fork the database:
/mnt/data/dbforking$ time btrfs subvolume snapshot master fork1
Create a snapshot of 'master' in './fork1'

real    0m6.295s
user    0m0.000s
sys     0m0.202s
/mnt/data/dbforking$ du -sh fork1/
15G     fork1/
And these are the performance metrics from that period:
progress: 30.0 s, 54.5 tps, lat 134.819 ms stddev 34.012, lag 0.500 ms
progress: 35.1 s, 44.4 tps, lat 199.910 ms stddev 75.235, lag 3.243 ms
progress: 40.1 s, 40.5 tps, lat 1281.642 ms stddev 791.303, lag 970.009 ms
progress: 45.0 s, 62.1 tps, lat 349.229 ms stddev 313.576, lag 145.631 ms
progress: 50.0 s, 50.0 tps, lat 146.155 ms stddev 45.599, lag 2.543 ms
progress: 55.1 s, 53.0 tps, lat 146.554 ms stddev 40.694, lag 0.562 ms
So we managed to fork a 15 GB database in about 6 seconds, with only a small hiccup in performance. Now it’s time to start up the forked database.
A couple of things need to be done before the fork can be started. First, we must delete the master’s pid file from the snapshot. PostgreSQL can usually clean it up by itself, but because the master is still running on the same machine, the forked server can’t tell whether it is safe to do so, so we have to remove it ourselves. Second, the fork needs to be configured to listen on a different port than the master. Once that’s done, we can start the server and watch it successfully perform crash recovery.
/mnt/data/dbforking$ rm fork1/postmaster.pid
/mnt/data/dbforking$ sed -i 's/port =.*/port = 6001/' fork1/postgresql.conf
/mnt/data/dbforking$ pg_ctl -D fork1 -l fork1/postgres.log start
/mnt/data/dbforking$ tail fork1/postgres.log
LOG:  database system was interrupted; last known up at 2015-01-09 14:47:29 EET
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/7086E90
LOG:  record with zero length at 0/8BDD888
LOG:  redo done at 0/8BDD858
LOG:  last completed transaction was at log time 2015-01-09 14:51:44.56637+02
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
Next, let’s confirm that we really have two separate databases by changing the tracking row so that it differs between the fork and the master. First, update the forked database:
/mnt/data/dbforking$ psql -p 6001 -c "UPDATE pgbench_branches SET filler = 'fork1 after forking' WHERE bid = 1"
UPDATE 1
/mnt/data/dbforking$ psql -p 6001 -c "SELECT * FROM pgbench_branches WHERE bid = 1"
 bid | bbalance |        filler
-----+----------+----------------------
   1 |     1025 | fork1 after forking
(1 row)
And check that the master database still has the old value (except for the bbalance column, which pgbench has been updating):
/mnt/data/dbforking$ psql -p 6000 -c "SELECT * FROM pgbench_branches" WHERE bid = "1" bid | bbalance | filler -----+----------+------------------------------------------------------------------------------------------ 1 | -33546 | master before forking (1 row)
With this, we have a recipe for forking a production database without any major disruption. Of course, you probably don’t want to run your staging tests or your development environment on the same machine as the master. To get forks onto other machines, run a streaming replication standby in your staging or development environment and snapshot new databases from that standby instead. Just make sure to replace or delete the recovery.conf file in the fork; if you don’t, your snapshot will keep running in standby mode. A rough sketch of that setup follows.
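This is only a sketch of the idea; the host name 'prod-master', the 'repl' user, and the directory names are made up for illustration and are not part of the demo above:

# One-time setup on the staging machine: build a standby of the production master.
btrfs subvolume create /mnt/data/standby
pg_basebackup -h prod-master -U repl -D /mnt/data/standby -R -X stream
pg_ctl -D /mnt/data/standby -l /mnt/data/standby/postgres.log start

# To fork: snapshot the standby's data directory...
btrfs subvolume snapshot /mnt/data/standby /mnt/data/fork2

# ...then turn the snapshot into a standalone server.
rm /mnt/data/fork2/postmaster.pid
rm /mnt/data/fork2/recovery.conf        # otherwise the fork keeps running as a standby
sed -i 's/port =.*/port = 6002/' /mnt/data/fork2/postgresql.conf
pg_ctl -D /mnt/data/fork2 -l /mnt/data/fork2/postgres.log start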
Additionally, it’s a good idea to switch the fork onto a new timeline, so that PostgreSQL knows it is a fork and can warn you about mistakes such as having the fork replicate data from the master (or vice versa). To do this, create a recovery.conf in the fork’s data directory that contains the single line restore_command = '/bin/false'. This switches PostgreSQL into point-in-time recovery mode (as opposed to regular crash recovery), which creates a timeline switch at the end of the transaction log; /bin/false simply signals that there is no archive from which more transaction logs can be fetched.
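Concretely, a minimal sketch of that timeline switch for the fork1 directory from this demo, done in place of the plain start shown earlier:

# Force point-in-time recovery so the fork switches to a new timeline on startup.
echo "restore_command = '/bin/false'" > fork1/recovery.conf
pg_ctl -D fork1 -l fork1/postgres.log start
# Once recovery completes, PostgreSQL renames recovery.conf to recovery.done.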
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.