Preamble
Postgres is a fantastic all-around product with an unbeatable price-performance ratio, but there is one area where things could be done a little more effectively: replica validation, or more broadly, cluster validation.
What’s the issue? There isn’t actually a “pushbutton” way to determine whether a new replica is flawless after it has been built. The replica-building tool pg_basebackup simply streams the datafiles and transaction logs over and writes them to disk, and that’s it. There are, of course, other tools and approaches, but I suspect pg_basebackup has 90% of the “market.” You then typically start the server, check that you can connect to it, and possibly run a few “select count(*) from my_favorite_table” queries to make sure everything looks more or less plausible, before grinning with satisfaction and going to get a coffee to work on other pressing issues. But how much have we actually done to ensure that the cluster is intact?
What could go wrong with replicas?
What could possibly go wrong when creating a new replica? Naturally, there isn’t much cause for concern if pg_basebackup reports success; it is highly unlikely that something is wrong. But a few things still irritate me. I may be a little paranoid, but keep reading to find out what I mean.
Starting with hard disks: disk errors should theoretically be a thing of the past, and business-grade disk systems have error detection built in… but everything in the modern world is so virtualized and abstracted that you hardly even know what’s underneath. Perhaps there is some network storage involved? That, in turn, needs drivers for both the hardware and the operating system. And every piece of software has bugs, as we all know; a quick web search suggests a ballpark of 15 to 50 errors per 1,000 lines of delivered code. Therefore, silent data corruption when writing datafiles cannot be completely ruled out, especially since replicas are frequently built on new hardware that has not yet been thoroughly tested.
In these circumstances, Postgres’ “data checksums” feature can be useful, but it must be enabled during cluster initialization and carries a small performance penalty, so it isn’t always used.
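For reference, a minimal sketch of enabling checksums at initialization time (the data directory path here is only an illustrative assumption):
initdb --data-checksums -D /var/lib/postgresql/data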
What else? Postgres makes it possible to run replicas on different operating systems (for instance, master on Ubuntu and replica on CentOS, which is obviously a great feature in and of itself), and there is also the potential for minor version differences, such as the master on 9.6.2 and the replica accidentally on 9.6.0. Furthermore, there is a chance that the replication code itself will malfunction; I can think of at least one instance from a few years ago in which committed rows showed up on replicas as uncommitted. This is very rare, of course, but it could technically happen again.
Status Quo
So there are a few “ifs,” and the question is: what can be done about them? The only option that is 100 percent trustworthy is to perform a SQL dump/restore because, as stated above, there is sadly no tool from the Postgres project for full verification (and I’m not even sure whether there should be one, or whether it can even be done). However, if you need to switch over the master quickly and unobtrusively, this obviously won’t work, as it could take hours and hours for databases of 1 TB or larger.
A good compromise, therefore, is to simply dump the database, just to feel at least marginally better before making the switchover! At the very least, this confirms that there is no silent data corruption. Depending on your database size and hardware, this could still take hours, but there are some tricks to make it go faster; more on that below.
Speeding up dummy SQL dumps on replicas
1. Using “/dev/null” as the dump target is a great performance booster, since we only need to confirm that the data can be read out, not actually store the SQL dump. With the default plain SQL mode of pg_dumpall or pg_dump, this is very simple to do:
pg_dumpall -h /var/run/postgresql >/dev/null
2. Connecting to the local server via the Unix socket rather than over the “localhost” TCP stack (as in the command above) should already yield a 10-15% speedup.
3. Using several processes to dump the data! When time is of the essence, this can be very helpful for larger databases. Simply set the “--jobs” flag to a value that puts a reasonable load on the IO subsystem. NB! For multiple jobs, the “directory” output format of pg_dump (-Fd / --format=directory plus -f / --file=dirname) is a must, but it also has some drawbacks (see the sketch after this list):
- pg_dump’s “directory” format does not get along with /dev/null, as it wants to set up a directory structure. There are of course workarounds, e.g. the nullfs FUSE filesystem driver, but they are tedious to say the least.
- It is not yet possible to obtain a consistent snapshot when dumping replicas with multiple processes; this will be fixed in the upcoming Postgres 10. For our “quick verification” goal this is not too catastrophic, and we can get past the initial error by providing the “--no-synchronized-snapshots” flag.
- pg_dump only works “per database,” so if your cluster contains a large number of DBs, you may need a few lines of Bash, for example.
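Putting point 3 together, a multi-process dump on the replica might look roughly like the sketch below; the database name “mydb”, the job count of 4, and the /tmp target directories are illustrative assumptions, and the dump directories need to be cleaned up afterwards.
pg_dump -h /var/run/postgresql -Fd -j 4 --no-synchronized-snapshots -f /tmp/dump_mydb mydb
# Covering all non-template databases in the cluster with a few lines of Bash:
psql -h /var/run/postgresql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate" \
  | while read db; do
      pg_dump -h /var/run/postgresql -Fd -j 4 --no-synchronized-snapshots -f "/tmp/dump_$db" "$db"
    done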
4. Use a quick custom script that takes care of the unfavorable side effects of approach number 3 for you. It’s a straightforward Python script that simply spawns a predetermined number of worker processes (by default, half of the CPUs) and dumps all tables one by one to /dev/null; all that’s needed is the Postgres “bindir” and the Unix socket directory as the host.
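The script itself is not reproduced here, but the core idea can be sketched with plain shell tools as well (a rough, illustrative equivalent only; the database name “mydb” and the worker count of 4 are assumptions):
# COPY every user table to /dev/null, 4 tables at a time, over the Unix socket
psql -h /var/run/postgresql -d mydb -At \
  -c "SELECT format('%I.%I', schemaname, tablename) FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema')" \
  | xargs -P 4 -I {} psql -h /var/run/postgresql -d mydb -c "COPY {} TO STDOUT" > /dev/null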
NB: If there is activity on the master, make sure the query-conflict situation has been addressed before starting the dumping process on the replica, as it will otherwise fail quickly! Configuring hot_standby_feedback, or allowing the replica to “fall behind” by setting max_standby_streaming_delay to -1 or to a large value, are two options.
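For example, in the replica’s postgresql.conf (one of the two approaches is enough; which one fits better is situational):
hot_standby_feedback = on            # the master then avoids removing rows the replica still needs
# ...or alternatively, let the replica lag behind while the dump runs:
max_standby_streaming_delay = -1     # never cancel standby queries because of replication conflicts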
That’s all; I sincerely hope you followed along and found some food for thought. Feedback is always welcome!
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.