Of course, most people do not want their databases to get corrupted, and they are well advised to stay away from the techniques described in this article. But corrupting a database on purpose can be useful, for example when you want to test a tool or procedure that detects or repairs data corruption.
Prerequisites
For several of these experiments we need a database that contains some data and is under active load. The built-in PostgreSQL benchmark pgbench works well for that. With a scale factor of 100, the largest table contains 10 million rows:
$ pgbench -q -i -s 100
dropping old tables...
creating tables...
generating data (client-side)...
10000000 of 10000000 tuples (100%) done (elapsed 7.44 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 10.12 s (drop tables 0.18 s, create tables 0.01 s, client-side generate 7.52 s, vacuum 0.14 s, primary keys 2.28 s).
Load will be generated with 5 concurrent client sessions:
$ pgbench -c 5 -T 3600
Creating a corrupt database by setting fsync = off
Let’s set fsync = off in postgresql.conf and then power the server off while it is under load.
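If you want to reproduce this, one possible way (a sketch, assuming a disposable test VM that you can power off forcibly) is to change the parameter online and then cut power while pgbench is running:

-- in psql, as a superuser, on a throwaway test cluster only
ALTER SYSTEM SET fsync = off;
SELECT pg_reload_conf();  -- fsync can be changed with a configuration reload

Then, with pgbench still generating load, power the machine off from the hypervisor (or otherwise crash the operating system), so that the contents of the OS page cache are lost as well.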
The amcheck extension allows us to quickly detect the resulting data corruption:
postgres=# CREATE EXTENSION amcheck;
CREATE EXTENSION
postgres=# SELECT bt_index_parent_check('pgbench_accounts_pkey', TRUE, TRUE);
WARNING:  concurrent delete in progress within table "pgbench_accounts"
ERROR:  could not access status of transaction 1949706
DETAIL:  Could not read from file "pg_subtrans/001D" at offset 196608: read too few bytes.
CONTEXT:  while checking uniqueness of tuple (131074,45) in relation "pgbench_accounts"
What happened? With fsync = off, data are no longer flushed to disk in the correct order, so data modifications can hit the disk before the corresponding WAL does. This leads to data corruption during crash recovery.
Creating a corrupt database from a backup
While pgbench is running, we create a base backup:
$ psql
postgres=# SELECT pg_backup_start('test');
 pg_backup_start 
═════════════════
 1/47F8A130
(1 row)
The function to start backup mode is pg_backup_start() rather than pg_start_backup(), because I am using PostgreSQL v15: v15 removed the exclusive backup API, which had been deprecated since PostgreSQL 9.6. Read my updated post on the linked page to learn more.
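For reference, a minimal sketch of how the calls changed (old names shown as comments for contrast):

-- PostgreSQL 9.6 – 14, non-exclusive form of the old API:
-- SELECT pg_start_backup('test', false, false);
-- SELECT * FROM pg_stop_backup(false);

-- PostgreSQL 15 and later:
SELECT pg_backup_start('test');
-- ... copy the data directory with an external tool ...
SELECT lsn, labelfile, spcmapfile FROM pg_backup_stop();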
Let’s determine the object IDs of the database and of the primary key index of pgbench_accounts:
postgres=# SELECT relfilenode FROM pg_class WHERE relname = 'pgbench_accounts_pkey';
 relfilenode 
═════════════
       16430
(1 row)

postgres=# SELECT oid FROM pg_database WHERE datname = 'postgres';
 oid 
═════
   5
(1 row)
We create the backup by copying the data directory. Then we copy the primary key index of pgbench_accounts and the commit log again, to make sure they are more recent than the rest:
$ cp -r data backup
$ cp data/base/5/16430* backup/base/5
$ cp data/pg_xact/* backup/pg_xact/
$ rm backup/postmaster.pid
The crucial part: do not create backup_label
Now we exit backup mode, but ignore the contents of the backup_label file returned from pg_backup_stop():
postgres=# SELECT labelfile FROM pg_backup_stop();
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
                            labelfile                             
════════════════════════════════════════════════════════════════
 START WAL LOCATION: 1/47F8A130 (file 000000010000000100000047)↵
 CHECKPOINT LOCATION: 1/65CD24F0                               ↵
 BACKUP METHOD: streamed                                       ↵
 BACKUP FROM: primary                                          ↵
 START TIME: 2022-07-05 08:32:47 CEST                          ↵
 LABEL: test                                                   ↵
 START TIMELINE: 1                                             ↵
(1 row)
Then, let’s make sure that the last checkpoint in the control file is different:
$ pg_controldata -D backup | grep REDO
Latest checkpoint's REDO location:    1/890077D0
Latest checkpoint's REDO WAL file:    000000010000000100000089
Great! Let’s start the server:
$ echo 'port = 5555' >> backup/postgresql.auto.conf
$ pg_ctl -D backup start
waiting for server to start..... done
server started
Now an index scan on pgbench_accounts fails, because the index contains more recent data than the table:
postgres=# SELECT * FROM pgbench_accounts ORDER BY aid;
ERROR:  could not read block 166818 in file "base/5/16422.1": read only 0 of 8192 bytes
What happened? Because we omitted the backup_label file, recovery started from the wrong checkpoint, so the data in the table and its index are no longer consistent. Note that we could have produced the same corruption without pg_backup_start() and pg_backup_stop(); I only used them to emphasize the importance of backup_label.
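For contrast, a correct manual base backup saves the returned label as a backup_label file inside the backup. Since pg_backup_stop() must be called from the same session that called pg_backup_start(), one way to do it right is to write the file directly from that psql session — a sketch, assuming the backup directory is "backup" as above:

postgres=# \pset tuples_only on
postgres=# \pset format unaligned
postgres=# \o backup/backup_label
postgres=# SELECT labelfile FROM pg_backup_stop();
postgres=# \o

With backup_label in place, recovery starts from the checkpoint recorded there instead of the (too recent) checkpoint in the control file.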
Creating a corrupt database with pg_resetwal
While pgbench is generating load, we crash the database with an immediate shutdown:
$ pg_ctl stop -m immediate -D data
Then we run pg_resetwal:
$ pg_resetwal -D data
The database server was not shut down cleanly.
Resetting the write-ahead log might cause data to be lost.
If you want to proceed anyway, use -f to force reset.

$ pg_resetwal -f -D data
Write-ahead log reset
Then we start the server and use amcheck like before to check the index for integrity:
postgres=# CREATE EXTENSION amcheck;
CREATE EXTENSION
postgres=# SELECT bt_index_parent_check('pgbench_accounts_pkey', TRUE, TRUE);
WARNING:  concurrent delete in progress within table "pgbench_accounts"
ERROR:  could not access status of transaction 51959
DETAIL:  Could not read from file "pg_subtrans/0000" at offset 204800: read too few bytes.
CONTEXT:  while checking uniqueness of tuple (1,1) in relation "pgbench_accounts"
What happened? pg_resetwal is only safe to use on a cluster that was shut down cleanly. The option -f is intended as a last-ditch effort to get a corrupted server to start and salvage some data. Only experts should use it.
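For comparison, the sequence that is actually safe (a sketch) is to shut the cluster down cleanly first, after which pg_resetwal no longer needs -f:

$ pg_ctl stop -m fast -D data    # clean shutdown, including a shutdown checkpoint
$ pg_resetwal -D data            # no -f needed on a cleanly shut down cluster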
Creating a corrupt database with pg_upgrade --link
We create a second cluster with initdb:
$ initdb -E UTF8 --locale=C -U postgres data2
Then we edit postgresql.conf and choose a different port number. After shutting down the original cluster, we run an “upgrade” in link mode:
$ pg_upgrade -d /home/laurenz/data -D /home/laurenz/data2 \
>            -b /usr/pgsql-15/bin -B /usr/pgsql-15/bin -U postgres --link
Performing Consistency Checks
...
Performing Upgrade
...
Adding ".old" suffix to old global/pg_control               ok

If you want to start the old cluster, you will need to remove
the ".old" suffix from /home/laurenz/data/global/pg_control.old.
Because "link" mode was used, the old cluster cannot be safely
started once the new cluster has been started.
...
Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade.
Once you start the new server, consider running:
    /usr/pgsql-15/bin/vacuumdb -U postgres --all --analyze-in-stages

Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh
pg_upgrade renamed the control file of the old cluster, so that it cannot get started by accident. We’ll undo that:
$ mv /home/laurenz/data/global/pg_control.old \
>    /home/laurenz/data/global/pg_control
Now we can start both clusters and run pgbench on both. Soon we will see error messages like:
ERROR:  unexpected data beyond EOF in block 1 of relation base/5/16397
HINT:  This has been seen to occur with buggy kernels; consider updating your system.

ERROR:  duplicate key value violates unique constraint "pgbench_accounts_pkey"
DETAIL:  Key (aid)=(8040446) already exists.

WARNING:  could not write block 13 of base/5/16404
DETAIL:  Multiple failures --- write error might be permanent.

ERROR:  xlog flush request 0/98AEE3E0 is not satisfied --- flushed only to 0/648CDC58
CONTEXT:  writing block 13 of relation base/5/16404

ERROR:  could not access status of transaction 39798
DETAIL:  Could not read from file "pg_subtrans/0000" at offset 155648: read too few bytes.
What happened? Because link mode makes both clusters share their data files, we ended up running two servers on the same data files at the same time. That leads to data corruption.
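The intended workflow after an upgrade in link mode is to never start the old cluster again and to remove its remaining files with the script that pg_upgrade generated (shown in the output above):

$ ./delete_old_cluster.sh    # removes the old cluster's data files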
Creating a corrupt database by manipulating data files
For that, we figure out the name of the file that belongs to the table pgbench_accounts:
postgres=# SELECT relfilenode FROM pg_class WHERE relname = 'pgbench_accounts';
 relfilenode 
═════════════
       16396
(1 row)
Now we stop the server and write some garbage into the first data block:
$ yes 'this is garbage' | dd of=data/base/5/16396 bs=1024 seek=2 count=1 conv=notrunc
0+1 records in
0+1 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.00031255 s, 3.3 MB/s
Then we start the server and try to select from the table:
postgres=# TABLE pgbench_accounts;
ERROR:  compressed pglz data is corrupt
What happened? We tampered with the data files, so it’s unsurprising that the table is corrupted.
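This kind of low-level damage is exactly what data checksums are for. A sketch, assuming the cluster was initialized with initdb --data-checksums (or had checksums enabled later with pg_checksums --enable) and is currently shut down:

$ pg_checksums --check -D data    # scans the data files and reports blocks with bad checksums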
Creating a corrupt database with catalog modifications
Who needs ALTER TABLE to drop a table column? We can simply run:
DELETE FROM pg_attribute
WHERE attrelid = 'pgbench_accounts'::regclass
  AND attname = 'bid';
After that, an attempt to query the table will result in an error:
ERROR:  pg_attribute catalog is missing 1 attribute(s) for relation OID 16396
What happened? We ignored the fact that DROP COLUMN does not actually delete the row from pg_attribute; it merely sets attisdropped to TRUE. We also failed to lock the table against concurrent access and to check for dependencies in pg_depend. Modifying catalog tables is not supported: if it breaks the database, you get to keep both pieces.
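To see the difference, here is what a real (and supported) column drop looks like in the catalog — an illustrative sketch on an intact copy of the table; the exact placeholder name depends on the column's attnum:

ALTER TABLE pgbench_accounts DROP COLUMN filler;

SELECT attname, attisdropped
FROM pg_attribute
WHERE attrelid = 'pgbench_accounts'::regclass
  AND attnum > 0;
-- the dropped column stays in pg_attribute with attisdropped = true
-- and a placeholder name like "........pg.dropped.4........"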
Conclusion
As we have seen, there are numerous ways to corrupt a PostgreSQL database. Some of them are obvious, others may surprise a beginner. If you don’t want a corrupted database:
- Don’t mess with the system catalogs.
- Never modify anything in the data directory (with the exception of configuration files).
- Don’t run with fsync = off.
- Don’t call pg_resetwal -f on a crashed server.
- Remove the old cluster after an upgrade with pg_upgrade --link (or at least never start it again).
- Don’t remove or forget the backup_label file of a backup.
- Run an up-to-date, supported version of PostgreSQL on reliable hardware to avoid known software bugs.
With this information, hopefully some databases can be saved! Read my post on join strategies if you want to learn more about PostgreSQL performance problems.
About Enteros
Enteros offers a patented database performance management SaaS platform. It proactively identifies root causes of complex business-impacting database scalability and performance issues across a growing number of clouds, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
Are you interested in writing for Enteros’ Blog? Please send us a pitch!