Of course, most people do not want their databases to get corrupted, and they are well advised to stay away from the techniques described in this article. But corrupting a database on purpose can be useful, for example when you want to test a tool or procedure that detects or repairs data corruption.
Prerequisites
For several of these experiments we need a database that contains some data and is under active load. The built-in PostgreSQL benchmark pgbench works well for that. With a scale factor of 100, the largest table contains 10 million rows:
$ pgbench -q -i -s 100
dropping old tables...
creating tables...
generating data (client-side)...
10000000 of 10000000 tuples (100%) done (elapsed 7.44 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 10.12 s (drop tables 0.18 s, create tables 0.01 s, client-side generate 7.52 s, vacuum 0.14 s, primary keys 2.28 s).
Load will be generated with 5 concurrent client sessions:
$ pgbench -c 5 -T 3600
Creating a corrupt database by setting fsync = off
Let’s set fsync = off in postgresql.conf and then power the server off while it is under load.
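If you want to reproduce this, one possible way (a sketch, assuming a disposable test VM that you can power off forcibly) is to change the parameter online and then cut power while pgbench is running:

-- in psql, as a superuser, on a throwaway test cluster only
ALTER SYSTEM SET fsync = off;
SELECT pg_reload_conf();  -- fsync can be changed with a configuration reload

Then, with pgbench still generating load, power the machine off from the hypervisor (or otherwise crash the operating system), so that the contents of the OS page cache are lost as well.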
The amcheck extension allows us to quickly detect the resulting data corruption:
postgres=# CREATE EXTENSION amcheck;
CREATE EXTENSION
postgres=# SELECT bt_index_parent_check('pgbench_accounts_pkey', TRUE, TRUE);
WARNING:  concurrent delete in progress within table "pgbench_accounts"
ERROR:  could not access status of transaction 1949706
DETAIL:  Could not read from file "pg_subtrans/001D" at offset 196608: read too few bytes.
CONTEXT:  while checking uniqueness of tuple (131074,45) in relation "pgbench_accounts"
What happened? With fsync = off, data are no longer flushed to disk in the correct order, so data modifications can hit the disk before the corresponding WAL does. This leads to data corruption during crash recovery.
Creating a corrupt database from a backup
While pgbench is running, we create a base backup:
$ psql
postgres=# SELECT pg_backup_start('test');
 pg_backup_start 
═════════════════
 1/47F8A130
(1 row)
The function to start backup mode is pg_backup_start() rather than pg_start_backup(), because I am using PostgreSQL v15: v15 removed the exclusive backup API, which had been deprecated since PostgreSQL 9.6. Read my updated post on the linked page to learn more.
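For reference, a minimal sketch of how the calls changed (old names shown as comments for contrast):

-- PostgreSQL 9.6 – 14, non-exclusive form of the old API:
-- SELECT pg_start_backup('test', false, false);
-- SELECT * FROM pg_stop_backup(false);

-- PostgreSQL 15 and later:
SELECT pg_backup_start('test');
-- ... copy the data directory with an external tool ...
SELECT lsn, labelfile, spcmapfile FROM pg_backup_stop();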
Let’s determine the object IDs of the database and of the primary key index of pgbench_accounts:
postgres=# SELECT relfilenode FROM pg_class WHERE relname = 'pgbench_accounts_pkey';
 relfilenode 
═════════════
       16430
(1 row)

postgres=# SELECT oid FROM pg_database WHERE datname = 'postgres';
 oid 
═════
   5
(1 row)
We create the backup by copying the data directory. Then we copy the primary key index of pgbench_accounts and the commit log again, to make sure they are more recent than the rest:
$ cp -r data backup
$ cp data/base/5/16430* backup/base/5
$ cp data/pg_xact/* backup/pg_xact/
$ rm backup/postmaster.pid
The crucial part: do not create backup_label
Now we exit backup mode, but ignore the contents of the backup_label file returned from pg_backup_stop():
postgres=# SELECT labelfile FROM pg_backup_stop();
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
                            labelfile                             
════════════════════════════════════════════════════════════════
 START WAL LOCATION: 1/47F8A130 (file 000000010000000100000047)↵
 CHECKPOINT LOCATION: 1/65CD24F0                               ↵
 BACKUP METHOD: streamed                                       ↵
 BACKUP FROM: primary                                          ↵
 START TIME: 2022-07-05 08:32:47 CEST                          ↵
 LABEL: test                                                   ↵
 START TIMELINE: 1                                             ↵
(1 row)
Then, let’s make sure that the last checkpoint in the control file is different:
$ pg_controldata -D backup | grep REDO
Latest checkpoint's REDO location:    1/890077D0
Latest checkpoint's REDO WAL file:    000000010000000100000089
Great! Let’s start the server:
$ echo 'port = 5555' >> backup/postgresql.auto.conf
$ pg_ctl -D backup start
waiting for server to start..... done
server started
Now an index scan on pgbench_accounts fails, because the index contains more recent data than the table:
postgres=# SELECT * FROM pgbench_accounts ORDER BY aid;
ERROR:  could not read block 166818 in file "base/5/16422.1": read only 0 of 8192 bytes
What happened? Because we omitted the backup_label file, recovery started from the wrong checkpoint, so the data in the table and its index are no longer consistent. Note that we could have produced the same corruption without pg_backup_start() and pg_backup_stop(); I only used them to emphasize the importance of backup_label.
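For contrast, a correct manual base backup saves the returned label as a backup_label file inside the backup. Since pg_backup_stop() must be called from the same session that called pg_backup_start(), one way to do it right is to write the file directly from that psql session — a sketch, assuming the backup directory is "backup" as above:

postgres=# \pset tuples_only on
postgres=# \pset format unaligned
postgres=# \o backup/backup_label
postgres=# SELECT labelfile FROM pg_backup_stop();
postgres=# \o

With backup_label in place, recovery starts from the checkpoint recorded there instead of the (too recent) checkpoint in the control file.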
Creating a corrupt database with pg_resetwal
While pgbench is generating load, we crash the database with an immediate shutdown:
$ pg_ctl stop -m immediate -D data
Then we run pg_resetwal:
$ pg_resetwal -D data
The database server was not shut down cleanly.
Resetting the write-ahead log might cause data to be lost.
If you want to proceed anyway, use -f to force reset.

$ pg_resetwal -f -D data
Write-ahead log reset
Then we start the server and use amcheck like before to check the index for integrity:
postgres=# CREATE EXTENSION amcheck;
CREATE EXTENSION
postgres=# SELECT bt_index_parent_check('pgbench_accounts_pkey', TRUE, TRUE);
WARNING:  concurrent delete in progress within table "pgbench_accounts"
ERROR:  could not access status of transaction 51959
DETAIL:  Could not read from file "pg_subtrans/0000" at offset 204800: read too few bytes.
CONTEXT:  while checking uniqueness of tuple (1,1) in relation "pgbench_accounts"
What happened? pg_resetwal is only safe to use on a cluster that was shut down cleanly. The option -f is intended as a last-ditch effort to get a corrupted server to start and salvage some data. Only experts should use it.
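For comparison, the sequence that is actually safe (a sketch) is to shut the cluster down cleanly first, after which pg_resetwal no longer needs -f:

$ pg_ctl stop -m fast -D data    # clean shutdown, including a shutdown checkpoint
$ pg_resetwal -D data            # no -f needed on a cleanly shut down cluster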
Creating a corrupt database with pg_upgrade --link
We create a second cluster with initdb:
$ initdb -E UTF8 --locale=C -U postgres data2
Then we edit postgresql.conf and choose a different port number. After shutting down the original cluster, we run an “upgrade” in link mode:
$ pg_upgrade -d /home/laurenz/data -D /home/laurenz/data2 \
>            -b /usr/pgsql-15/bin -B /usr/pgsql-15/bin -U postgres --link
Performing Consistency Checks
...
Performing Upgrade
...
Adding ".old" suffix to old global/pg_control               ok

If you want to start the old cluster, you will need to remove
the ".old" suffix from /home/laurenz/data/global/pg_control.old.
Because "link" mode was used, the old cluster cannot be safely
started once the new cluster has been started.
...
Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade.
Once you start the new server, consider running:
    /usr/pgsql-15/bin/vacuumdb -U postgres --all --analyze-in-stages

Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh
pg_upgrade renamed the control file of the old cluster, so that it cannot get started by accident. We’ll undo that:
$ mv /home/laurenz/data/global/pg_control.old \
>    /home/laurenz/data/global/pg_control
Now we can start both clusters and run pgbench on both. Soon we will see error messages like:
ERROR:  unexpected data beyond EOF in block 1 of relation base/5/16397
HINT:  This has been seen to occur with buggy kernels; consider updating your system.

ERROR:  duplicate key value violates unique constraint "pgbench_accounts_pkey"
DETAIL:  Key (aid)=(8040446) already exists.

WARNING:  could not write block 13 of base/5/16404
DETAIL:  Multiple failures --- write error might be permanent.

ERROR:  xlog flush request 0/98AEE3E0 is not satisfied --- flushed only to 0/648CDC58
CONTEXT:  writing block 13 of relation base/5/16404

ERROR:  could not access status of transaction 39798
DETAIL:  Could not read from file "pg_subtrans/0000" at offset 155648: read too few bytes.
What happened? Because link mode makes both clusters share their data files, we ended up running two servers on the same data files at the same time. That leads to data corruption.
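The intended workflow after an upgrade in link mode is to never start the old cluster again and to remove its remaining files with the script that pg_upgrade generated (shown in the output above):

$ ./delete_old_cluster.sh    # removes the old cluster's data files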
Creating a corrupt database by manipulating data files
For that, we figure out the name of the file that belongs to the table pgbench_accounts:
postgres=# SELECT relfilenode FROM pg_class WHERE relname = 'pgbench_accounts';
 relfilenode 
═════════════
       16396
(1 row)
Now we stop the server and write some garbage into the first data block:
$ yes 'this is garbage' | dd of=data/base/5/16396 bs=1024 seek=2 count=1 conv=notrunc
0+1 records in
0+1 records out
1024 bytes (1.0 kB, 1.0 KiB) copied, 0.00031255 s, 3.3 MB/s
Then we start the server and try to select from the table:
postgres=# TABLE pgbench_accounts;
ERROR:  compressed pglz data is corrupt
What happened? We tampered with the data files, so it’s unsurprising that the table is corrupted.
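This kind of low-level damage is exactly what data checksums are for. A sketch, assuming the cluster was initialized with initdb --data-checksums (or had checksums enabled later with pg_checksums --enable) and is currently shut down:

$ pg_checksums --check -D data    # scans the data files and reports blocks with bad checksums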
Creating a corrupt database with catalog modifications
Who needs ALTER TABLE to drop a table column? We can simply run:
DELETE FROM pg_attribute
WHERE attrelid = 'pgbench_accounts'::regclass
  AND attname = 'bid';
After that, an attempt to query the table will result in an error:
ERROR:  pg_attribute catalog is missing 1 attribute(s) for relation OID 16396
What happened? We ignored the fact that DROP COLUMN does not actually delete the row from pg_attribute; it merely sets attisdropped to TRUE. We also failed to lock the table against concurrent access and to check for dependencies in pg_depend. Modifying catalog tables is not supported: if it breaks the database, you get to keep both pieces.
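To see the difference, here is what a real (and supported) column drop looks like in the catalog — an illustrative sketch on an intact copy of the table; the exact placeholder name depends on the column's attnum:

ALTER TABLE pgbench_accounts DROP COLUMN filler;

SELECT attname, attisdropped
FROM pg_attribute
WHERE attrelid = 'pgbench_accounts'::regclass
  AND attnum > 0;
-- the dropped column stays in pg_attribute with attisdropped = true
-- and a placeholder name like "........pg.dropped.4........"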
Conclusion
As we have seen, there are numerous ways to corrupt a PostgreSQL database. Some of them are obvious, others may surprise a beginner. If you don’t want a corrupted database:
- Don’t mess with the system catalogs.
- Never modify anything in the data directory (with the exception of configuration files).
- Don’t run with fsync = off.
- Don’t call pg_resetwal -f on a crashed server.
- Remove the old cluster after an upgrade with pg_upgrade --link (or at least never start it again).
- Don’t remove or forget the backup_label file of a backup.
- Run an up-to-date, supported version of PostgreSQL on reliable hardware to avoid known software bugs.
With this information, hopefully some databases can be saved! Read my post on join strategies if you want to learn more about PostgreSQL performance problems.
About Enteros
Enteros offers a patented database performance management SaaS platform. It proactively identifies root causes of complex business-impacting database scalability and performance issues across a growing number of clouds, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
Are you interested in writing for Enteros’ Blog? Please send us a pitch!