Preamble
In the world of PostgreSQL, autovacuum may be one of the most cherished features, but it’s also one of the most misunderstood. In some way, autovacuum or vacuum in general are topics that our PostgreSQL support team receives a lot of questions about. Therefore, it might make sense to take this chance to shed some light on vacuum and a concept known as “wraparound protection.”
The circle of time
Occasionally, PostgreSQL starts an autovacuum worker process in a “special way,” as some users may have noticed. Here’s an example
autovacuum: VACUUM public.x (to prevent wraparound)
This “to prevent wraparound” notice is occasionally added by autovacuum to the process title. Now, the crucial query is: What’s the point?
As you may be aware, PostgreSQL uses the MVCC (MultiVersion Concurrency Control) mechanism to regulate transaction visibility. Why does that matter? Here is an easy illustration:
User 1 | User 2 | Remarks |
BEGIN; | ||
UPDATE tab SET field = 2 WHERE field = 1; |
BEGIN; | |
COMMIT; | SELECT * FROM tab; | ←User 2 still sees “1” because the first transaction has not committed in time |
COMMIT; |
Internally, the main method of managing visibility is by comparing transaction IDs. Every reading transaction records which transactions it can see by taking a snapshot, and every version of every row is marked with the transaction ID that wrote it (for more information on how this works, see tqual.c in the PostgreSQL source code). There is only one issue: There are a total of 4 billion transaction IDs. They will eventually wrap around, with the previous 2 billion values being regarded as the past and the following 2 billion as the future. The database will produce false results because what was once past becomes the present. If this might occur, PostgreSQL will stop accepting new transactions, and autovacuum will engage to stop things from getting to that point.
The meaning of autovacuum_freeze_max_age
For a while, comparing transaction IDs to determine visibility worked, but it was very expensive because you had to check the commit log to see if the transaction had actually committed or not. The result of this check will be cached as “hint bits” in the row header by the initial reader of the row in order to skip this check. Readers in the future will be able to tell if a row was added before or after a snapshot was taken by comparing the transaction ID of the row to the snapshot.
In a similar manner, the row is “frozen” in this process. If a vacuum process finds that a row was added before any currently active transaction, it will use a special combination of hint bits to mark the row as “frozen.” This tells any future readers that this row was added a long time ago, so they don’t need to check the transaction ID. Also, if all rows are frozen before their transaction IDs wrap around, there is no chance of getting wrong results.
Wraparound preventive vacuuming can help in this situation. Typically, vacuum procedures only check pages with dead or unread rows on them by using a tool called a visibility map. Vacuum will scan the entire database once enough time has passed, freeze anything it can, and record the oldest unfrozen row so it will know when to repeat this process in the future. PostgreSQL controls the cleanup process at the autovacuum freeze maximum age control point, which is set by default to 200.000.000 transactions. Most of the time, this gives autovacuum enough time to avoid the dangerous wraparound situation that can happen when a table has more than 2 billion transactions.
MultiXact IDs are protected using the same security method. Most of the time, MultiXact IDs move much more slowly than standard translation IDs, so they rarely cause wraparound vacuums. Nonetheless, they should be mentioned for completeness’s sake.
What this means to PostgreSQL administrators
There is a lot more to say about how VACUUM and autovacuum function internally, but what does it all mean for system administrators who want to keep a system up and running?
The most critical thing is: If you see…
autovacuum: VACUUM public.x (to prevent wraparound)
Don’t panic. The autovacuum does its job as expected. There is no need to panic if autovacuum begins to operate after 200 million transactions (or a little more), as a wraparound problem is still hundreds of millions of transactions away. Autovacuum will merely make sure that your table is okay; no action is required to prevent problems.
It is sometimes necessary to determine how close a table is to wraparound protection. PostgreSQL administrators can use the following query to see when autovacuum will start a wraparound VACUUM:
test=# SELECT oid::regclass::text AS table, age(relfrozenxid) AS xid_age, mxid_age(relminmxid) AS mxid_age, least( (SELECT setting::int FROM pg_settings WHERE name = 'autovacuum_freeze_max_age') - age(relfrozenxid), (SELECT setting::int FROM pg_settings WHERE name = 'autovacuum_multixact_freeze_max_age') - mxid_age(relminmxid) ) AS tx_before_wraparound_vacuum, pg_size_pretty(pg_total_relation_size(oid)) AS size, pg_stat_get_last_autovacuum_time(oid) AS last_autovacuum FROM pg_class WHERE relfrozenxid != 0 AND oid > 16384 ORDER BY tx_before_wraparound_vacuum; -[ RECORD 1 ]---------------+------------------------ table | a xid_age | 139 mxid_age | 0 tx_before_wraparound_vacuum | 199999861 size | 280 MB last_autovacuum | -[ RECORD 2 ]---------------+------------------------ table | b xid_age | 86 mxid_age | 0 tx_before_wraparound_vacuum | 199999914 size | 280 MB last_autovacuum |
The query displays for each table the timing of the relation’s upcoming autovacuum start. In my example, wraparound protection is still more than 199 million transactions away.
For a wraparound vacuum to identify the earliest non-frozen row, the entire table must be inspected. Large tables and numerous transactions might cause this to occur frequently and generate a lot of disk I/O. The visibility map was updated in PostgreSQL 9.6 to include a new bit that informs PostgreSQL which tables have previously been frozen and don’t need to be modified any longer. This significantly accelerates wraparound vacuums for append-mostly tables, which are typical of most enormous tables.
Running closer to the wraparound limit will save overhead on older versions. By setting autovacuum freeze max age, for instance, to 1,000,000,000,000, you may reduce the frequency of full scans by 5 times. Despite the considerably smaller incremental advantages, those with greater levels of courage might wish to push themselves even closer to the edge. If you do decide to raise the value, keep an eye on how well autovacuum is keeping up to avoid experiencing downtime when your transaction rate exceeds autovacuum’s capacity to freeze.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
Are you interested in writing for Enteros’ Blog? Please send us a pitch!
RELATED POSTS
Enhancing Identity and Access Management in Healthcare with Enteros
- 19 November 2024
- Database Performance Management
In the fast-evolving world of finance, where banking and insurance sectors rely on massive data streams for real-time decisions, efficient anomaly man…
Maximizing Efficiency with Enteros: Revolutionizing Cost Allocation Through a Cloud Center of Excellence
In the fast-evolving world of finance, where banking and insurance sectors rely on massive data streams for real-time decisions, efficient anomaly man…
Driving Efficiency in the Transportation Sector: Enteros’ Cloud FinOps and Database Optimization Solutions
- 18 November 2024
- Database Performance Management
In the fast-evolving world of finance, where banking and insurance sectors rely on massive data streams for real-time decisions, efficient anomaly man…
Empowering Nonprofits with Enteros: Optimizing Cloud Resources Through AIOps Platform
In the fast-evolving world of finance, where banking and insurance sectors rely on massive data streams for real-time decisions, efficient anomaly man…