Preamble
Since I only recently became aware of the concept of “killed index tuples,” I figured there must be other people who haven’t yet heard of this intriguing PostgreSQL feature.
It may help you understand why execution times for the same PostgreSQL query with the same execution plan can differ so much.
Let’s review the life cycle of a table row version (also known as a “heap tuple”) before we take a closer look at the index.
Life, death and visibility in the table heap
It is widely known that the visibility of heap tuples is determined by the system columns xmin and xmax. A heap tuple is “dead” if its xmax is less than the xmin of all active transactions.
Now xmin and xmax are only valid if the respective transactions have been marked committed in the “commit log.” Consequently, any transaction that needs to know whether it can see a tuple has to consult the commit log. The first transaction that looks up the commit log stores the result in the tuple’s “hint bits,” so that later transactions are spared that extra work.
Dead tuples are eventually reclaimed by VACUUM.
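If you want to look at these system columns yourself, you can select them explicitly. A minimal sketch, assuming a hypothetical table named tab (the actual values depend on your transaction history):

```sql
-- xmin: the transaction that created this row version
-- xmax: the transaction that deleted or updated it (0 if none)
SELECT xmin, xmax, * FROM tab LIMIT 5;
```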
All of this is generally known, but what about the index entries?
Life, death and visibility in the index
To avoid duplicating visibility information and to keep index tuples small, that information is not stored in the index.
The status of an index tuple is determined by the heap tuple it points to, and both are removed by VACUUM at the same time.
Consequently, an index scan has to visit the heap tuple to determine whether it can “see” an entry. This is the case even if all the columns the query needs are in the index tuple itself. Even worse, this “heap access” causes random I/O, which is particularly inefficient on spinning disks.
Because of this, PostgreSQL’s index scans are more expensive than those in other database management systems with different architectures. Over time, a number of features have been added to help mitigate that:
- In order to reduce random I/O and avoid visiting the same block multiple times during an index scan, PostgreSQL 8.1 introduced the “bitmap index scan,” a scan method that first builds a list of heap blocks to visit and then scans them in physical order.
- The “index-only scan,” introduced in PostgreSQL 9.2, avoids fetching the heap tuple altogether, provided that all required columns are in the index and the “visibility map” shows that all tuples in the table block are visible to all transactions.
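To see the effect of the visibility map, you can compare plans before and after a VACUUM. A hedged sketch, assuming a hypothetical table tab with a primary key on id:

```sql
-- VACUUM updates the visibility map ...
VACUUM tab;

-- ... which allows a query that needs only indexed columns
-- to show an "Index Only Scan" node in the plan:
EXPLAIN SELECT id FROM tab WHERE id < 100;
```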
However, I want to concentrate on a different feature that was also included in 8.1 but was never mentioned in the release notes.
The feature was introduced with this commit:

```
commit 3f4d48802271126b1343289a9d2267ff1ed3788a
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Fri May 24 18:57:57 2002 +0000

    Mark index entries "killed" when they are no longer visible to any
    transaction, so as to avoid returning them out of the index AM.  Saves
    repeated heap_fetch operations on frequently-updated rows.  Also detect
    queries on unique keys (equality to all columns of a unique index),
    and don't bother continuing scan once we have found first match.

    Killing is implemented in the btree and hash AMs, but not yet in rtree
    or gist, because there isn't an equally convenient place to do it in
    those AMs (the outer amgetnext routine can't do it without re-pinning
    the index page).

    Did some small cleanup on APIs of HeapTupleSatisfies, heap_fetch, and
    index_insert to make this a little easier.
```
When an index scan retrieves a heap tuple and discovers that it is dead—or, to be more precise, that the entire “HOT chain” of tuples is dead—it marks the index tuple as “killed.” Afterward, future index scans can just ignore it.
Future index scans can be significantly sped up by using this “hint bit for indexes.”
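You can actually observe killed index tuples with the pageinspect extension: bt_page_items reports a dead flag for each index item. A sketch, with the caveats that the dead column is only available in PostgreSQL 13 and later, that installing the extension requires superuser privileges, and that page 1 is just an example:

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- "dead" is true for index tuples that a previous
-- index scan has marked as killed
SELECT itemoffset, ctid, dead
FROM bt_page_items('hole_pkey', 1);
```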
An example
To illustrate killed index tuples, let’s build a table. To keep autovacuum from spoiling the demonstration by removing the dead tuples, we disable it for this table.
```sql
CREATE UNLOGGED TABLE hole (
   id integer PRIMARY KEY,
   val text NOT NULL
) WITH (autovacuum_enabled = off);

INSERT INTO hole
SELECT i, 'text number ' || i
FROM generate_series(1, 1000000) AS i;

ANALYZE hole;
```
By deleting rows from the table, we produce a large number of dead tuples:

```sql
DELETE FROM hole WHERE id BETWEEN 501 AND 799500;

ANALYZE hole;
```
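To confirm that the dead tuples are really there, the pgstattuple extension can count them. A sketch; the extension has to be installed, and the exact numbers will vary slightly:

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- dead_tuple_count should be about 799000 after the DELETE above
SELECT tuple_count, dead_tuple_count
FROM pgstattuple('hole');
```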
Spot the differences!
The same query is now executed twice:
```
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF, TIMING OFF)
SELECT * FROM hole WHERE id BETWEEN 1 AND 800000;

                          QUERY PLAN
--------------------------------------------------------------
 Index Scan using hole_pkey on hole (actual rows=1000 loops=1)
   Index Cond: ((id >= 1) AND (id <= 800000))
   Buffers: shared hit=7284
 Planning Time: 0.346 ms
 Execution Time: 222.129 ms
(5 rows)

EXPLAIN (ANALYZE, BUFFERS, COSTS OFF, TIMING OFF)
SELECT * FROM hole WHERE id BETWEEN 1 AND 800000;

                          QUERY PLAN
--------------------------------------------------------------
 Index Scan using hole_pkey on hole (actual rows=1000 loops=1)
   Index Cond: ((id >= 1) AND (id <= 800000))
   Buffers: shared hit=2196
 Planning Time: 0.217 ms
 Execution Time: 14.540 ms
(5 rows)
```
Discussion of the example
A number of factors can be ruled out as the cause of the difference:
- Caching effects: often the second execution is faster because the first one has to read data from secondary storage, which is then already cached for the second run. But the execution plans show that in both cases all blocks were already in the cache (“shared hit”) and none had to be read from disk.
- No hint bits had to be written during the first execution (no buffers were “dirtied”), because that had already been done during the sequential scan triggered by the DELETE.
The 799000 index tuples that pointed to dead tuples had to be killed during the first execution, which required visiting all of the table blocks.
Because the second execution didn’t have to do that, it was about 15 times faster.
Because each table block contains many dead tuples, the difference in blocks touched by the two queries is not as great as one might think.
It is also worth noting that some index pages must have been dirtied in the process, even though this is not shown in the EXPLAIN output; that will cause some disk writes from this read-only query.
Conclusion
The next time you experience mysterious differences in run time for the same query that cannot be explained by caching effects, consider killed index tuples as a possible culprit. They can be a sign that autovacuum should run more often on the table in question: that removes the dead tuples and makes even the first execution of a query fast.
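For a real-world table (our example table has autovacuum disabled on purpose), this can be done with a per-table storage parameter. The value below is purely illustrative, not a general recommendation:

```sql
-- Trigger autovacuum when about 1% of the rows are dead,
-- instead of the default 20%
ALTER TABLE hole SET (autovacuum_vacuum_scale_factor = 0.01);
```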
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.