PostgreSQL is a highly developed relational database system capable of impressive feats. But sophistication also means there is a layer of complexity underneath that can be hard for users to understand. Hint bits are one thing most people are unaware of. How do hint bits work? They are an internal optimization that speeds up visibility checks. Whenever PostgreSQL reads a row, it has to check whether that row is visible to the current transaction, which requires looking up the commit status of the transaction that created the row. The best way to expedite the procedure is, of course, to avoid repeating that lookup, and that is exactly why hint bits were created.
When PostgreSQL comes across a row whose creating transaction is known to have committed, it sets a hint bit in the row header to record this fact. The next time someone reads the row, the hint bit tells PostgreSQL that the commit status is already settled, so only the row's visibility information has to be compared to the current snapshot; the expensive commit-status lookup can be skipped.
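If you want to look at those bits yourself, the pageinspect contrib extension can expose the tuple headers. The following is an illustration only; it assumes the extension is installed, and the mask value 256 (0x0100) corresponds to HEAP_XMIN_COMMITTED in the PostgreSQL source:

test=# CREATE EXTENSION pageinspect;
CREATE EXTENSION
test=# SELECT lp, t_xmin,
              (t_infomask & 256) > 0 AS xmin_committed_hint -- 0x0100 = HEAP_XMIN_COMMITTED
       FROM heap_page_items(get_raw_page('t_test', 0))
       LIMIT 3;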
Creating some test data
To demonstrate how things work, some demo data can be created:
test=# CREATE TABLE t_test (a int, b int);
CREATE TABLE
A simple Perl script can generate the data efficiently. Half a billion rows are used in this example because that is large enough to make the point and still small enough to finish the test quickly:
[hs@jacqueline blog]$ cat numbers.pl
#!/usr/bin/perl

# emit 500 million tab-separated pairs; COPY's default text format
# expects tab as the column delimiter
for ($count = 1; $count <= 500000000; $count++)
{
    print "$count\t$count\n";
}
Autovacuum is turned off during this test to keep it from ruining our day:
test=# ALTER TABLE t_test SET (autovacuum_enabled = off);
ALTER TABLE
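As a quick sanity check (not part of the original test), the storage option can be verified in pg_class:

test=# SELECT reloptions FROM pg_class WHERE relname = 't_test';
        reloptions
--------------------------
 {autovacuum_enabled=off}
(1 row)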
The data is then loaded into PostgreSQL. The simplest way to accomplish this is the "COPY … FROM PROGRAM" command, which reads the data directly from the output of a program. Here is how it works:
test=# \timing
Timing is on.
test=# COPY t_test FROM PROGRAM '/path/numbers.pl';
COPY 500000000
Time: 612716.438 ms
On our test system, which has a 750 GB SATA disk and an Intel i5 processor, the data loads in about 10 minutes. Note that PostgreSQL's default settings were used, which means, for example, that no checkpoint optimizations were applied. The output of "top" during the load is shown in the following listing:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27570 hs        20   0  257m  24m  23m R 48.6  0.2   0:10.33 postgres
27574 hs        20   0  119m 1584 1252 R 31.6  0.0   0:06.66 numbers.pl
27356 hs        20   0  256m  16m  16m S  1.3  0.1   0:00.18 postgres
Clearly, generating the numbers eats up a large share of the CPU time. This is a useful observation, because it shows how cheap the PostgreSQL side of the load is compared to the cost of producing the data in the first place.
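As an aside, if you prefer to stay entirely inside the database, generate_series can produce equivalent data without any external script. A minimal sketch; load time will of course differ from the COPY approach shown above:

test=# INSERT INTO t_test
       SELECT x, x FROM generate_series(1, 500000000) AS x;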
Hint bits: the first interaction
Let's run a simple SELECT and see whether anything unusual happens:
test=# SELECT count(*) FROM t_test;
   count
-----------
 500000000
(1 row)

Time: 359669.393 ms
The really interesting part is not the result of the query; it is nicely concealed below the surface. Here is the vmstat output:
[hs@jacqueline blog]$ vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b  swpd    free   buff    cache    si   so     bi     bo    in    cs us sy id wa st
 1  0   164 4968092  46412 10534028    0    0   3486   1903     3    13  1  0 91  7  0
 0  0   164 4967960  46412 10534144    0    0      0      0    78   225  0  0 100 0  0
 0  0   164 4967960  46420 10534144    0    0      0      8    72   203  0  0 99  1  0
 0  0   164 4967960  46420 10534144    0    0      0      0   103   301  0  0 100 0  0
 0  1   164 4946260  46420 10555520    0    0  10586     20   246   437  1  0 97  2  0
 1  1   164 4732608  46420 10763488    0    0 104000      0  1403  2046 10  2 76 12  0
 0  1   164 4517592  46420 10972988    0    0 104704      0  1415  2042 10  3 76 11  0
 1  0   164 4301832  46420 11183036    0    0 105024      0  1410  2038 10  3 75 12  0
 1  0   164 4082228  46420 11396740    0    0 106880      4  1449  2048 11  3 75 11  0
 0  1   164 3868576  46420 11604996    0    0 104064      0  1414  2050 10  3 75 12  0
 1  0   164 3650460  46420 11817164    0    0 106176      0  1415  2058 11  2 75 12  0
 0  1   164 3433460  46420 12028640    0    0 105664      0  1419  2078 10  2 75 12  0
 1  1   164 3262588  46420 12194508    0    0  82944  45356  1199  2209  8  3 66 23  0
 1  1   164 3143176  46420 12310644    0    0  58112  43818   949  1738  6  2 75 18  0
 1  1   164 3018680  46420 12432100    0    0  60736  36900   940  1760  6  2 75 17  0
 1  2   164 2897284  46420 12550404    0    0  59136  29726   917  1727  6  2 75 16  0
 0  2   164 2775020  46420 12669348    0    0  59392  36642   912  1746  6  2 76 17  0
 0  2   164 2656848  46420 12784480    0    0  57600  36388   883  1687  6  2 75 17  0
 1  1   164 2537560  46420 12900636    0    0  58048  36644   937  1755  5  2 73 19  0
Particularly intriguing are the two columns "bi" (blocks in) and "bo" (blocks out). At first, things look fairly normal: we read at roughly 100 MB/s, a reasonable rate for a desktop machine with a single SATA drive. But then something unexpected happens: PostgreSQL starts writing data to disk. Why? At load time, PostgreSQL cannot tell whether a row will be visible to other transactions without checking the commit status of the transaction that created it. So the first time each row is read, PostgreSQL performs that check and records the result by setting the row's hint bits. Setting those bits dirties the pages, and the dirty pages eventually have to be written back to disk; that is precisely the write I/O we see here. The benefit is that all subsequent visibility checks are greatly accelerated.
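A handy way to observe this from inside PostgreSQL is EXPLAIN (ANALYZE, BUFFERS), which on reasonably recent versions reports how many pages a query "dirtied". On a freshly loaded table, that number is essentially the hint-bit writing just described. A sketch, not output from this test:

test=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t_test;
-- in the plan output, look for a line such as:
--   Buffers: shared read=... dirtied=...
-- "dirtied" counts pages modified merely by setting hint bits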
The I/O pattern changes completely when the query is run again:
[hs@jacqueline blog]$ vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b  swpd   free   buff    cache    si   so     bi     bo    in    cs us sy id wa st
 0  1   164 191076   1688 15325392    0    0 104256      0  1401  2077  8  2 75 15  0
 0  1   164 183140   1688 15334056    0    0 105792      0  1394  2113  8  2 75 15  0
 0  1   164 187728   1688 15329596    0    0 106880      0  1429  2121  9  2 75 15  0
 0  1   164 190828   1688 15327804    0    0 105472      0  1413  2124  8  2 75 15  0
 2  0   164 184008   1688 15335792    0    0 104640      0  1389  2108  8  2 75 15  0
 1  0   164 188348   1688 15336880    0    0 106240      0  1466  2107  9  2 75 14  0
 1  0   164 191944   1692 15337824    0    0 100226      4  1372  2060  8  2 75 15  0
As you can see, we now get a clean 100+ MB per second of reads, and no more writes are occurring. That is exactly what we want to see. What matters even more is how much faster the read has become:
test=# SELECT count(*) FROM t_test;
   count
-----------
 500000000
(1 row)

Time: 110263.714 ms
Wow, the query completed in far less time, and, more importantly, the execution time is now fairly consistent:
Time: 110382.362 ms
The theory pays off nicely: because data is typically read far more often than it is written, paying the price of the hint-bit writes once greatly reduces the runtime of every subsequent read.
On to VACUUM
But what happens if we run VACUUM on the table to squeeze out a little extra performance?
Here is vmstat again:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b  swpd   free   buff    cache    si   so     bi     bo    in    cs us sy id wa st
 1  1   164 161936    616 15427528    0    0  50718    356   845  1262  4  2 79 15  0
 0  1   164 168136    616 15421384    0    0  87680    636  1267  2517  6  4 75 15  0
 0  1   164 169624    616 15419204    0    0  88704    624  1254  2529  6  3 75 15  0
 0  1   164 179296    616 15409532    0    0  90304    560  1279  2435  6  4 75 15  0
 0  1   164 185868    616 15402548    0    0  89216    628  1319  1898  7  4 74 15  0
 1  1   164 178924    616 15409580    0    0  88832    628  1295  1960  7  3 75 15  0
 0  1   164 183636    616 15404972    0    0  80166    624  1196  1802  6  3 75 16  0
 0  1   164 186240    616 15402652    0    0  91648    568  1354  1951  7  4 75 15  0
 0  2   164 180660    624 15408296    0    0  89472    670  1307  1982  6  3 75 16  0
 1  1   164 170120    624 15418500    0    0  72260  49908  1080  2035  5  3 70 22  0
 0  2   164 164788    624 15423656    0    0  67072  22782   995  1955  4  2 70 23  0
 0  2   164 168012    624 15420556    0    0  61888  22492   984  1928  4  2 73 20  0
 0  2   164 164044    624 15424496    0    0  60672  29582   959  1887  4  2 73 20  0
 0  2   164 172352    624 15416452    0    0  67712  22524  1053  2097  5  3 72 20  0
In this case, we are clearly limited by the disk, as a couple of lines taken from "top" demonstrate:
  PID USER      PR  NI  VIRT   RES  SHR S %CPU %MEM    TIME+  COMMAND
27570 hs        20   0  257m   30m  29m R 26.3  0.2   9:34.80 postgres
   59 root      20   0     0     0    0 S  1.7  0.0  18:00.13 kswapd0
   36 root      20   0     0     0    0 S  1.0  0.0  10:42.29 kblockd/2
27245 root      20   0     0     0    0 D  0.3  0.0   0:03.53 flush-8:16
30861 root      20   0 15036  1196  836 R  0.3  0.0   0:03.75 top
VACUUM takes quite a while as well:
test=# VACUUM t_test;
VACUUM
Time: 363773.532 ms
The beauty of this is that VACUUM can go one step further: in addition to knowing whether a single row is visible, it can mark entire blocks as visible to everyone in the visibility map.
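On PostgreSQL 9.6 and later, the pg_visibility contrib extension can show how much of the table VACUUM has marked as all-visible in the visibility map. An illustration, assuming the extension is available on your version:

test=# CREATE EXTENSION pg_visibility;
CREATE EXTENSION
test=# SELECT * FROM pg_visibility_map_summary('t_test');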
Once more we observe a speedup, but this time it is only marginal (about two seconds):
test=# SELECT count(*) FROM t_test;
   count
-----------
 500000000
(1 row)

Time: 108715.332 ms
The bulk of the benefit, which is sizable, comes from the hint bits attached to the individual rows; the block-level visibility information adds only a little on top.