Dealing With Slow Queries With PostgreSQL
How to deal with queries that run too slowly.
There are always a few slow-running queries in every deployment.
Read on to learn how to identify slow queries and determine the reasons behind them.
Can You Use pg_stat_statements Alone?
The popular PostgreSQL extension pg_stat_statements is part of the core distribution and comes pre-installed on almost every DBaaS provider. It is invaluable, and is essentially the only way to obtain query statistics without installing custom extensions.
However, it has a few limitations when it comes to identifying slow queries.
Cumulative Statistics
The pg_stat_statements extension maintains cumulative statistics for every query the server runs. For each query, it reports metrics such as the total number of times it has been executed and the total execution time.
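As an illustration, the queries with the highest cumulative execution time can be listed straight from the view. This assumes PostgreSQL 13 or later, where the column is named total_exec_time; older releases call it total_time:

SELECT query, calls, total_exec_time, mean_exec_time
  FROM pg_stat_statements
 ORDER BY total_exec_time DESC
 LIMIT 10;  -- the 10 queries that consumed the most total execution time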
To “catch” slow queries as they happen, you have to regularly snapshot the contents of the pg_stat_statements view, store them in a time-series database, and compare the execution counts. For example, if you have the contents of pg_stat_statements at 10.00 AM and at 10.10 AM, you can pick out the queries whose execution count is higher at 10.10 AM than at 10.00 AM. For those queries, the average execution time during this interval is given by:
(total time at 10.10 AM - total time at 10.00 AM) ÷ (total count at 10.10 AM - total count at 10.00 AM)
This works quite well in practice, but you’ll need solid monitoring infrastructure or a dedicated service like pgDash to do it.
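Here is a minimal sketch of this snapshot-and-diff approach in plain SQL. The snapshot table names are hypothetical, the column names assume PostgreSQL 13 or later, and joining on queryid alone is a simplification (queryid is only unique per user and database):

-- snapshot taken at 10.00 AM
CREATE TABLE snap_1000 AS
  SELECT queryid, query, calls, total_exec_time FROM pg_stat_statements;
-- snapshot taken at 10.10 AM
CREATE TABLE snap_1010 AS
  SELECT queryid, query, calls, total_exec_time FROM pg_stat_statements;

-- average execution time (in ms) over the interval, for queries that ran during it
SELECT b.query,
       b.calls - a.calls AS calls_in_interval,
       (b.total_exec_time - a.total_exec_time) / NULLIF(b.calls - a.calls, 0) AS avg_exec_time_ms
  FROM snap_1010 b
  JOIN snap_1000 a USING (queryid)
 WHERE b.calls > a.calls
 ORDER BY avg_exec_time_ms DESC
 LIMIT 10;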
Query Parameters
pg_stat_statements does not capture the values of bind parameters passed to queries.
One of the things the Postgres query planner estimates in order to choose an execution plan is the number of rows a condition is expected to filter. For example, suppose most rows in a table have the value “US” in the indexed column country. The planner may decide to perform a sequential scan of the entire table for the WHERE clause country = “US”, since that clause is expected to match most of the rows, but may choose an index scan for country = “UK”.
Knowing the actual parameter values for which a query execution was slow makes it much faster to diagnose slow-query issues.
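To make this concrete, here is a hypothetical sketch (the customers table and its data are not from the article) showing how the chosen plan can depend on the parameter value:

CREATE TABLE customers (id bigint, country text);
CREATE INDEX ON customers (country);
-- load data in which most rows have country = 'US', then run ANALYZE;
-- the planner will likely pick different plans for different values:
EXPLAIN SELECT * FROM customers WHERE country = 'US';  -- likely a sequential scan
EXPLAIN SELECT * FROM customers WHERE country = 'UK';  -- likely an index scan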
Logging Slow Queries
A simpler solution is to log slow queries. Unlike certain other DBMSs that make this easy, PostgreSQL presents us with a handful of similar-looking configuration settings:
- log_statement
- log_min_duration_statement
- log_min_duration_sample
- log_statement_sample_rate
- log_parameter_max_length
- log_parameter_max_length_on_error
- log_duration
The Postgres manual goes into great detail about these. Here is a logical place to start:
# log only queries that take longer than 5 seconds
log_min_duration_statement = 5s
log_parameter_max_length = 1024
log_parameter_max_length_on_error = 1024
Which results in logs like these:
2022-04-14 06:17:11.462 UTC [369399] LOG: duration: 5.060 ms statement: select i.*, t."Name" as track, ar."Name" as artist from "InvoiceLine" as i join "Track" as t on i."TrackId" = t."TrackId" join "Album" as al on al."AlbumId" = t."AlbumId" join "Artist" as ar on ar."ArtistId" = al."ArtistId";
If there are too many logs, you can ask Postgres to log only (say) 50% of queries that run longer than 5 seconds:
log_min_duration_sample = 5s
log_statement_sample_rate = 0.5  # 0..1 => 0%..100%
Of course, you should read the documentation to understand what these settings mean and imply before adding them to your Postgres configuration. Be warned that the semantics are a bit quirky and take some effort to get right.
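As an aside, you don't have to edit postgresql.conf by hand to try these out. Here is a sketch, assuming superuser privileges, of changing them at runtime:

ALTER SYSTEM SET log_min_duration_statement = '5s';
ALTER SYSTEM SET log_parameter_max_length = 1024;
SELECT pg_reload_conf();  -- apply the changes without a restart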
Execution Plans of Slow Queries
Knowing that a query ran slowly is usually not enough; you also need to understand why it was slow. For this, you’ll typically first examine the query’s execution plan.
auto_explain is another essential PostgreSQL extension, again available on most major DBaaS providers, that can log the execution plans of queries once they finish running. It is documented in the Postgres manual.
To enable it, you typically add auto_explain to shared_preload_libraries and restart Postgres. Here is a sample starting configuration:
# log execution plans of queries that take 10s or more to run
auto_explain.log_min_duration = 10s
auto_explain.log_verbose = on
auto_explain.log_settings = on
auto_explain.log_format = json
auto_explain.log_nested_statements = on
# enabling these provides more information, but has a performance cost
#auto_explain.log_analyze = on
#auto_explain.log_buffers = on
#auto_explain.log_wal = on
#auto_explain.log_timing = on
#auto_explain.log_triggers = on
Plans will then be logged in JSON format and can be examined in execution plan visualization tools.
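For a quick, restart-free experiment, auto_explain can also be loaded into a single session, as its documentation shows. A sketch, assuming superuser privileges:

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;  -- log the plan of every statement in this session
SET auto_explain.log_analyze = on;      -- include actual row counts and timings
SELECT count(*) FROM pg_class;          -- this query's plan is written to the server log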
Still-Running Queries
All the techniques above have one thing in common: they only produce usable results after a query has finished. They cannot be used to handle queries that are running so slowly, the first time around, that they are still executing.
Each connection to a PostgreSQL server is handled by a backend, more specifically a client backend. When such a backend is executing a query, its state is active. It may also have started a transaction and then gone idle, which is the idle in transaction state.
The pg_stat_activity system view, described in the Postgres documentation, lists all running Postgres backends. You can query it to find out which queries are still executing:
SELECT client_addr, query_start, query
  FROM pg_stat_activity
 WHERE state IN ('active', 'idle in transaction')
   AND backend_type = 'client backend';
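A small variation of the query above, showing how long each of these queries or transactions has been running, can make the worst offenders stand out:

SELECT client_addr,
       now() - query_start AS runtime,
       query
  FROM pg_stat_activity
 WHERE state IN ('active', 'idle in transaction')
   AND backend_type = 'client backend'
 ORDER BY runtime DESC;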
By the way, it is impossible to determine the execution plan of a query that is currently being processed by a backend without using third-party extensions.
Locks
If the execution plan of a slow query does not reveal any obvious problems, the backend executing the query may have been delayed by contended locks.
Locks are acquired, either explicitly or implicitly, during query execution for a variety of reasons. The Postgres documentation devotes an entire chapter to this.
Logging Lock Waits
The lock_timeout setting, typically set on the client side, places an upper limit on how long to wait for a lock. If a query has been waiting this long to acquire a lock, Postgres cancels it and logs an error:
2021-01-30 09:35:52.415 UTC [67] psql postgres testdb 172.17.0.1 ERROR: canceling statement due to lock timeout
2021-01-30 09:35:52.415 UTC [67] psql postgres testdb 172.17.0.1 STATEMENT: cluster t;
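For example, a client might set the timeout per session or per transaction like this (a sketch; the 1-minute value is only an illustration):

SET lock_timeout = '1min';          -- for the rest of the session
-- or, scoped to a single transaction:
BEGIN;
SET LOCAL lock_timeout = '1min';
-- ...statements that may need to wait for locks...
COMMIT;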
Let’s say you want to set a lock timeout of 1 minute, but also log queries that have been waiting for locks for more than, say, 30 seconds. You can do this using:
log_lock_waits = on
deadlock_timeout = 30s
This will create logs like this:
2021-01-30 09:49:22.331 UTC [70] psql postgres testdb 172.17.0.1 LOG: process 70 still waiting for ShareLock on transaction 493 after 30009.004 ms
2021-01-30 09:49:22.331 UTC [70] psql postgres testdb 172.17.0.1 DETAIL: Process holding the lock: 68. Wait queue: 70.
2021-01-30 09:49:22.331 UTC [70] psql postgres testdb 172.17.0.1 CONTEXT: while locking tuple (0,3) in relation "t"
2021-01-30 09:49:22.331 UTC [70] psql postgres testdb 172.17.0.1 STATEMENT: select * from t for update;
Yes, deadlock_timeout really is the setting that the lock-wait logging mechanism uses; its use here is not a typo. Ideally there would be something like a log_min_duration_lock_wait setting, but unfortunately there isn’t one.
In the case of an actual deadlock, Postgres will abort the deadlocked transactions after the deadlock_timeout period and log the offending statements. No extra configuration is necessary.
2021-01-30 09:55:37.724 UTC [68] psql postgres testdb 172.17.0.1 LOG: process 68 detected deadlock while waiting for ShareLock on transaction 496 after 30007.633 ms
2021-01-30 09:55:37.724 UTC [68] psql postgres testdb 172.17.0.1 DETAIL: Process holding the lock: 70. Wait queue: .
2021-01-30 09:55:37.724 UTC [68] psql postgres testdb 172.17.0.1 CONTEXT: while locking tuple (0,3) in relation "t"
2021-01-30 09:55:37.724 UTC [68] psql postgres testdb 172.17.0.1 STATEMENT: select * from t where a=4 for update;
2021-01-30 09:55:37.725 UTC [68] psql postgres testdb 172.17.0.1 ERROR: deadlock detected
2021-01-30 09:55:37.725 UTC [68] psql postgres testdb 172.17.0.1 DETAIL: Process 68 waits for ShareLock on transaction 496; blocked by process 70. Process 70 waits for ShareLock on transaction 495; blocked by process 68. Process 68: select * from t where a=4 for update; Process 70: select * from t where a=0 for update;
2021-01-30 09:55:37.725 UTC [68] psql postgres testdb 172.17.0.1 HINT: See server log for query details.
2021-01-30 09:55:37.725 UTC [68] psql postgres testdb 172.17.0.1 CONTEXT: while locking tuple (0,3) in relation "t"
2021-01-30 09:55:37.725 UTC [68] psql postgres testdb 172.17.0.1 STATEMENT: select * from t where a=4 for update;
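For reference, here is a sketch of how the deadlock shown in that log can arise, assuming a simple table t(a int) containing rows with a = 0 and a = 4:

-- session 1:
BEGIN;
SELECT * FROM t WHERE a = 0 FOR UPDATE;

-- session 2:
BEGIN;
SELECT * FROM t WHERE a = 4 FOR UPDATE;

-- session 1 (blocks, waiting for session 2's row lock):
SELECT * FROM t WHERE a = 4 FOR UPDATE;

-- session 2 (completes the cycle; one session is aborted with
-- "ERROR: deadlock detected" after deadlock_timeout):
SELECT * FROM t WHERE a = 0 FOR UPDATE;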
How to Find Current Locks
The full list of currently held locks is available from the pg_locks system view. However, it is usually easier to use the function pg_blocking_pids together with pg_stat_activity, as in the example below:
SELECT state, pid, pg_blocking_pids(pid), query
  FROM pg_stat_activity
 WHERE backend_type='client backend';
which can show an output like this:
        state        |  pid   | pg_blocking_pids |                      query
---------------------+--------+------------------+-------------------------------------------------
 active              | 378170 | {}               | SELECT state, pid, pg_blocking_pids(pid), query+
                     |        |                  |   FROM pg_stat_activity                        +
                     |        |                  |  WHERE backend_type='client backend';
 active              | 369399 | {378068}         | cluster "Track";
 idle in transaction | 378068 | {}               | select * from "Track" for update;
(3 rows)
This shows that only one backend is currently blocked, the one running the CLUSTER command, and that it is being blocked by PID 378068, which executed a SELECT .. FOR UPDATE and is now idling inside its transaction.
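If you want to see exactly which locks the blocking backend holds, you can query pg_locks directly. A sketch, using the blocking PID 378068 from the output above:

SELECT locktype, relation::regclass AS relation, mode, granted
  FROM pg_locks
 WHERE pid = 378068;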
About Enteros
Enteros offers a patented database performance management SaaS platform. It proactively identifies root causes of complex business-impacting database scalability and performance issues across a growing number of clouds, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
Are you interested in writing for Enteros’ Blog? Please send us a pitch!