Preamble
Is there any sizable, important database that never runs into performance issues? Probably not many. So every DBA (database administrator) in charge of PostgreSQL should be able to spot potential performance problems and figure out why they are happening.
Improving PostgreSQL performance beyond parameter tuning
People often think that changing parameters in postgresql.conf is the key to better performance. This isn’t always the case, though. Good configuration parameters certainly help, but the real problems are frequently caused by a strange query buried deep in some application logic. It is even possible that the queries causing the real trouble are not the ones you are focusing on. The logical question is: how do we find those queries and determine what is actually going on? The tool I prefer for that is pg_stat_statements, which, in my opinion, should always be enabled if you are using PostgreSQL 9.2 or higher (do not use it in earlier versions).
Enabling pg_stat_statements
To turn on pg_stat_statements on your server, you must change the following line in postgresql.conf and then restart PostgreSQL:
shared_preload_libraries = 'pg_stat_statements'
Once this module is loaded into the server, PostgreSQL will start gathering the information on its own. The good thing is that the module's overhead is extremely low; it is essentially just noise.
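Besides loading the module, pg_stat_statements offers a couple of optional settings in postgresql.conf. The values below are purely illustrative; the defaults are fine for most systems:

pg_stat_statements.max = 10000    # number of distinct statements kept (default: 5000)
pg_stat_statements.track = all    # also track statements executed inside functions (default: top)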
Then, in your database, run the following command to create the view needed to access the data:
CREATE EXTENSION pg_stat_statements;
The extension will deploy a view called pg_stat_statements and make the collected data accessible.
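To verify that everything is in place, a quick sanity check such as the following can be run (the result will of course depend on your system):

test=# SELECT count(*) FROM pg_stat_statements;

If something is missing, PostgreSQL will complain immediately, either because the view does not exist or because the module has not been loaded via shared_preload_libraries.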
Detecting slow queries in PostgreSQL
Sorting the entries in pg_stat_statements by total_time makes it easy to find the most interesting queries:
SELECT * FROM pg_stat_statements ORDER BY total_time DESC;
The beauty here is that the type of query consuming the most time will naturally appear at the top of the listing. The best approach is to work down from the top query to, say, the tenth one and see what is going on there.
In my opinion, it is impossible to tune a system without taking a close look at the database server’s most time-consuming queries.
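Once a suspicious statement shows up near the top of the list, the usual next step is to run it manually with EXPLAIN and inspect its execution plan. A minimal sketch, in which the table name and the filter are made up purely for illustration:

test=# EXPLAIN (ANALYZE, BUFFERS)
       SELECT * FROM t_sample WHERE some_column = 42;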
Taking a deep look at PostgreSQL performance
Beyond just the query and how much time it consumed, pg_stat_statements has a lot more information to offer. The view’s structure is as follows:
test=# \d pg_stat_statements
              View "public.pg_stat_statements"
       Column        |       Type       | Collation | Nullable | Default
---------------------+------------------+-----------+----------+---------
 userid              | oid              |           |          |
 dbid                | oid              |           |          |
 queryid             | bigint           |           |          |
 query               | text             |           |          |
 calls               | bigint           |           |          |
 total_time          | double precision |           |          |
 min_time            | double precision |           |          |
 max_time            | double precision |           |          |
 mean_time           | double precision |           |          |
 stddev_time         | double precision |           |          |
 rows                | bigint           |           |          |
 shared_blks_hit     | bigint           |           |          |
 shared_blks_read    | bigint           |           |          |
 shared_blks_dirtied | bigint           |           |          |
 shared_blks_written | bigint           |           |          |
 local_blks_hit      | bigint           |           |          |
 local_blks_read     | bigint           |           |          |
 local_blks_dirtied  | bigint           |           |          |
 local_blks_written  | bigint           |           |          |
 temp_blks_read      | bigint           |           |          |
 temp_blks_written   | bigint           |           |          |
 blk_read_time       | double precision |           |          |
 blk_write_time      | double precision |           |          |
Examining the stddev_time column can also be highly revealing. It tells you whether queries of a given type tend to have stable runtimes. If the standard deviation is large, you can expect some of these queries to run quickly and others to take much longer, which may hurt the user experience.
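To spot query types with highly variable runtimes, you can sort by stddev_time directly. A simple sketch:

test=# SELECT substring(query, 1, 50) AS short_query,
              calls,
              round(mean_time::numeric, 2) AS mean,
              round(stddev_time::numeric, 2) AS stddev
       FROM pg_stat_statements
       ORDER BY stddev_time DESC
       LIMIT 10;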
The "rows" column can be very useful as well. If 1,000 calls have returned 1,000,000,000 rows in total, then each call has returned 1,000,000 rows on average. It is easy to see that constantly returning that much data is not a smart idea.
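The average number of rows per call can be computed on the fly, for example like this (a sketch; the integer division is good enough for a rough overview):

test=# SELECT substring(query, 1, 50) AS short_query,
              calls,
              rows,
              rows / calls AS rows_per_call
       FROM pg_stat_statements
       ORDER BY rows_per_call DESC
       LIMIT 10;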
The shared_* columns are useful if you want to see whether a certain type of query shows poor caching behavior. In other words, with pg_stat_statements enabled, PostgreSQL can give you the cache hit rate of every type of query.
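A per-query cache hit rate can be derived from shared_blks_hit and shared_blks_read. A minimal sketch (nullif protects against division by zero for queries that touched no shared blocks):

test=# SELECT substring(query, 1, 50) AS short_query,
              shared_blks_hit,
              shared_blks_read,
              round(100.0 * shared_blks_hit /
                    nullif(shared_blks_hit + shared_blks_read, 0), 2) AS hit_ratio
       FROM pg_stat_statements
       ORDER BY shared_blks_read DESC
       LIMIT 10;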
Consider looking at the temp_blks_* fields as well. PostgreSQL uses temporary blocks whenever it has to go to disk to sort or to materialize data.
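Sorting by temp_blks_written quickly reveals the statements that spill to disk the most, for example:

test=# SELECT substring(query, 1, 50) AS short_query,
              temp_blks_read,
              temp_blks_written
       FROM pg_stat_statements
       ORDER BY temp_blks_written DESC
       LIMIT 10;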
blk_read_time and blk_write_time are the last two columns. In most cases these fields stay empty unless track_io_timing is enabled. The idea is to be able to determine how much time a specific kind of query spends on I/O, which lets you tell whether your system is CPU-bound or I/O-bound. In general, it makes sense to enable I/O timing because it provides important information.
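track_io_timing can be switched on with a plain configuration reload; on PostgreSQL 9.4 or newer, ALTER SYSTEM is the most convenient way to do so. Once it is enabled, the share of time a query spends on I/O is easy to compute. A sketch:

test=# ALTER SYSTEM SET track_io_timing = on;
test=# SELECT pg_reload_conf();

test=# SELECT substring(query, 1, 50) AS short_query,
              round(blk_read_time::numeric, 2) AS read_ms,
              round(blk_write_time::numeric, 2) AS write_ms,
              round((100 * (blk_read_time + blk_write_time) /
                    nullif(total_time, 0))::numeric, 2) AS io_percent
       FROM pg_stat_statements
       ORDER BY blk_read_time + blk_write_time DESC
       LIMIT 10;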
Dealing with Java and Hibernate
pg_stat_statements delivers good information. In some cases, however, the query text can be cut off because of a configuration variable:
test=# SHOW track_activity_query_size;
 track_activity_query_size
---------------------------
 1024
(1 row)
For most applications, 1024 bytes is absolutely enough. However, this is usually not the case if you are running Hibernate or Java. Hibernate tends to send insanely long queries to the database, so the SQL code might be cut off long before the relevant parts (the FROM clause, for instance) even start. It therefore makes a lot of sense to raise track_activity_query_size to a higher value (perhaps 32768, i.e. 32 kB), as shown below.
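track_activity_query_size can only be set in postgresql.conf (or via ALTER SYSTEM) and requires a restart to take effect, for example:

track_activity_query_size = 32768    # 32 kB, enough even for very long Hibernate queries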
Helpful queries to detect bottlenecks in PostgreSQL
One query stands out to me as particularly helpful in this context; it displays the 20 most time-consuming statements:
test=# SELECT substring(query, 1, 50) AS short_query,
              round(total_time::numeric, 2) AS total_time,
              calls,
              round(mean_time::numeric, 2) AS mean,
              round((100 * total_time / sum(total_time::numeric) OVER ())::numeric, 2) AS percentage_cpu
       FROM pg_stat_statements
       ORDER BY total_time DESC
       LIMIT 20;

                     short_query                     | total_time | calls | mean | percentage_cpu
-----------------------------------------------------+------------+-------+------+----------------
 SELECT name FROM (SELECT pg_catalog.lower(name) A   |      11.85 |     7 | 1.69 |          38.63
 DROP SCHEMA IF EXISTS performance_check CASCADE;    |       4.49 |     4 | 1.12 |          14.64
 CREATE OR REPLACE FUNCTION performance_check.pg_st  |       2.23 |     4 | 0.56 |           7.27
 SELECT pg_catalog.quote_ident(c.relname) FROM pg_c  |       1.78 |     2 | 0.89 |           5.81
 SELECT a.attname,                                  +|       1.28 |     1 | 1.28 |           4.18
 SELECT substring(query, ?, ?) AS short_query,roun   |       1.18 |     3 | 0.39 |           3.86
 CREATE OR REPLACE FUNCTION performance_check.pg_st  |       1.17 |     4 | 0.29 |           3.81
 SELECT query FROM pg_stat_activity LIMIT ?;         |       1.17 |     2 | 0.59 |           3.82
 CREATE SCHEMA performance_check;                    |       1.01 |     4 | 0.25 |           3.30
 SELECT pg_catalog.quote_ident(c.relname) FROM pg_c  |       0.92 |     2 | 0.46 |           3.00
 SELECT query FROM performance_check.pg_stat_activi  |       0.74 |     1 | 0.74 |           2.43
 SELECT * FROM pg_stat_statements ORDER BY total_ti  |       0.56 |     1 | 0.56 |           1.82
 SELECT query FROM pg_stat_statements LIMIT ?;       |       0.45 |     4 | 0.11 |           1.45
 GRANT EXECUTE ON FUNCTION performance_check.pg_sta  |       0.35 |     4 | 0.09 |           1.13
 SELECT query FROM performance_check.pg_stat_statem  |       0.30 |     1 | 0.30 |           0.96
 SELECT query FROM performance_check.pg_stat_activi  |       0.22 |     1 | 0.22 |           0.72
 GRANT ALL ON SCHEMA performance_check TO schoenig_  |       0.20 |     3 | 0.07 |           0.66
 SELECT query FROM performance_check.pg_stat_statem  |       0.20 |     1 | 0.20 |           0.67
 GRANT EXECUTE ON FUNCTION performance_check.pg_sta  |       0.19 |     4 | 0.05 |           0.62
 SELECT query FROM performance_check.pg_stat_statem  |       0.17 |     1 | 0.17 |           0.56
(20 rows)
The last column is particularly important because it shows the percentage of total runtime consumed by each individual query. It helps you determine whether a given statement is relevant to overall performance or not.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.