Preamble
Like every high-end relational database, PostgreSQL uses a cost-based query optimizer to execute your SQL queries as efficiently and quickly as possible. Because the inner workings of the optimizer remain a mystery to many people, we decided to give users some insight into what actually happens behind the scenes.
So let’s take a tour of the PostgreSQL optimizer and learn about some of the key strategies it employs to accelerate queries. Please take note that the methods listed here are by no means exhaustive. There is a lot more going on, but it makes sense to start with the fundamentals in order to fully comprehend the procedure.
PostgreSQL constant folding
Constant folding is one of the easier techniques to describe, but it is a crucial one. Watch what happens here:
demo=# SELECT * FROM generate_series(1, 10) AS x WHERE x = 7 + 1;
 x
---
 8
(1 row)

demo=# explain SELECT * FROM generate_series(1, 10) AS x WHERE x = 7 + 1;
                              QUERY PLAN
----------------------------------------------------------------------
 Function Scan on generate_series x  (cost=0.00..0.15 rows=1 width=4)
   Filter: (x = 8)
(2 rows)
What you can see here is that the query contains the filter x = 7 + 1. The system "folds" the constant and evaluates x = 8 instead. Why is that important? If "x" is an indexed column of a table, we can look up 8 in the index directly.
Note, however, what happens if we move the calculation to the column side of the equation:
demo=# explain SELECT * FROM generate_series(1, 10) AS x WHERE x - 1 = 7;
                              QUERY PLAN
----------------------------------------------------------------------
 Function Scan on generate_series x  (cost=0.00..0.15 rows=1 width=4)
   Filter: ((x - 1) = 7)
(2 rows)
As you can see, PostgreSQL does not rewrite the expression to "x = 8" in this case. Therefore, keep the column you might want to index alone on one side of the operator and put the arithmetic on the constant side.
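As a minimal sketch, assuming a hypothetical table t with an index on its column x, the rule of thumb looks like this:

-- hypothetical table t(x int) with an index on x
-- avoid: the expression wraps the column, so the index cannot be used directly
SELECT * FROM t WHERE x - 1 = 7;

-- prefer: the constant expression is folded to x = 8 and the index can be used
SELECT * FROM t WHERE x = 7 + 1;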
PostgreSQL query optimizer: Function inlining
The concept of function inlining is another crucial technique. Function calls should be kept to a minimum wherever possible in order to speed up the query. Let's create a function that computes a logarithm:
demo=# CREATE OR REPLACE FUNCTION ld(int) RETURNS numeric AS
$$
    SELECT log(2, $1);
$$ LANGUAGE 'sql';
CREATE FUNCTION
demo=# SELECT ld(1024);
         ld
---------------------
 10.0000000000000000
(1 row)
This appears to be correct (2^10 = 1024), so let's see what happens in a real query:
demo=# explain SELECT * FROM generate_series(1, 10) AS x WHERE ld(x) = 1000;
                              QUERY PLAN
----------------------------------------------------------------------
 Function Scan on generate_series x  (cost=0.00..0.18 rows=1 width=4)
   Filter: (log('2'::numeric, (x)::numeric) = '1000'::numeric)
(2 rows)
The interesting part is in the WHERE clause of this plan: the ld function has been replaced with the underlying log function. Keep in mind that this is only possible for SQL functions. PL/pgSQL and other stored-procedure languages are opaque to the optimizer, so calls to them cannot be inlined.
Here’s an illustration:
demo=# CREATE OR REPLACE FUNCTION pl_ld(int) RETURNS numeric AS
$$
BEGIN
    RETURN log(2, $1);
END;
$$ LANGUAGE 'plpgsql';
CREATE FUNCTION
demo=# explain SELECT * FROM generate_series(1, 10) AS x WHERE pl_ld(x) = 1000;
                              QUERY PLAN
----------------------------------------------------------------------
 Function Scan on generate_series x  (cost=0.00..2.63 rows=1 width=4)
   Filter: (pl_ld(x) = '1000'::numeric)
(2 rows)
While the code is basically the same in this case, inlining is not possible due to the programming language.
Function stability: VOLATILE vs. STABLE vs. IMMUTABLE
The idea of function stability is frequently overlooked. When you create a function, it makes a difference whether it is declared VOLATILE (the default), STABLE, or IMMUTABLE. If indexes are involved, it can even make a dramatic difference. Let's generate some test data and examine these differences.
A VOLATILE function is not guaranteed to return the same result for the same input parameters within the same transaction. In other words, the PostgreSQL optimizer has to execute the function for each row, as the example below shows:
demo=# CREATE TABLE t_date AS
       SELECT * FROM generate_series('1900-01-01'::timestamptz,
                                     '2021-12-31'::timestamptz,
                                     '1 minute') AS x;
SELECT 64164961
demo=# CREATE INDEX idx_date ON t_date (x);
CREATE INDEX
We have generated one entry per minute since January 1900, which amounts to roughly 64 million rows.
Let's run the query using a VOLATILE function:
demo=# explain analyze SELECT * FROM t_date WHERE x = clock_timestamp();
                                 QUERY PLAN
-------------------------------------------------------------------
 Gather  (cost=1000.00..685947.45 rows=1 width=8) (actual time=2656.961..2658.547 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on t_date  (cost=0.00..684947.35 rows=1 width=8) (actual time=2653.009..2653.009 rows=0 loops=3)
         Filter: (x = clock_timestamp())
         Rows Removed by Filter: 21388320
 Planning Time: 0.056 ms
 Execution Time: 2658.562 ms
(8 rows)
In this case, the query burns a lot of resources and takes 2.6 seconds, which is a long time. The reason is that clock_timestamp() is VOLATILE. What happens if we try the same thing with a STABLE function?
demo=# explain analyze SELECT * FROM t_date WHERE x = now();
                              QUERY PLAN
---------------------------------------------------------------
 Index Only Scan using idx_date on t_date  (cost=0.57..4.59 rows=1 width=8) (actual time=0.014..0.015 rows=0 loops=1)
   Index Cond: (x = now())
   Heap Fetches: 0
 Planning Time: 0.060 ms
 Execution Time: 0.026 ms
(5 rows)
Because now() is STABLE, PostgreSQL can evaluate it once, treat it as a constant, and use the index, so the query is thousands of times faster. You can find more information on function stability in PostgreSQL if you're interested.
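IMMUTABLE is the strictest level: the function promises the same output for the same input, forever. Here is a minimal sketch, using a hypothetical square function, of what an IMMUTABLE declaration looks like:

-- hypothetical example: an IMMUTABLE function promises identical output for
-- identical input, so the planner may pre-evaluate calls with constant arguments
CREATE OR REPLACE FUNCTION square(int) RETURNS int AS
$$
    SELECT $1 * $1;
$$ LANGUAGE sql IMMUTABLE;

-- square(4) can be folded to 16 at planning time
SELECT * FROM generate_series(1, 100) AS x WHERE x = square(4);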
Equality constraints
The idea of equality constraints is the next optimization on our list. Here, PostgreSQL seeks to infer implicit information about the query. Let’s first generate some sample data:
demo=# CREATE TABLE t_demo AS
       SELECT x, x AS y FROM generate_series(1, 1000) AS x;
SELECT 1000
demo=# SELECT * FROM t_demo LIMIT 5;
 x | y
---+---
 1 | 1
 2 | 2
 3 | 3
 4 | 4
 5 | 5
(5 rows)
There are 1000 rows in this data set, so let’s run a quick query:
demo=# explain SELECT * FROM t_demo WHERE x = y AND y = 4;
                       QUERY PLAN
-------------------------------------------------------
 Seq Scan on t_demo  (cost=0.00..20.00 rows=1 width=8)
   Filter: ((x = 4) AND (y = 4))
(2 rows)
Once more, the execution plan holds the magic. Here, PostgreSQL has determined that x and y just so happen to be four. That opens the door for important optimizations:
demo=# CREATE INDEX idx_x ON t_demo (x);
CREATE INDEX
demo=# explain SELECT * FROM t_demo WHERE x = y AND y = 4;
                             QUERY PLAN
--------------------------------------------------------------------
 Index Scan using idx_x on t_demo  (cost=0.28..8.29 rows=1 width=8)
   Index Cond: (x = 4)
   Filter: (y = 4)
(3 rows)
The index we just created would be completely useless without this optimization. PostgreSQL will figure out automatically that we can use an index here to optimize the query.
Indexes are important!
View inlining and subselect flattening
There is no way to ignore views and subselect handling when discussing the PostgreSQL optimizer and query optimization. Let’s build a basic view and run a query on it.
demo=# CREATE VIEW v1 AS SELECT * FROM generate_series(1, 10) AS x;
CREATE VIEW
demo=# explain SELECT * FROM v1 ORDER BY x DESC;
                  QUERY PLAN
----------------------------------------------
 Sort  (cost=0.27..0.29 rows=10 width=4)
   Sort Key: x.x DESC
   ->  Function Scan on generate_series x  (cost=0.00..0.10 rows=10 width=4)
(3 rows)
As we can see, the view does not show up in the execution plan at all. The reason is that PostgreSQL inlined the view as a subselect and flattened it out. How does that work?
SELECT * FROM v1 ORDER BY x DESC;
This query is first expanded to…
SELECT * FROM (SELECT * FROM generate_series(1, 10) AS x ) AS v1 ORDER BY x DESC;
The subselect is then flattened out, leaving us with…
SELECT * FROM generate_series(1, 10) AS x ORDER BY x DESC;
The PostgreSQL query optimizer has a setting that regulates this behavior:
demo=# SHOW from_collapse_limit;
 from_collapse_limit
---------------------
 8
(1 row)
This parameter tells the planner to merge subselects into the upper query only as long as the resulting FROM list has no more than 8 entries; beyond that limit, subselects are planned separately. In real-world use cases this is usually not a problem; it only becomes relevant for extremely complex SQL statements. Visit our blog for more details on joins and join_collapse_limit.
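A quick sketch, using hypothetical tables t1, t2, and t3, of how lowering the parameter changes planning:

-- hypothetical tables t1, t2, t3: lowering the limit stops the planner
-- from merging the subselect into the outer FROM list
SET from_collapse_limit = 1;
EXPLAIN SELECT *
FROM t1, (SELECT * FROM t2, t3 WHERE t2.id = t3.id) AS sub
WHERE t1.id = sub.id;
RESET from_collapse_limit;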
Remember that inlining isn't always possible. Views containing aggregates, LIMIT clauses, or set operations, for instance, cannot simply be flattened out.
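Here is a small sketch of a view that cannot simply be flattened, because the aggregate has to be computed before the outer filter can be applied:

-- the view's aggregate forces a subquery in the plan instead of full flattening
CREATE VIEW v2 AS
    SELECT count(*) AS cnt FROM generate_series(1, 10) AS x;

EXPLAIN SELECT * FROM v2 WHERE cnt > 5;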
Optimizing joins in PostgreSQL
Joins are used in almost all queries, which makes them especially important to performance. We'll now concentrate on some techniques that apply to joins in general.
Optimizing join orders
The PostgreSQL optimizer's handling of join orders is the next critical item on our list. Joins in a PostgreSQL database aren't necessarily performed in the order the end user specifies; instead, the query optimizer evaluates as many join orderings as it can.
Let's create some test data to see how this works.
demo=# CREATE TABLE a AS
       SELECT x, x % 10 AS y FROM generate_series(1, 100000) AS x ORDER BY random();
SELECT 100000
demo=# CREATE TABLE b AS
       SELECT x, x % 10 AS y FROM generate_series(1, 1000000) AS x ORDER BY random();
SELECT 1000000
demo=# CREATE TABLE c AS
       SELECT x, x % 10 AS y FROM generate_series(1, 10000000) AS x ORDER BY random();
SELECT 10000000
A few indexes are created in the subsequent steps:
demo=# CREATE INDEX a_x ON a(x);
CREATE INDEX
demo=# CREATE INDEX b_x ON b(x);
CREATE INDEX
demo=# CREATE INDEX c_x ON c(x);
CREATE INDEX
demo=# ANALYZE;
ANALYZE
Now that we have three tables, let’s query them to see what happens.
demo=# explain SELECT *
       FROM a, b, c
       WHERE c.x = a.x AND a.x = b.x AND a.x = 10;
                               QUERY PLAN
--------------------------------------------------------------------
 Nested Loop  (cost=1.15..25.23 rows=1 width=24)
   ->  Nested Loop  (cost=0.72..16.76 rows=1 width=16)
         ->  Index Scan using a_x on a  (cost=0.29..8.31 rows=1 width=8)
               Index Cond: (x = 10)
         ->  Index Scan using b_x on b  (cost=0.42..8.44 rows=1 width=8)
               Index Cond: (x = 10)
   ->  Index Scan using c_x on c  (cost=0.43..8.45 rows=1 width=8)
         Index Cond: (x = 10)
(8 rows)
Note that the query lists "c and a" first and then "a and b". But let's take a closer look at the plan: PostgreSQL starts with index scans on a and b, and the result is then joined with c. What's also important here is that three indexes are used. This happens because of the equality constraints we discussed before.
PostgreSQL implicit vs. explicit joins
Whether to use explicit or implicit joins is frequently discussed. Both variants are essentially the same. Let's compare two queries to understand what PostgreSQL actually does:
SELECT * FROM a, b WHERE a.id = b.id;

vs.

SELECT * FROM a JOIN b ON a.id = b.id;
Because the two queries are equivalent, the planner treats them identically. Keep in mind that you can write explicit joins with or without parentheses. However, one setting is crucial here:
demo=# SHOW join_collapse_limit;
 join_collapse_limit
---------------------
 8
(1 row)
The join_collapse_limit parameter controls how many explicit joins are re-planned implicitly. In other words, up to the number of joins covered by this parameter, an explicit join is handled exactly like an implicit one.
To keep things simple: for the majority of users and for 95% of all queries, it simply doesn't matter.
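As a sketch using the tables a, b, and c from above: lowering join_collapse_limit forces the planner to keep the textual join order, which can be handy when experimenting:

-- with the limit at 1, explicit joins are executed in the order written
SET join_collapse_limit = 1;
EXPLAIN SELECT * FROM a JOIN b ON a.x = b.x JOIN c ON b.x = c.x;
RESET join_collapse_limit;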
Determining the join strategy
PostgreSQL offers a variety of join strategies, including nested loops, merge joins, hash joins, and more. Each strategy has its own strengths and weaknesses, and the planner picks one per join based on cost estimates. Some of this ground has already been covered in earlier posts; you can read more about PostgreSQL join strategies here.
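For experimentation, you can nudge the planner away from a strategy with the enable_* settings. This is a debugging aid, not something to rely on in production; a minimal sketch:

-- discourage hash joins to see which strategy the planner falls back to
SET enable_hashjoin = off;
EXPLAIN SELECT * FROM a JOIN b ON a.x = b.x;
RESET enable_hashjoin;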
Optimizing outer joins (LEFT JOIN, etc.)
Optimizing outer joins (LEFT JOIN, RIGHT JOIN, etc.) is an important topic. Usually, the planner has fewer options here than in the case of inner joins. The following transformations are possible:
(A leftjoin B on (Pab)) innerjoin C on (Pac)
    = (A innerjoin C on (Pac)) leftjoin B on (Pab)
where Pac is a predicate referencing A and C, etc. (in this case, Pac clearly cannot reference B, or the transformation is nonsensical).
(A leftjoin B on (Pab)) leftjoin C on (Pac)
    = (A leftjoin C on (Pac)) leftjoin B on (Pab)

(A leftjoin B on (Pab)) leftjoin C on (Pbc)
    = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
While this theoretical explanation is correct, most people will have no clue what it means in real life. Therefore, I have compiled a real-world example showing how PostgreSQL actually reorders a real join:
demo=# explain SELECT *
       FROM generate_series(1, 10) AS x
       LEFT JOIN generate_series(1, 100) AS y ON (x = y)
       JOIN generate_series(1, 10000) AS z ON (y = z);
                               QUERY PLAN
--------------------------------------------------------------------
 Hash Join  (cost=1.83..144.33 rows=500 width=12)
   Hash Cond: (z.z = x.x)
   ->  Function Scan on generate_series z  (cost=0.00..100.00 rows=10000 width=4)
   ->  Hash  (cost=1.71..1.71 rows=10 width=8)
         ->  Hash Join  (cost=0.23..1.71 rows=10 width=8)
               Hash Cond: (y.y = x.x)
               ->  Function Scan on generate_series y  (cost=0.00..1.00 rows=100 width=4)
               ->  Hash  (cost=0.10..0.10 rows=10 width=4)
                     ->  Function Scan on generate_series x  (cost=0.00..0.10 rows=10 width=4)
(9 rows)
Here we can see that the PostgreSQL optimizer joins x with y before joining in z. In other words, the optimizer simply followed the join order given in the SQL statement.
But what happens if we decide to tweak the parameters a bit?
demo=# explain SELECT *
       FROM generate_series(1, 10) AS x
       LEFT JOIN generate_series(1, 100000) AS y ON (x = y)
       JOIN generate_series(1, 100) AS z ON (y = z);
                               QUERY PLAN
--------------------------------------------------------------------
 Hash Join  (cost=1.83..1426.83 rows=5000 width=12)
   Hash Cond: (y.y = x.x)
   ->  Function Scan on generate_series y  (cost=0.00..1000.00 rows=100000 width=4)
   ->  Hash  (cost=1.71..1.71 rows=10 width=8)
         ->  Hash Join  (cost=0.23..1.71 rows=10 width=8)
               Hash Cond: (z.z = x.x)
               ->  Function Scan on generate_series z  (cost=0.00..1.00 rows=100 width=4)
               ->  Hash  (cost=0.10..0.10 rows=10 width=4)
                     ->  Function Scan on generate_series x  (cost=0.00..0.10 rows=10 width=4)
(9 rows)
This is the same query, but with slightly altered parameters. The difference is that PostgreSQL again starts with x, but then joins z first, before adding y.
Note that this optimization happens automatically. One reason the optimizer can make this choice is the existence of optimizer support functions, a relatively recent addition to PostgreSQL. Support functions give the planner a chance to estimate how many rows each part returns, which is what makes the reordering work. If you use tables instead of set-returning functions, support functions are irrelevant.
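As a related, simpler knob (a sketch, not the support-function API itself), a set-returning function can carry a static row estimate via the ROWS clause:

-- hypothetical function: ROWS tells the planner to expect ~100 rows per call
CREATE OR REPLACE FUNCTION first_n(int) RETURNS SETOF int AS
$$
    SELECT generate_series(1, $1);
$$ LANGUAGE sql ROWS 100;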
Automatic join pruning in PostgreSQL
Not every join listed in a query is actually executed by PostgreSQL. The optimizer knows the concept of join pruning and can eliminate pointless joins quite effectively. The main questions are: under what circumstances is that possible, and how can we tell what's happening?
The following listing illustrates how some appropriate sample data can be generated:
demo=# CREATE TABLE t_left AS SELECT * FROM generate_series(1, 1000) AS id;
SELECT 1000
demo=# CREATE UNIQUE INDEX idx_left ON t_left (id);
CREATE INDEX
demo=# CREATE TABLE t_right AS SELECT * FROM generate_series(1, 100) AS id;
SELECT 100
demo=# CREATE UNIQUE INDEX idx_right ON t_right (id);
CREATE INDEX
What matters in this situation is that both sides actually have primary keys or some other kind of unique constraint:
demo=# \d t_left
              Table "public.t_left"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |
Indexes:
    "idx_left" UNIQUE, btree (id)

demo=# \d t_right
              Table "public.t_right"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |
Indexes:
    "idx_right" UNIQUE, btree (id)
We’ll look at two different SQL statements to demonstrate how the PostgreSQL query planner handles join pruning:
demo=# explain SELECT * FROM t_left AS a LEFT JOIN t_right AS b ON (a.id = b.id);
                            QUERY PLAN
--------------------------------------------------------------------
 Hash Left Join  (cost=3.25..20.89 rows=1000 width=8)
   Hash Cond: (a.id = b.id)
   ->  Seq Scan on t_left a  (cost=0.00..15.00 rows=1000 width=4)
   ->  Hash  (cost=2.00..2.00 rows=100 width=4)
         ->  Seq Scan on t_right b  (cost=0.00..2.00 rows=100 width=4)
(5 rows)
In this situation, the join has to be performed; as you can see, PostgreSQL chose a hash join. The following example contains only a minor modification:
demo=# explain SELECT a.* FROM t_left AS a LEFT JOIN t_right AS b ON (a.id = b.id);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on t_left a  (cost=0.00..15.00 rows=1000 width=4)
(1 row)
If the right side is unique and we do not read data from it, join pruning may occur. Pruning is only possible if the right side is unique, because if the right side is not unique, the join may actually increase the number of rows returned.
Let’s try it out.
demo=# DROP INDEX idx_right;
DROP INDEX
demo=# explain SELECT a.* FROM t_left AS a LEFT JOIN t_right AS b ON (a.id = b.id);
                            QUERY PLAN
--------------------------------------------------------------------
 Hash Left Join  (cost=3.25..23.00 rows=1000 width=4)
   Hash Cond: (a.id = b.id)
   ->  Seq Scan on t_left a  (cost=0.00..15.00 rows=1000 width=4)
   ->  Hash  (cost=2.00..2.00 rows=100 width=4)
         ->  Seq Scan on t_right b  (cost=0.00..2.00 rows=100 width=4)
(5 rows)
While it is unquestionably good that the PostgreSQL optimizer performs join pruning, be aware that the planner is essentially fixing something that shouldn't exist in the first place. Don't add needless joins; write efficient queries to begin with.
EXISTS and anti-joins
Another very common pattern you will see in SQL code is EXISTS / NOT EXISTS. Here's an example:
demo=# explain SELECT * FROM a
       WHERE NOT EXISTS (SELECT 1 FROM b WHERE a.x = b.x AND b.x = 42);
                            QUERY PLAN
-------------------------------------------------------------------
 Hash Anti Join  (cost=4.46..2709.95 rows=100000 width=8)
   Hash Cond: (a.x = b.x)
   ->  Seq Scan on a  (cost=0.00..1443.00 rows=100000 width=8)
   ->  Hash  (cost=4.44..4.44 rows=1 width=4)
         ->  Index Only Scan using b_x on b  (cost=0.42..4.44 rows=1 width=4)
               Index Cond: (x = 42)
(6 rows)
Don't dismiss this as a minor detail: PostgreSQL creates a "hash anti-join" here, which is far more efficient than a nested loop over a subquery. In other words, the construct is executed as a proper join, which can lead to significant performance improvements.
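By contrast (a sketch worth trying yourself), NOT IN usually cannot be turned into an anti-join, because its NULL semantics differ: a single NULL produced by the subquery makes the whole test unknown rather than true.

-- NOT IN typically plans as a (hashed) subplan rather than a hash anti-join,
-- unless the planner can prove the subquery column is NOT NULL
EXPLAIN SELECT * FROM a WHERE a.x NOT IN (SELECT b.x FROM b);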
Making use of sorted output
Sorting is an important part of every database because it is needed to handle many different types of queries and optimize different workloads. One of the most important improvements in this area is how PostgreSQL can cleverly use indexes to optimize ORDER BY.
Let’s take a look at it:
demo=# CREATE TABLE t_sample AS
       SELECT * FROM generate_series(1, 1000000) AS id ORDER BY random();
SELECT 1000000
demo=# VACUUM ANALYZE t_sample;
VACUUM
The listing produced one million entries, stored on disk in random order. VACUUM was then called to make sure all hint bits are set before we run the test. Consider reading our post about hint bits in PostgreSQL if you want to learn what they are and how they work.
demo=# explain analyze SELECT * FROM t_sample ORDER BY id DESC LIMIT 100;
                              QUERY PLAN
-------------------------------------------------------------------
 Limit  (cost=25516.40..25528.07 rows=100 width=4) (actual time=84.806..86.252 rows=100 loops=1)
   ->  Gather Merge  (cost=25516.40..122745.49 rows=833334 width=4) (actual time=84.805..86.232 rows=100 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=24516.38..25558.05 rows=416667 width=4) (actual time=65.576..65.586 rows=82 loops=3)
               Sort Key: id DESC
               Sort Method: top-N heapsort  Memory: 33kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 32kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 33kB
               ->  Parallel Seq Scan on t_sample  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.428..33.305 rows=333333 loops=3)
 Planning Time: 0.078 ms
 Execution Time: 86.286 ms
(12 rows)
86 milliseconds is a long time, and PostgreSQL even needs more than one core to achieve it. But what happens if we create an index?
demo=# CREATE INDEX idx_sample ON t_sample (id);
CREATE INDEX
demo=# explain analyze SELECT * FROM t_sample ORDER BY id DESC LIMIT 100;
                              QUERY PLAN
-------------------------------------------------------------------
 Limit  (cost=0.42..3.02 rows=100 width=4) (actual time=0.071..0.125 rows=100 loops=1)
   ->  Index Only Scan Backward using idx_sample on t_sample  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.070..0.113 rows=100 loops=1)
         Heap Fetches: 0
 Planning Time: 0.183 ms
 Execution Time: 0.142 ms
(5 rows)
The most obvious thing to notice is that the query now runs in a fraction of a millisecond. What matters most is that there is no sort step at all. PostgreSQL knows that the index returns the data sorted by id, so there is no need to sort again: it consults the index and streams rows to the client until enough rows have been found. Because the query only needs columns that exist in the index, an index-only scan is even possible.
Minimizing the amount of data that has to be sorted repeatedly matters for both performance and user experience. Speaking of sorted data, here is a case involving min and max that many people find interesting:
demo=# explain SELECT min(id), max(id) FROM t_sample;
                              QUERY PLAN
-------------------------------------------------------------------
 Result  (cost=0.91..0.92 rows=1 width=8)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.45 rows=1 width=4)
           ->  Index Only Scan using idx_sample on t_sample  (cost=0.42..28480.42 rows=1000000 width=4)
                 Index Cond: (id IS NOT NULL)
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.42..0.45 rows=1 width=4)
           ->  Index Only Scan Backward using idx_sample on t_sample t_sample_1  (cost=0.42..28480.42 rows=1000000 width=4)
                 Index Cond: (id IS NOT NULL)
(9 rows)
The minimum is the first non-NULL value in sorted order; the maximum is the last. PostgreSQL exploits this and replaces the conventional aggregate processing with a subplan that merely consults the index. Calling min is roughly equivalent to…
demo=# SELECT id FROM t_sample WHERE id IS NOT NULL ORDER BY id LIMIT 1;
 id
----
  1
(1 row)

demo=# explain SELECT id FROM t_sample WHERE id IS NOT NULL ORDER BY id LIMIT 1;
                              QUERY PLAN
-------------------------------------------------------------------
 Limit  (cost=0.42..0.45 rows=1 width=4)
   ->  Index Only Scan using idx_sample on t_sample  (cost=0.42..28480.42 rows=1000000 width=4)
         Index Cond: (id IS NOT NULL)
(3 rows)
Fortunately, PostgreSQL takes care of this for you and perfectly optimizes the query.
Partition pruning and constraint exclusion at work
Partitioning is a favorite feature of many PostgreSQL users. It lets you make tables smaller and divide data into more manageable (though not necessarily faster to query) chunks. Keep in mind, however, that while SOME queries may get faster, the increased planning time can backfire; this is not a rare case. We have seen partitioning decrease speed countless times when it wasn't suitable.
Logically, the PostgreSQL optimizer should only touch partitions that can actually contain some of the data. Here's an illustration:
demo=# CREATE TABLE mynumbers (id int) PARTITION BY RANGE (id);
CREATE TABLE
demo=# CREATE TABLE negatives PARTITION OF mynumbers FOR VALUES FROM (MINVALUE) TO (0);
CREATE TABLE
demo=# CREATE TABLE positives PARTITION OF mynumbers FOR VALUES FROM (1) TO (MAXVALUE);
CREATE TABLE
This creates a table with two partitions. Let's see what the planner does when we query only positive values:
demo=# explain SELECT * FROM mynumbers WHERE id > 1000;
                          QUERY PLAN
------------------------------------------------------------------
 Seq Scan on positives mynumbers  (cost=0.00..41.88 rows=850 width=4)
   Filter: (id > 1000)
(2 rows)
The PostgreSQL optimizer correctly determined that the negatives partition cannot contain any matching rows and removed it from the execution plan. If we query all values below 1000 instead, both partitions are correctly queried, as the following execution plan shows:
demo=# explain SELECT * FROM mynumbers WHERE id < 1000;
                         QUERY PLAN
--------------------------------------------------------
 Append  (cost=0.00..92.25 rows=1700 width=4)
   ->  Seq Scan on negatives mynumbers_1  (cost=0.00..41.88 rows=850 width=4)
         Filter: (id < 1000)
   ->  Seq Scan on positives mynumbers_2  (cost=0.00..41.88 rows=850 width=4)
         Filter: (id < 1000)
(5 rows)
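Pruning is governed by a setting that has been on by default since PostgreSQL 11. As a quick experiment (a sketch), you can switch it off and watch both partitions get scanned even for the first query:

-- with pruning disabled, the id > 1000 query scans both partitions again
SET enable_partition_pruning = off;
EXPLAIN SELECT * FROM mynumbers WHERE id > 1000;
RESET enable_partition_pruning;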
Conclusion
What PostgreSQL can do for you goes far beyond the optimizations shown on this page, but they are a good starting point for understanding what happens behind the scenes. Database optimizers often operate in mysterious ways, and few people are aware of the transformations that actually take place. This page aims to shed some light on them.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.