Preamble
Even after years of software development, some people still wonder what a NULL value really is. What does it actually mean, and what purpose does it serve? As a general rule, NULL essentially denotes “undefined.” Numerous books claim that NULL denotes “empty,” but I don’t believe that is the best perspective: if your wallet is empty, your financial situation is clearly defined (i.e., you are broke). “Undefined” is different: we simply don’t know the value. If we don’t know how much money you have, you might still be a millionaire. Therefore, I believe that “unknown” is a much better word than “empty” to describe NULL in SQL.
NULL values in PostgreSQL: Basic rules
First of all: NULL is a super useful thing in SQL, and people should be aware of the details associated with it. Before delving deeper, we must examine the most fundamental rules. The example below demonstrates a mistake many developers make frequently:
test=# SELECT 10 = NULL;
 ?column?
----------

(1 row)
Many people expect this query to return false, but that is incorrect: the result is NULL. How come? Suppose you have $10 in your left pocket but don’t know how much money is in your right pocket. Are you carrying the same amount in both pockets? We can’t tell. It might very well be the case, but we have no way of knowing, so the answer to this query must be NULL.
Let’s try something different:
test=# SELECT NULL = NULL;
 ?column?
----------

(1 row)
The same applies to this query: the result must be NULL. We know neither the amount of cash in your left pocket nor the amount in your right pocket. Are they identical? Again, we have no idea; the outcome is unknown.
To determine whether a value actually is NULL, use the following syntax:
test=# SELECT NULL IS NULL;
 ?column?
----------
 t
(1 row)
The result is true, because IS NULL explicitly checks whether a value is NULL rather than comparing two unknowns. Accordingly, the following query returns false:
test=# SELECT 10 IS NULL;
 ?column?
----------
 f
(1 row)
But there is more to NULL than these straightforward operations.
row() and NULL handling
Some of my readers may have already encountered the row() function, which can be used to create a tuple on the fly. The general rules outlined above still apply. Consider the following example:
test=# SELECT row(NULL, NULL) = row(NULL, NULL);
 ?column?
----------

(1 row)
As expected, the result is NULL: all values on both sides are “undefined,” and therefore there is no way the output of this query can ever be true.
What is crucial to understand is that a row can also be compared to a single NULL value. In essence, PostgreSQL checks whether the entire tuple is NULL:
test=# SELECT row(NULL, NULL) IS NULL;
 ?column?
----------
 t
(1 row)
For row(10, NULL) this is not the case; there the query returns false. True is returned only when all fields are NULL.
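A quick check confirms this:

test=# SELECT row(10, NULL) IS NULL;
 ?column?
----------
 f
(1 row)

There is one thing, though, that may come as a shock to some: the IS keyword won’t work if you try to compare the results of two row() constructors: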
test=# SELECT row(NULL, NULL) IS row(NULL, NULL);
ERROR:  syntax error at or near "row"
LINE 1: SELECT row(NULL, NULL) IS row(NULL, NULL);
PostgreSQL raises a syntax error right away.
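If you need a NULL-aware equality check between two rows, the standard IS NOT DISTINCT FROM construct can be used instead: it treats two NULL fields as “not distinct” and therefore yields true where the = operator would yield NULL:

test=# SELECT row(NULL, NULL) IS NOT DISTINCT FROM row(NULL, NULL);
 ?column?
----------
 t
(1 row)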
NULL handling in LIMIT clauses
I have seen people use NULL in LIMIT and OFFSET clauses. Even though that is a little frightening, it is still an interesting case to consider. Look at the following example:
test=# CREATE TABLE demo (id int);
CREATE TABLE
test=# INSERT INTO demo VALUES (1), (2), (3);
INSERT 0 3
The table contains only 3 rows. Here is what LIMIT NULL does:
test=# SELECT * FROM demo LIMIT NULL;
 id
----
  1
  2
  3
(3 rows)
As you can see, the entire result set is returned. That makes sense: given a NULL limit, PostgreSQL has no idea when to stop returning rows, so the query is equivalent to a plain “SELECT * FROM demo”. The “proper”, ANSI-SQL-compliant way to limit the result of a query in PostgreSQL is FETCH FIRST … ROWS ONLY.
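Here is what an ordinary limit looks like with that syntax (the output shown is what the table above implies, assuming insertion order):

test=# SELECT * FROM demo FETCH FIRST 2 ROWS ONLY;
 id
----
  1
  2
(2 rows)

Since PostgreSQL 13, FETCH FIRST … ROWS ONLY also accepts NULL values and behaves just like LIMIT NULL. Here is an illustration: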
test=# SELECT * FROM demo FETCH FIRST NULL ROWS ONLY;
 id
----
  1
  2
  3
(3 rows)
Note that this was not always the case: in earlier versions of PostgreSQL, a NULL value was not permitted here.
NULL handling in ORDER BY clauses
NULL values are particularly challenging if you want to sort data. By default, PostgreSQL treats NULL as larger than any other value, so NULL values appear at the end of a list sorted in ascending order and at the beginning of one sorted in descending order. The following listing shows an illustration:
test=# INSERT INTO demo VALUES (NULL);
INSERT 0 1
test=# SELECT * FROM demo ORDER BY id DESC;
 id
----

  3
  2
  1
(4 rows)
Why does that matter? Suppose you are ordering products by price in descending order. What do you want to see first: the most expensive products, or the ones that don’t have a price at all? Usually the latter is not what you want, so when sorting descendingly it is a good idea to push NULL values to the end of the list.
Here is how it works:
test=# SELECT * FROM demo ORDER BY id DESC NULLS LAST;
 id
----
  3
  2
  1

(4 rows)
Placing the NULL values at the end is more logical and typically provides a better user experience.
NULL and sum, count, avg, etc.
How NULL is handled is also crucial if your workload is more analytical in nature. The general rule is straightforward: aggregate functions ignore NULL values. The only exception to this rule is count(*). Here is an example:
test=# SELECT count(*), count(id) FROM demo;
 count | count
-------+-------
     4 |     3
(1 row)
count(*) returns the total number of rows, regardless of their content. count(column) will only count the non-NULL values within a column, which is not the same as counting everything. Let’s look at the following illustration:
test=# SELECT sum(id), avg(id) FROM demo;
 sum |        avg
-----+--------------------
   6 | 2.0000000000000000
(1 row)
As stated before, the aggregates ignore NULL values: the sum of those four rows is 6, and the average is 2 (6 divided by the 3 non-NULL values), not 1.5 (6 divided by all 4 rows).
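If you want the missing value to be treated as zero instead, coalesce() does the trick. A quick sketch using the same table:

test=# SELECT sum(coalesce(id, 0)), avg(coalesce(id, 0)) FROM demo;
 sum |        avg
-----+--------------------
   6 | 1.5000000000000000
(1 row)

Now all four rows take part in the calculation, and the average drops to 1.5.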
The fact that count(*) counts all rows can cause subtle bugs when it is combined with an outer join. Consider the following example:
SELECT name, count(*)
FROM person AS a
LEFT JOIN house AS b ON a.id = b.person_id
GROUP BY name;
In this scenario, every count will be at least 1, even for people who don’t own a house. Keep in mind that the LEFT JOIN adds NULL values on the right-hand side of the join for every unmatched person, and count(*) counts those rows too, so even the poorest fellow ends up with at least one house. Combinations of count(*) and outer joins should therefore always be treated as a warning sign.
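To count only the houses that actually exist, count a column from the right-hand side of the join instead of using count(*); its NULL values are then ignored. A minimal sketch against the same (hypothetical) schema:

SELECT name, count(b.person_id)
FROM person AS a
LEFT JOIN house AS b ON a.id = b.person_id
GROUP BY name;

Because count(b.person_id) skips the NULL values produced by the outer join, people without a house correctly end up with a count of 0.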