Preamble
If you are an SQL developer, you probably already know that joins are the fundamental building blocks of the language. In addition to lateral and left joins, full joins, natural joins, self joins, and inner joins are just a few of the various types of joins. But the difference between implicit and explicit joins is one of the most important ones. Flame wars have broken out over this topic over the years. However, not many people are aware of the true situation. Therefore, I hope that my post will help to clarify the situation.
Using implicit joins
It is necessary to create some tables that we can later use to perform our joins before diving into concrete examples:
test=# CREATE TABLE a (id int, aid int); CREATE TABLE test=# CREATE TABLE b (id int, bid int); CREATE TABLE
The following step involves adding some rows to those tables:
test=# INSERT INTO a VALUES (1, 1), (2, 2), (3, 3); INSERT 0 3 test=# INSERT INTO b VALUES (2, 2), (3, 3), (4, 4); INSERT 0 3
The simplest way to join data is with an implicit join. The example below illustrates an implicit join:
test=# SELECT * FROM a, b WHERE a.id = b.id; id | aid | id | bid ----+-----+----+----- 2 | 2 | 2 | 2 3 | 3 | 3 | 3 (2 rows)
The FROM clause in this instance lists all tables, and the WHERE clause connects them all. In my experience, the most typical method of joining two tables is an implicit join. However, because I frequently use implicit joins in my daily work, my observation may be significantly biased.
Using explicit joins
The explicit join is demonstrated in the example below. Some people like explicit join syntax better than implicit join syntax because it is easier to read or for some other reason:
test=# SELECT * FROM a JOIN b ON (aid = bid); id | aid | id | bid ----+-----+----+----- 2 | 2 | 2 | 2 3 | 3 | 3 | 3 (2 rows)
Tables are connected directly in this instance using an ON-clause. The only conditions we want to use to join those tables together are contained in the ON-clause.
On-clauses and USING-clauses are two categories of syntax constructs that explicit joins support. If you want to connect different columns together, an ON-clause is ideal. An using clause is distinct from: Although it has the same meaning, it is only applicable if the columns on both sides are named similarly. If not, a syntax error is displayed:
test=# SELECT * FROM a JOIN b USING (aid = bid); ERROR: syntax error at or near "=" LINE 1: SELECT * FROM a JOIN b USING (aid = bid);
Using is frequently used to connect keys, as shown in the following example:
test=# SELECT * FROM a JOIN b USING (id); id | aid | bid ----+-----+----- 2 | 2 | 2 3 | 3 | 3 (2 rows)
It is possible to use USING in this situation because both of my tables have a column called “id” in them. Remember that USING mostly just serves as syntactic sugar; there is no deeper meaning intended.
An explicit join is frequently used to join more than two tables. I’ve added one more table to demonstrate how that functions:
test=# CREATE TABLE c (id int, cid int); CREATE TABLE
Add some information to this table now:
test=# INSERT INTO c VALUES (3, 3), (4, 4), (5, 5); INSERT 0 2
Simply add more JOIN and USING clauses (respectively ON clauses) to the statement to perform an explicit join. Here’s an illustration:
test=# SELECT * FROM a INNER JOIN b USING (id) JOIN c USING (id); id | aid | bid | cid ----+-----+-----+----- 3 | 3 | 3 | 3 (1 row)
Of course, using an implicit join will accomplish the same thing.
test=# SELECT * FROM a, b, c WHERE a.id = b.id AND b.id = c.id; id | aid | id | bid | id | cid ----+-----+----+-----+----+----- 3 | 3 | 3 | 3 | 3 | 3 (1 row)
But as you can see, there is a slight variation. Verify how many columns the query returned. You’ll see that the implicit join produces more results. Because of how the implicit join handles the column list in this instance, the “id” column will appear more frequently.
The column list is obviously an annoying detail because it is better to explicitly list every column in a real application. But it’s important to remember this tiny detail.
join_collapse_limit: What the optimizer does
People frequently ask me if there is a performance difference between implicit and explicit joins when I am on the road working as a PostgreSQL consultant or PostgreSQL support guy. The response is “Typically not.” Let’s examine the following assertion:
test=# explain SELECT * FROM a INNER JOIN b USING (id); QUERY PLAN ----------------------------------------------------------------- Merge Join (cost=317.01..711.38 rows=25538 width=12) Merge Cond: (a.id = b.id) -> Sort (cost=158.51..164.16 rows=2260 width=8) Sort Key: a.id -> Seq Scan on a (cost=0.00..32.60 rows=2260 width=8) -> Sort (cost=158.51..164.16 rows=2260 width=8) Sort Key: b.id -> Seq Scan on b (cost=0.00..32.60 rows=2260 width=8) (8 rows)
The implicit plan shown below is exactly what the explicit join produces:
test=# explain SELECT * FROM a, b WHERE a.id = b.id; QUERY PLAN ----------------------------------------------------------------- Merge Join (cost=317.01..711.38 rows=25538 width=16) Merge Cond: (a.id = b.id) -> Sort (cost=158.51..164.16 rows=2260 width=8) Sort Key: a.id -> Seq Scan on a (cost=0.00..32.60 rows=2260 width=8) -> Sort (cost=158.51..164.16 rows=2260 width=8) Sort Key: b.id -> Seq Scan on b (cost=0.00..32.60 rows=2260 width=8) (8 rows)
An implicit join therefore performs the same function as an explicit join in the vast majority of situations.
However, this is no always the case. A variable called join_collapse_limit exists in PostgreSQL:
test=# SHOW join_collapse_limit; join_collapse_limit --------------------- 8 (1 row)
Why does that matter? Regardless of the join order you used within the query, the planner will always plan the first few joins automatically if you prefer explicit over implicit joins. Simply put, the optimizer will reorder joins based on which order seems most promising. However, if you continue to add joins, those that exceed join_collapse_limit will be planned according to how you have included them in the query. We are already discussing fairly complex queries, as you can probably guess. In most cases, joining nine or more tables is quite a bit and goes beyond the usual operation.
There is another parameter called from_collapse_limit that does the same thing for implicit joins and has the same default value. If a query lists more than from_collapse_limit tables in its FROM clause, the ones exceeding the limit will not be reordered, but joined in the order they appear in the statement.
Which type of join you prefer is irrelevant for the typical, “normal” query in terms of performance or execution plans.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
Are you interested in writing for Enteros’ Blog? Please send us a pitch!
RELATED POSTS
Optimizing Database Performance with Enteros and AWS Resource Groups: A RevOps Approach to Streamlined Efficiency
- 13 November 2024
- Database Performance Management
In the fast-evolving world of finance, where banking and insurance sectors rely on massive data streams for real-time decisions, efficient anomaly man…
Enhancing Healthcare Data Integrity: How Enteros, Logical Models, and Database Security Transform Healthcare Operations
In the fast-evolving world of finance, where banking and insurance sectors rely on massive data streams for real-time decisions, efficient anomaly man…
Optimizing Budgeting and Cost Allocation in the Finance Sector with Enteros: A Smarter Approach to Financial Efficiency
- 12 November 2024
- Database Performance Management
In the fast-evolving world of finance, where banking and insurance sectors rely on massive data streams for real-time decisions, efficient anomaly man…
Enteros and Cloud FinOps: Unleashing Big Data Potential for eCommerce Profitability
In the fast-evolving world of finance, where banking and insurance sectors rely on massive data streams for real-time decisions, efficient anomaly man…