Preamble
Recently, I helped a customer resolve a “mysterious” performance problem: a poorly chosen query parameter data type led to bad query performance. Working on this case made me realize that not everyone understands what the data type of a query parameter means, so I decided to write this article.
What are query parameters?
In SQL, it’s common to see queries that differ only in constant values:
SELECT val FROM large WHERE id = 42;
SELECT val FROM large WHERE id = 4711;
SELECT val FROM large WHERE id = 1001;
It makes sense to consider them as different instances of the same query:
SELECT val FROM large WHERE id = $1;
Here, $1 is a placeholder for an arbitrary value. Such a placeholder is known as a query parameter, also called a bind variable. When you execute the query, you have to supply an actual parameter value as an argument for the parameter.
A placeholder offers the following advantages:
- The database can parse and optimize the statement once and then reuse the execution plan with different parameter values, which speeds up processing (read this article to learn how PostgreSQL does that).
- Separating the query from its parameters prevents SQL injection, a serious threat to database applications (see the sketch after this list).
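To make the second point concrete, here is a minimal JDBC sketch (hypothetical; it assumes an open java.sql.Connection named conn and a user-supplied string userInput). Concatenating input into the SQL string lets an attacker change the statement, while a query parameter is always treated as a plain value:
// DANGEROUS: if userInput is "1; DROP TABLE large", the attacker wins.
java.sql.Statement unsafe = conn.createStatement();
unsafe.execute("SELECT val FROM large WHERE id = " + userInput);

// SAFE: the parameter is sent separately and can never be parsed as SQL.
java.sql.PreparedStatement safe =
    conn.prepareStatement("SELECT val FROM large WHERE id = ?");
safe.setLong(1, Long.parseLong(userInput));
java.sql.ResultSet rs = safe.executeQuery();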
Support for query parameters in PostgreSQL
There are two methods for sending an SQL statement using the PostgreSQL frontend/backend protocol:
- The whole SQL statement is sent as a single string using the simple query protocol.
- The SQL statement and its parameters are sent separately with the extended query protocol:
  - the “Prepare” message sends the statement with the placeholders $1, $2, etc.
  - the “Bind” message sends the parameter values
  - the “Execute” message executes the query
Keep in mind that query parameters can only stand for constants. You cannot use parameters for identifiers such as table, column, or schema names. Also, parameters are only supported in SELECT, INSERT, UPDATE and DELETE statements. The sketch below illustrates the first restriction.
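As a hedged illustration (again assuming an open connection conn), the following JDBC snippet tries to pass a table name as a parameter. The driver turns the ? into $1 and sends it via the extended protocol, so the server sees a placeholder where an identifier is required and reports a syntax error:
// Identifiers cannot be parameters: the server receives
// "SELECT val FROM $1 WHERE id = 42" and rejects it.
try (java.sql.PreparedStatement stmt =
         conn.prepareStatement("SELECT val FROM ? WHERE id = 42")) {
    stmt.setString(1, "large");
    stmt.executeQuery();   // fails with a syntax error at or near "$1"
} catch (java.sql.SQLException e) {
    System.err.println(e.getMessage());
}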
Prepared statements
You can prepare a statement under a name, so that it can later be executed with different parameter values. Virtually every database client API supports prepared statements.
With Java and JDBC, this works as follows:
java.sql.PreparedStatement stmt =
    conn.prepareStatement("SELECT * FROM large WHERE id = ?");
stmt.setLong(1, 42L);
java.sql.ResultSet rs = stmt.executeQuery();
The placeholder is always ?, and the first argument of the set* methods specifies the parameter number.
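The plan-reuse benefit from above materializes when the same prepared statement is executed repeatedly. A brief sketch reusing stmt from the snippet above (note that the PostgreSQL JDBC driver only switches to a server-side prepared statement after several executions, controlled by its prepareThreshold connection parameter):
// Reuse the statement with different parameter values.
for (long id : new long[] { 42L, 4711L, 1001L }) {
    stmt.setLong(1, id);
    try (java.sql.ResultSet result = stmt.executeQuery()) {
        while (result.next()) {
            System.out.println(result.getDouble("val"));
        }
    }
}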
In PostgreSQL, you can also use prepared statements in SQL. PREPARE creates a prepared statement, and EXECUTE executes it:
PREPARE stmt(bigint) AS SELECT val FROM large WHERE id = $1;
EXECUTE stmt(42);
Query parameter data types and type conversion
You may have noticed that in both of the examples above, the parameter had a data type: in Java it was long, and in SQL it was bigint. You don’t have to specify a data type, though; PostgreSQL will then try to infer the data type on its own. PostgreSQL uses the data type “unknown” to represent such untyped values.
For data types other than unknown, PostgreSQL will also take type conversions into account. For instance, PostgreSQL lacks an equality operator to compare bigint and numeric values:
SELECT o.oprname
FROM pg_operator AS o
   JOIN pg_amop AS ao ON o.oid = ao.amopopr
   JOIN pg_am AS a ON a.oid = ao.amopmethod
WHERE /* an equality operator as used in B-tree indexes */
      ao.amopstrategy = 3
  AND a.amname = 'btree'
  AND (o.oprleft, o.oprright) = ('bigint'::regtype, 'numeric'::regtype);

 oprname
═════════
(0 rows)
But there is an implicit cast from bigint to numeric, so PostgreSQL will automatically convert the bigint to numeric, and the two values can be compared as numeric. The documentation contains the precise rules that govern these conversions. Because PostgreSQL supports function and operator overloading, these rules are quite complicated.
An example of the performance impact of query parameter types
To demonstrate that, let’s look at this straightforward case:
CREATE TABLE large (
   id  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
   val double precision NOT NULL
);

INSERT INTO large (val)
SELECT random()
FROM generate_series(1, 100000);

VACUUM (ANALYZE) large;
Then we write a Java program that queries the table with the JDBC snippet from above, but we set the parameter in three different ways. In each case, we capture the server’s execution plan with the auto_explain extension. A possible session-level setup is sketched below.
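One hedged way to set up auto_explain for the test session from JDBC (loading the module per session requires appropriate privileges; alternatively, configure auto_explain via shared_preload_libraries in postgresql.conf):
// Log the plan of every statement in this session, including actual row counts.
try (java.sql.Statement setup = conn.createStatement()) {
    setup.execute("LOAD 'auto_explain'");
    setup.execute("SET auto_explain.log_min_duration = 0");
    setup.execute("SET auto_explain.log_analyze = on");
}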
Using integer as the parameter type
The parameter is set with
stmt.setInt(1, 42);
and the resulting execution plan is
Query Text: SELECT * FROM large WHERE id = $1
Index Scan using large_pkey on large (actual time=0.013..0.013 rows=1 loops=1)
  Index Cond: (id = 42)
  Buffers: shared hit=3
This works fine, since setInt marks the parameter as type integer, and there is an equality operator to compare integer and bigint.
Using unknown as the parameter type
The parameter is set as type java.sql.Types.OTHER with
stmt.setObject(1, 42, java.sql.Types.OTHER);
and the resulting execution plan is
Query Text: SELECT * FROM large WHERE id = $1
Index Scan using large_pkey on large (...) (actual time=0.005..0.006 rows=1 loops=1)
  Index Cond: (id = 42)
  Buffers: shared hit=3
This works just as well, because a parameter of type unknown is inferred to have the same type as id, which is bigint.
Using numeric as the parameter type
The parameter is set with
stmt.setBigDecimal(1, java.math.BigDecimal.valueOf(42));
and the resulting execution plan is
Query Text: SELECT * FROM large WHERE id = $1
Seq Scan on large (...) (actual time=0.012..11.123 rows=1 loops=1)
  Filter: ((id)::numeric = '42'::numeric)
  Rows Removed by Filter: 99999
  Buffers: shared hit=935
Unexpectedly, PostgreSQL uses a sequential scan, and performance is substantially worse. The reason is that the JDBC driver maps java.math.BigDecimal to numeric, so the parameter has the data type numeric. As we saw above, there is no equality operator for bigint and numeric, so PostgreSQL converts the bigint column to numeric. The execution plan above shows this clearly: the primary key index is defined on id, not on id::numeric, so it cannot be used here.
How to avoid problems with query parameter data types
We can draw the following conclusions from the above:
- Ideally, choose the correct data type for query parameters. In the example above, setLong would have been the right choice, since the JDBC driver maps long to bigint.
- If you are not sure, send an untyped parameter (type unknown) and let PostgreSQL infer the correct data type (both options are sketched below).
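As a minimal JDBC sketch of both options, reusing the prepared statement from above:
// Option 1: pick the matching type explicitly; long maps to bigint.
stmt.setLong(1, 42L);

// Option 2: send the parameter as untyped ("unknown") and let
// PostgreSQL infer the type from the context.
stmt.setObject(1, 42, java.sql.Types.OTHER);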
Sometimes neither of these approaches works, because there is no data type in your host language that corresponds to the PostgreSQL data type, and PostgreSQL’s type inference guesses wrong. Here is a good example:
SELECT id, name FROM person WHERE birthday <= current_timestamp - $1;
We want to supply an interval like “1 year” or “1-0”. Java has no data type for intervals, so we would have to send an unknown parameter. However, PostgreSQL then infers that we mean the operator timestamp with time zone - timestamp with time zone rather than timestamp with time zone - interval. As a result, we get error messages complaining that “1 year” is not a valid timestamp. The solution in this situation is an explicit type cast in the statement:
SELECT id, name FROM person WHERE birthday <= current_timestamp - CAST ($1 AS interval);
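In JDBC, this could look as follows (a hedged sketch, assuming a person table as in the query above); with the cast in place, the untyped parameter is resolved to interval:
java.sql.PreparedStatement stmt = conn.prepareStatement(
    "SELECT id, name FROM person"
    + " WHERE birthday <= current_timestamp - CAST(? AS interval)");
// sent as an untyped value, cast to interval by the server
stmt.setObject(1, "1 year", java.sql.Types.OTHER);
java.sql.ResultSet rs = stmt.executeQuery();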
As a result, we must add a third item to the list above:
- Add an explicit type cast to the statement if automatic type resolution fails.
Conclusion
It is easy to forget that the data type of a query parameter can affect query performance. Avoid problems by either explicitly specifying the correct type or by leaving the parameter untyped and letting PostgreSQL infer the type, and add an explicit type cast where inference fails.
About Enteros
Enteros offers a patented database performance management SaaS platform. It proactively identifies root causes of complex business-impacting database scalability and performance issues across a growing number of clouds, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.