Preamble
Most people who have worked with PostgreSQL, or with SQL in general, have run into “aggregate functions”. The most fundamental ones, such as count, min, max, and sum, ship with virtually every database engine, including PostgreSQL. They cover only the most basic requirements, however. When these built-in operations are not enough, it makes sense to write your own aggregation code so that calculations can still be performed on the server side.
Why server-side code in PostgreSQL makes sense
Those who prefer application-level solutions may object: why not keep the business logic out of the database entirely? The answer is often simple. Imagine a PostgreSQL database with 10 billion rows on which you need to perform a straightforward calculation. For the computation to happen on the client (say, in a Java application), the data first has to travel from the database to the application. Even if each row is only 100 bytes wide, that is an enormous amount of data to move:
test=# SELECT pg_size_pretty(10000000000 * 100);
 pg_size_pretty
----------------
 931 GB
(1 row)
Close to 1 TB of data would have to cross the network just to satisfy an application-side design pattern. Performing the calculation on the server side avoids that transfer entirely: only the results need to be sent over the network, which is a significant advantage, and the result of an aggregation is usually small anyway. Doing this kind of work in SQL is therefore perfectly acceptable.
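As a sanity check, the same back-of-the-envelope arithmetic can be done outside the database. This short Python sketch simply reproduces the byte count; the 100-bytes-per-row figure is the assumption from above:

```python
# Rough transfer-size estimate: 10 billion rows at an assumed 100 bytes each.
rows = 10_000_000_000
bytes_per_row = 100  # assumed average row width

total_bytes = rows * bytes_per_row
gib = total_bytes / 1024**3  # pg_size_pretty reports binary units as "GB"

print(f"{gib:.0f} GB")  # 931 GB, matching pg_size_pretty above
```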
Creating your own aggregates
PostgreSQL is flexible enough to let users define their own aggregation functions, which makes it a good place to host this kind of business logic. New aggregates are created with the “CREATE AGGREGATE” command. This post does not explain every detail, but it walks through the simplest possible example; a later post may go into more specifics. What you see here, however, should be enough to get you going.
To demonstrate how things work, I decided to use a very basic example: calculating the total cost of a taxi ride. Hiring the taxi costs EUR 3.50, plus EUR 2.20 per kilometer driven. At the end, the total is rounded up so that the taxi driver gets a tip. What we want to figure out is the cost of each ride.
Here are a few examples of data:
CREATE TABLE t_taxi (
    trip_id   int,
    km        numeric
);

COPY t_taxi FROM stdin DELIMITER ';';
1;3.4
1;5.3
1;2.9
2;9.3
2;1.6
2;4.3
\.
This sample data describes two trips. For simplicity's sake, let's assume that each trip consists of just a couple of segments. We now want to add those segments up and determine the total cost of each trip.
To accomplish that, we basically need two functions: the “SFUNC”, which is called once for each row, and the “FINALFUNC”, which is called once at the end of each group:
CREATE FUNCTION taxi_accum(numeric, numeric, numeric)
RETURNS numeric AS
$$
    SELECT $1 + $2 * $3;
$$ LANGUAGE 'sql' STRICT;
The function takes the intermediate value accumulated over all previous calls as its first parameter, plus the data from the current row (here: the kilometers driven and the price per kilometer). Each row triggers one call to the function.
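To make the calling convention concrete, here is a hypothetical Python equivalent of the SFUNC. Decimal stands in for PostgreSQL's numeric type; the function name mirrors the SQL one purely for illustration:

```python
from decimal import Decimal

def taxi_accum(state, km, price_per_km):
    """Sketch of the SFUNC: add the cost of one segment
    to the running total carried over from previous rows."""
    return state + km * price_per_km

# One row: start from the EUR 3.50 base fee, add 3.4 km at EUR 2.20/km
x = taxi_accum(Decimal("3.50"), Decimal("3.4"), Decimal("2.20"))
print(x)  # 10.980
```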
The “FINALFUNC” is called once at the end of each group:
CREATE FUNCTION taxi_final(numeric)
RETURNS numeric AS
$$
    SELECT round($1 + 5, -1);
$$ LANGUAGE 'sql' STRICT;
This means, in our example:
x = taxi_accum(INITCOND = 3.5, 3.4, 2.20)
x = taxi_accum(x, 5.3, 2.20)
x = taxi_accum(x, 2.9, 2.20)
result_value_1 = taxi_final(x)

x = taxi_accum(INITCOND = 3.5, 9.3, 2.20)
x = taxi_accum(x, 1.6, 2.20)
x = taxi_accum(x, 4.3, 2.20)
result_value_2 = taxi_final(x)
It will return two rows, one for each group.
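The call sequence above can be simulated outside the database. The following Python sketch folds the rows of both trips the way PostgreSQL would, with Decimal standing in for numeric. Note that PostgreSQL's round() rounds halves away from zero while Python's Decimal rounding is half-even; the values here hit no ties, so the results agree:

```python
from decimal import Decimal
from functools import reduce

PRICE = Decimal("2.20")      # price per kilometer
INITCOND = Decimal("3.50")   # base fee: the aggregate's start value

def taxi_accum(state, km):
    # SFUNC: one call per row
    return state + km * PRICE

def taxi_final(state):
    # FINALFUNC: add EUR 5 as a tip, round to the nearest ten
    return int(round(state + 5, -1))

trips = {
    1: [Decimal("3.4"), Decimal("5.3"), Decimal("2.9")],
    2: [Decimal("9.3"), Decimal("1.6"), Decimal("4.3")],
}

for trip_id, kms in trips.items():
    total = reduce(taxi_accum, kms, INITCOND)
    print(trip_id, taxi_final(total), total)
# trip 1 -> 30 (exact total 29.020), trip 2 -> 40 (36.940)
```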
Once the underlying logic is in place, we can create the aggregate itself. In this example it takes two parameters: the number of kilometers in a segment and the price per kilometer. The INITCOND (= the start value of each group) encodes the EUR 3.50 fee for hiring the taxi:
CREATE AGGREGATE taxi(numeric, numeric)
(
    INITCOND = 3.50,
    STYPE = numeric,
    SFUNC = taxi_accum,
    FINALFUNC = taxi_final
);
Creating the aggregate is basically simple: it has to know its input parameters, the data type of the intermediate result, and which functions to call (once for each row, and once at the end of each group).
Once the aggregate has been set up in your PostgreSQL database, you can run the following straightforward query:
test=# SELECT trip_id, taxi(km, 2.20),
              3.50 + sum(km) * 2.2 AS manual
       FROM t_taxi
       GROUP BY 1;
 trip_id | taxi | manual
---------+------+--------
       2 |   40 |  36.94
       1 |   30 |  29.02
(2 rows)
Here the custom aggregate is called side by side with a manual calculation to confirm that the results are correct.
Managing grouping sets, etc.
The great thing is that the same machinery also handles grouping sets and similar constructs. Here is an example:
SELECT trip_id, taxi(km, 2.20),
       3.50 + sum(km) * 2.2 AS manual
FROM t_taxi
GROUP BY ROLLUP(1);
 trip_id | taxi | manual
---------+------+--------
       1 |   30 |  29.02
       2 |   40 |  36.94
         |   70 |  62.46
The final row shows the value computed as if all rides belonged to a single trip rather than two. In essence, the query returned three aggregated rows. No special precautions are needed; it simply works. The same is true for analytics and window functions.
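The grand-total row can be checked the same way: feeding all six segments through a single accumulator run, starting from one base fee, reproduces the 70 / 62.46 pair. A minimal Python sketch, again using Decimal in place of numeric:

```python
from decimal import Decimal
from functools import reduce

PRICE = Decimal("2.20")
INITCOND = Decimal("3.50")  # one base fee: ROLLUP's total row treats all rows as one group

kms = [Decimal(s) for s in ("3.4", "5.3", "2.9", "9.3", "1.6", "4.3")]

total = reduce(lambda state, km: state + km * PRICE, kms, INITCOND)
rounded = int(round(total + 5, -1))  # mirrors taxi_final

print(rounded, total)  # 70 62.460, matching the last row above
```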
Of course, there is a lot more to custom aggregates than what has been shown here. That, however, is beyond the scope of this post.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
Are you interested in writing for Enteros’ Blog? Please send us a pitch!