Preamble
Table partitioning is one of the most popular features of recent PostgreSQL versions. Automatic partition creation, however, is not yet supported. This article shows what you can do to work around that.
Use cases for automatic partition creation
In essence, there are only two use cases:
- Time-triggered creation: for example, at the end of the current month, create the partition for the following month.
- On-demand creation: if a row is inserted that does not fit any existing partition, create a new partition for it.
Both variants, time-triggered and on-demand partitioning, are discussed below.
Automatic partition creation for time-triggered partitioning
Here, the lack of PostgreSQL support is at worst annoying. The restriction can be overcome in a number of ways:
Using the operating system scheduler
You can use the operating system scheduler (cron on Unix-like systems or the “Task Scheduler” on Windows). The job would connect to the database using psql and run the appropriate CREATE TABLE and ALTER TABLE ... ATTACH PARTITION statements.
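For example, a monthly cron job could feed statements like the following to psql. This is just a sketch: the table measurement (range-partitioned by a timestamp column) and the month boundaries are made-up names for illustration.
CREATE TABLE measurement_2025_01
   (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

ALTER TABLE measurement
   ATTACH PARTITION measurement_2025_01
   FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');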
Such a job is typically scheduled directly on the database server machine. That is not always possible, for example if you are using a hosted database and have no access to the operating system. In that case, the job can be scheduled on a different machine.
Using the database’s job scheduler
PostgreSQL has no built-in job scheduling system. However, extensions like pg_timetable and pg_cron let you schedule database jobs. Adding a third-party extension to your software introduces another dependency, and it may not be possible on a hosted database. On the other hand, it keeps the job out of the operating system scheduler, which makes it less likely to be forgotten when you move the database.
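As a minimal sketch of what that could look like with pg_cron: the helper function and the table measurement are made up for illustration, and the schedule is arbitrary.
-- hypothetical helper that creates the partition for the coming month
CREATE FUNCTION create_next_month_partition() RETURNS void
   LANGUAGE plpgsql AS
$$DECLARE
   first_day date := date_trunc('month', current_date) + interval '1 month';
BEGIN
   EXECUTE format(
      'CREATE TABLE IF NOT EXISTS %I PARTITION OF measurement '
      'FOR VALUES FROM (%L) TO (%L)',
      'measurement_' || to_char(first_day, 'YYYY_MM'),
      first_day,
      first_day + interval '1 month'
   );
END;$$;

-- run it at 03:00 on the 25th of every month
SELECT cron.schedule('0 3 25 * *', 'SELECT create_next_month_partition()');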
Using pg_partman
pg_partman is a PostgreSQL extension that specializes in partition management. It was most important before “declarative partitioning” was introduced in version 10, when it helped you set up the required triggers and constraints. Most of those features are no longer needed, but its support for automatic partition creation can still be useful.
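As a rough sketch, setting up automatic monthly partitions with pg_partman could look like this, assuming pg_partman 4.x and a partitioned table measurement with a timestamp column ts (check the extension’s documentation for the exact signature):
SELECT partman.create_parent(
   p_parent_table => 'public.measurement',
   p_control      => 'ts',
   p_type         => 'native',
   p_interval     => 'monthly'
);

-- calling partman.run_maintenance() periodically (for example from
-- pg_cron) then creates future partitions ahead of time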
Automatic partition creation for on-demand partitioning
This is more challenging, and the pain of PostgreSQL core’s lack of support is more acute here.
The naïve approach to try is a BEFORE INSERT trigger that creates and attaches a new partition. But that will result in one of the following “object in use” errors:
ERROR: cannot CREATE TABLE .. PARTITION OF "tab" because it is being used by active queries in this session
or
ERROR: cannot ALTER TABLE "tab" because it is being used by active queries in this session
Essentially, once the INSERT has started, you can no longer modify the definition of the table.
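For illustration, here is a sketch of such a naive trigger for the table tab defined later in this article; running an INSERT with it in place produces the first of the errors above:
CREATE FUNCTION naive_part_trig() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   /* fails: "tab" is in use by the very INSERT that fired the trigger */
   EXECUTE format(
      'CREATE TABLE %I PARTITION OF tab FOR VALUES IN (%L)',
      'tab_' || to_char(NEW.ts, 'YYYY-MM-DD'),
      NEW.ts::date
   );
   RETURN NEW;
END;$$;

CREATE TRIGGER naive_part_trig
   BEFORE INSERT ON tab FOR EACH ROW
   EXECUTE FUNCTION naive_part_trig();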
LISTEN and NOTIFY as a workaround
Since the processing has to happen in a different transaction, the new partition must be created at another time. We obviously cannot create the partition before the INSERT starts, so it would have to happen afterwards. But the partition has to exist at the time of the INSERT, so we would seem to be at an impasse.
But there is a way out: rather than creating the partition in a single step, we can first create a table and attach it to the partitioned table as a partition later. The sequence of events could then look like this:
- in the BEFORE INSERT trigger, create a new table that is not yet a partition (skipped if the table already exists)
- insert the new row into the prospective partition instead of into the partitioned table
- at some later time, attach the newly-created table as a partition
That leaves the question of how to perform the attachment “at some later time”. PostgreSQL has a convenient feature for asynchronous processing: LISTEN and NOTIFY. A session can register for notifications on a channel (usually a table name) with LISTEN and will then be notified asynchronously whenever another backend calls NOTIFY for that channel. The sending backend can attach a message string to the notification.
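To see the mechanism in isolation, here is what an exchange between two psql sessions could look like (the payload and the PID are made up):
-- session 1
LISTEN tab;

-- session 2
NOTIFY tab, '2025-01-01';

-- session 1 then displays, the next time it checks for input:
-- Asynchronous notification "tab" with payload "2025-01-01"
-- received from server process with PID 12345.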
We can use this together with a row-level BEFORE INSERT trigger as follows:
- create a new table that will become a new partition (if it does not yet exist)
- insert the new row into that table
- if a new table was created in the first step, NOTIFY with the new partition key as the argument
- skip the original INSERT, since we already inserted the row in the partition
Note that this requires a default partition to exist: without one, inserting a row that does not match an existing partition fails during tuple routing, before the trigger ever runs.
A daemon process in a separate session waits for notifications and attaches the new table to the partitioned table.
A sample implementation of automatic partition creation on demand
The partitioned table is defined as follows:
CREATE TABLE tab (
   id bigint GENERATED ALWAYS AS IDENTITY,
   ts timestamp NOT NULL,
   data text
) PARTITION BY LIST ((ts::date));

CREATE TABLE tab_def PARTITION OF tab DEFAULT;
Using list partitioning is not strictly necessary, but I chose it because it keeps the listener code simpler.
The trigger
The trigger looks like this:
CREATE FUNCTION part_trig() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   BEGIN
      /* try to create a table for the new partition */
      EXECUTE
         format(
            'CREATE TABLE %I (LIKE tab INCLUDING INDEXES)',
            'tab_' || to_char(NEW.ts, 'YYYY-MM-DD')
         );

      /* tell the listener to attach the partition
       * (only if a new table was created) */
      EXECUTE
         format(
            'NOTIFY tab, %L',
            to_char(NEW.ts, 'YYYY-MM-DD')
         );
   EXCEPTION
      WHEN duplicate_table THEN
         NULL;  -- ignore: the table already exists
   END;

   /* insert into the new partition */
   EXECUTE
      format(
         'INSERT INTO %I VALUES ($1.*)',
         'tab_' || to_char(NEW.ts, 'YYYY-MM-DD')
      )
      USING NEW;

   /* skip the insert into the partitioned table */
   RETURN NULL;
END;$$;

CREATE TRIGGER part_trig
   BEFORE INSERT ON tab FOR EACH ROW
   WHEN (pg_trigger_depth() < 1)
   EXECUTE FUNCTION part_trig();
To understand why the trigger function returns NULL, read my blog post on that topic. The WHEN clause in the trigger definition avoids infinite recursion: without it, inserting a row into a partition would call the trigger function again.
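As a quick smoke test, we can insert a row and look for it in the standalone table. Note that the generated table names contain hyphens, so they have to be double-quoted; the date below is arbitrary:
INSERT INTO tab (ts, data) VALUES ('2025-01-01 12:00:00', 'something');

-- until the listener attaches the partition, the row is in the new table
SELECT id, ts, data FROM "tab_2025-01-01";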
The listener
For the listener, we need a PostgreSQL client API that supports asynchronous notifications, so I chose C:
#include <libpq-fe.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/select.h>
#include <errno.h>

static int attach_partition(PGconn *conn, char *table, char *datestr)
{
    PGresult *res;
    char *esc_tab, *esc_part, *esc_date;
    /* identifiers are at most 63 bytes long */
    char stmt[400], part[64];

    /* escape table and partition name */
    esc_tab = PQescapeIdentifier(conn, table, 63);
    snprintf(part, 63, "%s_%s", table, datestr);
    esc_part = PQescapeIdentifier(conn, part, 63);
    esc_date = PQescapeLiteral(conn, datestr, 10);

    /* construct statement */
    sprintf(stmt,
            "ALTER TABLE %s ATTACH PARTITION %s FOR VALUES IN (%s)",
            esc_tab, esc_part, esc_date);
    free(esc_tab);
    free(esc_part);
    free(esc_date);

    /* execute statement */
    if ((res = PQexec(conn, stmt)) == NULL)
    {
        fprintf(stderr, "Out of memory sending statement\n");
        return 0;
    }
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "Error attaching partition: %s\n",
                PQresultErrorMessage(res));
        return 0;
    }
    PQclear(res);

    return 1;
}

int main(int argc, char **argv)
{
    PGconn *conn;
    int sock;
    PGresult *res;

    /* connect to PostgreSQL with default parameters */
    if ((conn = PQconnectdb("application_name=listener")) == NULL)
    {
        fprintf(stderr, "Out of memory connecting to PostgreSQL\n");
        return 1;
    }
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "Error connecting to PostgreSQL: %s\n",
                PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* get network socket for the connection */
    if ((sock = PQsocket(conn)) < 0)
    {
        fprintf(stderr, "Error getting connection network socket\n");
        PQfinish(conn);
        return 1;
    }

    /* listen on a channel */
    if ((res = PQexec(conn, "LISTEN tab")) == NULL)
    {
        fprintf(stderr, "Out of memory sending LISTEN\n");
        PQfinish(conn);
        return 1;
    }
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "Error listening on channel: %s\n",
                PQresultErrorMessage(res));
        PQfinish(conn);
        return 1;
    }
    PQclear(res);

    while (1)
    {
        fd_set sockets;
        struct pgNotify *note;

        /* block waiting for activity on the network socket */
        FD_ZERO(&sockets);
        FD_SET(sock, &sockets);
        errno = 0;
        if (select(sock + 1, &sockets, NULL, NULL, NULL) < 0)
        {
            perror("Error waiting for notification");
            PQfinish(conn);
            return 1;
        }

        /* consume the input */
        if (PQconsumeInput(conn) == 0)
        {
            fprintf(stderr, "Error receiving data: %s\n",
                    PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* check for notifications */
        note = PQnotifies(conn);
        if (note != NULL)
        {
            if (!attach_partition(conn, note->relname, note->extra))
            {
                PQfinish(conn);
                return 1;
            }
            PQfreemem(note);
        }
    }
}
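On a Linux system, the listener can presumably be built with something like gcc -o listener listener.c -I$(pg_config --includedir) -L$(pg_config --libdir) -lpq (the exact invocation depends on your installation) and started with the usual PG* environment variables pointing at the target database.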
Performance and robustness
The sample code above should be free of race conditions. However, its performance will not be stellar. Let’s compare the speed of 100,000 inserts with and without the trigger:
test=# TRUNCATE tab;
TRUNCATE TABLE
test=# \timing
Timing is on.
test=# INSERT INTO tab (ts, data)
          SELECT clock_timestamp(), 'something'
          FROM generate_series(1, 100000);
INSERT 0 0
Time: 3354.275 ms (00:03.354)
test=# ALTER TABLE tab DISABLE TRIGGER part_trig;
ALTER TABLE
test=# TRUNCATE tab;
TRUNCATE TABLE
Time: 20.005 ms
test=# INSERT INTO tab (ts, data)
          SELECT clock_timestamp(), 'something'
          FROM generate_series(1, 100000);
INSERT 0 100000
Time: 120.869 ms
Even though the code runs correctly, the trigger adds an overhead of roughly 32 microseconds per row. Since there are no indexes on the table, the baseline insert is very cheap, which is why the overall processing time ends up about 28 times longer than without the trigger. Writing the trigger function in C would reduce the overhead, at the price of added complexity.
Conclusion
While it is fairly easy to work around the missing support for time-triggered partition creation, creating partitions on demand when rows are inserted is more difficult. We found a solution using triggers, LISTEN and NOTIFY, but it is complicated and its performance impact is severe. Better support from the PostgreSQL core would be welcome here.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
Are you interested in writing for Enteros’ Blog? Please send us a pitch!