Preamble
Table partitioning is one of the most popular features of recent PostgreSQL versions. However, automatic partition creation is not yet supported. This article shows how to work around that limitation.
Use cases for automatic partition creation
In essence, there are two use cases:
- Create partitions triggered by time, for example a partition for the next month at the end of the current month.
- Create a new partition on demand whenever a row is inserted that does not fit in any existing partition.
The first option will be referred to as time-triggered partitioning, and the second as on-demand partitioning.
Automatic partition creation for time-triggered partitioning
At worst, the absence of PostgreSQL support in this area is a nuisance, because there are several ways to work around the limitation:
Using the operating system scheduler
You can use the operating system scheduler (cron on Unix-like systems or "Task Scheduler" on Windows). The job would connect to the database using psql and run the appropriate CREATE TABLE and ALTER TABLE ... ATTACH PARTITION statements.
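For illustration, the scheduled job could run statements like these (a minimal sketch; the table mytab, its partition name, and the bounds are hypothetical and assume range partitioning by month on a timestamp column ts):

-- create the table that will become next month's partition
CREATE TABLE mytab_2025_07 (LIKE mytab INCLUDING INDEXES);

-- attach it as the partition for July 2025
ALTER TABLE mytab ATTACH PARTITION mytab_2025_07
   FOR VALUES FROM ('2025-07-01') TO ('2025-08-01');

Creating the table first and attaching it afterwards has the advantage that, since PostgreSQL v12, ATTACH PARTITION requires only a SHARE UPDATE EXCLUSIVE lock on the partitioned table, while CREATE TABLE ... PARTITION OF takes an ACCESS EXCLUSIVE lock.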
Usually, you would want to schedule this kind of job right on the machine that runs the database server. Sometimes you cannot do that, for example, because you are using a hosted database and have no access to the operating system. In that case, you can schedule the job on a different machine.
Using the database's job scheduler
PostgreSQL has no built-in job scheduling system, but you can schedule database jobs with extensions like pg_timetable or pg_cron. Such an extension may not be available on a hosted database, and it adds another dependency to your software architecture. On the other hand, it makes the job independent of the operating system scheduler and less likely to be overlooked when you move the database.
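With pg_cron, for example, scheduling such a job could look roughly like this (a sketch: create_next_partition is a hypothetical helper function, and mytab is the assumed range-partitioned table from above):

-- hypothetical helper that creates next month's partition of "mytab"
CREATE FUNCTION create_next_partition() RETURNS void
   LANGUAGE plpgsql AS
$$DECLARE
   start_date date := date_trunc('month', current_date) + INTERVAL '1 month';
   end_date   date := date_trunc('month', current_date) + INTERVAL '2 months';
BEGIN
   EXECUTE format(
      'CREATE TABLE %I PARTITION OF mytab FOR VALUES FROM (%L) TO (%L)',
      'mytab_' || to_char(start_date, 'YYYY_MM'),
      start_date, end_date
   );
END;$$;

-- schedule it for 00:00 on the 25th of every month
SELECT cron.schedule('create-next-partition', '0 0 25 * *',
                     'SELECT create_next_partition()');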
Using pg_partman
pg_partman is a PostgreSQL extension that focuses on partition management. It was useful, indeed essential, before PostgreSQL v10 introduced declarative partitioning, because it helped you create the required triggers and constraints. Most of that functionality is no longer needed, but its support for automatic partition creation can still come in handy.
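For instance, having pg_partman manage monthly partitions might look like this (a sketch assuming the pg_partman 4.x API; the create_parent signature has changed across releases, so check the documentation of your version):

-- let pg_partman manage "mytab", creating monthly partitions ahead of time
SELECT partman.create_parent(
   p_parent_table := 'public.mytab',
   p_control      := 'ts',
   p_type         := 'native',
   p_interval     := 'monthly'
);

-- new partitions are then created by the periodic maintenance call
SELECT partman.run_maintenance();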
Automatic partition creation for on-demand partitioning
This is more difficult, and the lack of support from PostgreSQL core hurts more acutely here.
The naïve approach is a BEFORE INSERT trigger that creates and attaches a new partition. That will, however, result in one of the following "object in use" errors:
ERROR: cannot CREATE TABLE .. PARTITION OF "tab" because it is being used by active queries in this session
or
ERROR: cannot ALTER TABLE "tab" because it is being used by active queries in this session
Essentially, by the time the INSERT has started, you cannot modify the table definition any more.
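To make the failure concrete, here is a minimal sketch of such a naïve trigger for the list-partitioned sample table tab defined later in this article; the dynamic CREATE TABLE inside it fails with the first of the errors above:

CREATE FUNCTION naive_trig() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   /* fails: "tab" is in use by the very INSERT that fired this trigger */
   EXECUTE format(
      'CREATE TABLE %I PARTITION OF tab FOR VALUES IN (%L)',
      'tab_' || to_char(NEW.ts, 'YYYY-MM-DD'),
      NEW.ts::date
   );
   RETURN NEW;
END;$$;

CREATE TRIGGER naive_trig
   BEFORE INSERT ON tab FOR EACH ROW
   EXECUTE FUNCTION naive_trig();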
LISTEN and NOTIFY as a workaround
Since we want asynchronous processing, the new partition has to be created in a separate transaction. It obviously cannot be created before the INSERT, so it would have to be created afterwards. But the partition must already exist at the time of the INSERT, so it seems we have reached a dead end.
There is an alternative, however: instead of creating the partition in a single step, we can first create a plain table and attach it to the partitioned table later. The following sequence of events could then do the trick:
- in the BEFORE INSERT trigger, create a new table that is not yet a partition (skipped if the table already exists)
- insert the new row into the prospective partition instead of into the partitioned table
- at some later time, attach the newly created table as a partition
The question of how to carry out the attachment "at some later time" is still open. PostgreSQL has a convenient feature for asynchronous processing: LISTEN and NOTIFY. A session can register for notifications on a channel (often the name of a table) with LISTEN and will then be notified asynchronously whenever another backend calls NOTIFY for that channel. The sending backend can add a message string to the notification.
We can use this together with a row level BEFORE INSERT trigger as follows:
- if it does not already exist, create a new table that will serve as a new partition
- insert the new row into that table
- if a new table was created in the first step, NOTIFY with the new partition key as the argument
- skip the original INSERT, since we already inserted the row in the partition
Note that this requires a default partition: the row is routed to a partition before the trigger fires, so inserting a row that matches no existing partition would fail.
In a different session, a daemon process waits for notifications and attaches the new table to the partitioned table.
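Concretely, with the sample table defined below, a notification with the payload '2025-06-01' would make the daemon run a statement like this (the quoted identifier is needed because the generated table name contains dashes):

ALTER TABLE tab ATTACH PARTITION "tab_2025-06-01"
   FOR VALUES IN ('2025-06-01');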
A sample implementation of automatic partition creation on demand
Here is the definition of the partitioned table:
CREATE TABLE tab (
   id bigint GENERATED ALWAYS AS IDENTITY,
   ts timestamp NOT NULL,
   data text
) PARTITION BY LIST ((ts::date));

CREATE TABLE tab_def PARTITION OF tab DEFAULT;
I chose list partitioning to keep the listener code simple, although that is not necessary.
The trigger
The trigger appears as follows:
CREATE FUNCTION part_trig() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   BEGIN
      /* try to create a table for the new partition */
      EXECUTE
         format(
            'CREATE TABLE %I (LIKE tab INCLUDING INDEXES)',
            'tab_' || to_char(NEW.ts, 'YYYY-MM-DD')
         );

      /*
       * tell listener to attach the partition
       * (only if a new table was created)
       */
      EXECUTE
         format(
            'NOTIFY tab, %L',
            to_char(NEW.ts, 'YYYY-MM-DD')
         );
   EXCEPTION
      WHEN duplicate_table THEN
         NULL;  -- ignore
   END;

   /* insert into the new partition */
   EXECUTE
      format(
         'INSERT INTO %I VALUES ($1.*)',
         'tab_' || to_char(NEW.ts, 'YYYY-MM-DD')
      )
      USING NEW;

   /* skip insert into the partitioned table */
   RETURN NULL;
END;$$;

CREATE TRIGGER part_trig
   BEFORE INSERT ON tab
   FOR EACH ROW
   WHEN (pg_trigger_depth() < 1)
   EXECUTE FUNCTION part_trig();
To understand why the trigger function returns NULL, read my blog post on the topic. The WHEN clause in the trigger definition avoids infinite recursion: without it, inserting a row into a partition would call the trigger function again.
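As a quick illustration of the flow (timestamps chosen arbitrarily): the first row for a given day creates the prospective partition, inserts the row there, and notifies the listener; subsequent rows for that day are simply redirected:

-- creates table "tab_2025-06-01", inserts the row into it
-- and sends: NOTIFY tab, '2025-06-01'
INSERT INTO tab (ts, data) VALUES ('2025-06-01 10:30:00', 'some data');

-- goes straight into the prospective partition, no NOTIFY
INSERT INTO tab (ts, data) VALUES ('2025-06-01 11:00:00', 'more data');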
The listener
For the listener, we need a PostgreSQL client API that supports asynchronous notification, so I went with C:
#include <libpq-fe.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/select.h>
#include <errno.h>

static int attach_partition(PGconn *conn, char *table, char *datestr)
{
    PGresult *res;
    char *esc_tab, *esc_part, *esc_date;
    /* identifiers are at most 63 bytes long */
    char stmt[400], part[64];

    /* escape table and partition name */
    esc_tab = PQescapeIdentifier(conn, table, 63);
    snprintf(part, sizeof(part), "%s_%s", table, datestr);
    esc_part = PQescapeIdentifier(conn, part, 63);
    esc_date = PQescapeLiteral(conn, datestr, 10);

    /* construct statement */
    sprintf(stmt,
            "ALTER TABLE %s ATTACH PARTITION %s FOR VALUES IN (%s)",
            esc_tab, esc_part, esc_date);
    free(esc_tab);
    free(esc_part);
    free(esc_date);

    /* execute statement */
    if ((res = PQexec(conn, stmt)) == NULL)
    {
        fprintf(stderr, "Out of memory sending statement\n");
        return 0;
    }
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "Error attaching partition: %s\n",
                PQresultErrorMessage(res));
        return 0;
    }

    PQclear(res);
    return 1;
}

int main(int argc, char **argv)
{
    PGconn *conn;
    int sock;
    PGresult *res;

    /* connect to PostgreSQL with default parameters */
    if ((conn = PQconnectdb("application_name=listener")) == NULL)
    {
        fprintf(stderr, "Out of memory connecting to PostgreSQL\n");
        return 1;
    }
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "Error connecting to PostgreSQL: %s\n",
                PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* get network socket for the connection */
    if ((sock = PQsocket(conn)) < 0)
    {
        fprintf(stderr, "Error getting connection network socket\n");
        PQfinish(conn);
        return 1;
    }

    /* listen on a channel */
    if ((res = PQexec(conn, "LISTEN tab")) == NULL)
    {
        fprintf(stderr, "Out of memory sending LISTEN\n");
        PQfinish(conn);
        return 1;
    }
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "Error listening on channel: %s\n",
                PQresultErrorMessage(res));
        PQfinish(conn);
        return 1;
    }
    PQclear(res);

    while (1)
    {
        fd_set sockets;
        PGnotify *note;

        /* block waiting for activity on the network socket */
        FD_ZERO(&sockets);
        FD_SET(sock, &sockets);
        errno = 0;
        if (select(sock + 1, &sockets, NULL, NULL, NULL) < 0)
        {
            perror("Error waiting for notification");
            PQfinish(conn);
            return 1;
        }

        /* consume the input */
        if (PQconsumeInput(conn) == 0)
        {
            fprintf(stderr, "Error receiving data: %s\n",
                    PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* process all pending notifications */
        while ((note = PQnotifies(conn)) != NULL)
        {
            if (!attach_partition(conn, note->relname, note->extra))
            {
                PQfinish(conn);
                return 1;
            }

            PQfreemem(note);
        }
    }
}
Discussion of performance and robustness
The sample code above should be free from race conditions. However, performance will not be good. Let's compare the speed of 100,000 inserts with and without the trigger:
test=# TRUNCATE tab;
TRUNCATE TABLE
test=# \timing
Timing is on.
test=# INSERT INTO tab (ts, data)
       SELECT clock_timestamp(), 'something'
       FROM generate_series(1, 100000);
INSERT 0 0
Time: 3354.275 ms (00:03.354)
test=# ALTER TABLE tab DISABLE TRIGGER part_trig;
ALTER TABLE
test=# TRUNCATE tab;
TRUNCATE TABLE
Time: 20.005 ms
test=# INSERT INTO tab (ts, data)
       SELECT clock_timestamp(), 'something'
       FROM generate_series(1, 100000);
INSERT 0 100000
Time: 120.869 ms
Although the code executes correctly, the trigger adds a significant overhead of about 32 microseconds per row: the whole INSERT takes about 28 times as long as it does without the trigger. That factor is only this high because the table has no indexes. We could reduce the overhead by writing the trigger function in C as well, but that would add even more complexity.
Conclusion
Working around the lack of time-triggered partition creation is not hard, but creating partitions on demand when rows are inserted is harder. We found a solution using triggers, LISTEN and NOTIFY, but it was complicated and its performance impact was severe. It would be good to get better support for this from PostgreSQL core!
About Enteros
Enteros offers a patented database performance management SaaS platform. It proactively identifies root causes of complex business-impacting database scalability and performance issues across a growing number of clouds, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.