Preamble
Table partitioning is one of the most popular features of recent PostgreSQL versions. However, automatic partition creation is not yet supported. This article shows how to work around that limitation.
Use cases for automatic partition creation
In essence, there are two use cases:
- Create partitions triggered by time, for example a partition for the next month at the end of the current month.
- Create a new partition on demand whenever a row is inserted that does not fit in any existing partition.
The first option will be referred to as time-triggered partitioning, and the second as on-demand partitioning.
Automatic partition creation for time-triggered partitioning
At worst, the absence of PostgreSQL support in this area is a nuisance, because there are several ways to work around the limitation:
Using the operating system scheduler
You can use the operating system scheduler (cron on Unix-like systems or "Task Scheduler" on Windows). The job would connect to the database using psql and run the appropriate CREATE TABLE and ALTER TABLE ... ATTACH PARTITION statements.
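For illustration, the scheduled job could run statements like these (a minimal sketch; the table mytab, its partition name, and the bounds are hypothetical and assume range partitioning by month on a timestamp column ts):

-- create the table that will become next month's partition
CREATE TABLE mytab_2025_07 (LIKE mytab INCLUDING INDEXES);

-- attach it as the partition for July 2025
ALTER TABLE mytab ATTACH PARTITION mytab_2025_07
   FOR VALUES FROM ('2025-07-01') TO ('2025-08-01');

Creating the table first and attaching it afterwards has the advantage that, since PostgreSQL v12, ATTACH PARTITION requires only a SHARE UPDATE EXCLUSIVE lock on the partitioned table, while CREATE TABLE ... PARTITION OF takes an ACCESS EXCLUSIVE lock.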
Usually, you would want to schedule this kind of job right on the machine that runs the database server. Sometimes you cannot do that, for example, because you are using a hosted database and have no access to the operating system. In that case, you can schedule the job on a different machine.
Using the database's job scheduler
PostgreSQL has no built-in job scheduling system, but you can schedule database jobs with extensions like pg_timetable or pg_cron. Such an extension may not be available on a hosted database, and it adds another dependency to your software architecture. On the other hand, it makes the job independent of the operating system scheduler and less likely to be overlooked when you move the database.
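With pg_cron, for example, scheduling such a job could look roughly like this (a sketch: create_next_partition is a hypothetical helper function, and mytab is the assumed range-partitioned table from above):

-- hypothetical helper that creates next month's partition of "mytab"
CREATE FUNCTION create_next_partition() RETURNS void
   LANGUAGE plpgsql AS
$$DECLARE
   start_date date := date_trunc('month', current_date) + INTERVAL '1 month';
   end_date   date := date_trunc('month', current_date) + INTERVAL '2 months';
BEGIN
   EXECUTE format(
      'CREATE TABLE %I PARTITION OF mytab FOR VALUES FROM (%L) TO (%L)',
      'mytab_' || to_char(start_date, 'YYYY_MM'),
      start_date, end_date
   );
END;$$;

-- schedule it for 00:00 on the 25th of every month
SELECT cron.schedule('create-next-partition', '0 0 25 * *',
                     'SELECT create_next_partition()');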
Using pg_partman
pg_partman is a PostgreSQL extension that focuses on partition management. It was useful, indeed essential, before PostgreSQL v10 introduced declarative partitioning, because it helped you create the required triggers and constraints. Most of that functionality is no longer needed, but its support for automatic partition creation can still come in handy.
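For instance, having pg_partman manage monthly partitions might look like this (a sketch assuming the pg_partman 4.x API; the create_parent signature has changed across releases, so check the documentation of your version):

-- let pg_partman manage "mytab", creating monthly partitions ahead of time
SELECT partman.create_parent(
   p_parent_table := 'public.mytab',
   p_control      := 'ts',
   p_type         := 'native',
   p_interval     := 'monthly'
);

-- new partitions are then created by the periodic maintenance call
SELECT partman.run_maintenance();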
Automatic partition creation for on-demand partitioning
This is more difficult, and the lack of support from PostgreSQL core hurts more acutely here.
The naïve approach is a BEFORE INSERT trigger that creates and attaches a new partition. That will, however, result in one of the following "object in use" errors:
ERROR: cannot CREATE TABLE .. PARTITION OF "tab" because it is being used by active queries in this session
or
ERROR: cannot ALTER TABLE "tab" because it is being used by active queries in this session
Essentially, by the time the INSERT has started, you cannot modify the table definition any more.
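To make the failure concrete, here is a minimal sketch of such a naïve trigger for the list-partitioned sample table tab defined later in this article; the dynamic CREATE TABLE inside it fails with the first of the errors above:

CREATE FUNCTION naive_trig() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   /* fails: "tab" is in use by the very INSERT that fired this trigger */
   EXECUTE format(
      'CREATE TABLE %I PARTITION OF tab FOR VALUES IN (%L)',
      'tab_' || to_char(NEW.ts, 'YYYY-MM-DD'),
      NEW.ts::date
   );
   RETURN NEW;
END;$$;

CREATE TRIGGER naive_trig
   BEFORE INSERT ON tab FOR EACH ROW
   EXECUTE FUNCTION naive_trig();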
LISTEN and NOTIFY as a workaround
Since we want asynchronous processing, the new partition has to be created in a separate transaction. It obviously cannot be created before the INSERT, so it would have to be created afterwards. But the partition must already exist at the time of the INSERT, so it seems we have reached a dead end.
There is an alternative, however: instead of creating the partition in a single step, we can first create a plain table and attach it to the partitioned table later. The following sequence of events could then do the trick:
- in the BEFORE INSERT trigger, create a new table that is not yet a partition (skipped if the table already exists)
- insert the new row into the prospective partition instead of into the partitioned table
- at some later time, attach the newly created table as a partition
The question of how to carry out the attachment "at some later time" is still open. PostgreSQL has a convenient feature for asynchronous processing: LISTEN and NOTIFY. A session can register for notifications on a channel (often the name of a table) with LISTEN and will then be notified asynchronously whenever another backend calls NOTIFY for that channel. The sending backend can add a message string to the notification.
We can use this together with a row level BEFORE INSERT trigger as follows:
- if it does not already exist, create a new table that will serve as a new partition
- insert the new row into that table
- if a new table was created in the first step, NOTIFY with the new partition key as the argument
- skip the original INSERT, since we already inserted the row in the partition
Note that this requires a default partition: the row is routed to a partition before the trigger fires, so inserting a row that matches no existing partition would fail.
In a different session, a daemon process waits for notifications and attaches the new table to the partitioned table.
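Concretely, with the sample table defined below, a notification with the payload '2025-06-01' would make the daemon run a statement like this (the quoted identifier is needed because the generated table name contains dashes):

ALTER TABLE tab ATTACH PARTITION "tab_2025-06-01"
   FOR VALUES IN ('2025-06-01');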
A sample implementation of automatic partition creation on demand
Here is the definition of the partitioned table:
CREATE TABLE tab (
   id bigint GENERATED ALWAYS AS IDENTITY,
   ts timestamp NOT NULL,
   data text
) PARTITION BY LIST ((ts::date));

CREATE TABLE tab_def PARTITION OF tab DEFAULT;
I chose list partitioning to keep the listener code simple, although that is not necessary.
The trigger
The trigger appears as follows:
CREATE FUNCTION part_trig() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   BEGIN
      /* try to create a table for the new partition */
      EXECUTE
         format(
            'CREATE TABLE %I (LIKE tab INCLUDING INDEXES)',
            'tab_' || to_char(NEW.ts, 'YYYY-MM-DD')
         );

      /*
       * tell listener to attach the partition
       * (only if a new table was created)
       */
      EXECUTE
         format(
            'NOTIFY tab, %L',
            to_char(NEW.ts, 'YYYY-MM-DD')
         );
   EXCEPTION
      WHEN duplicate_table THEN
         NULL;  -- ignore
   END;

   /* insert into the new partition */
   EXECUTE
      format(
         'INSERT INTO %I VALUES ($1.*)',
         'tab_' || to_char(NEW.ts, 'YYYY-MM-DD')
      )
      USING NEW;

   /* skip insert into the partitioned table */
   RETURN NULL;
END;$$;

CREATE TRIGGER part_trig
   BEFORE INSERT ON tab
   FOR EACH ROW
   WHEN (pg_trigger_depth() < 1)
   EXECUTE FUNCTION part_trig();
To understand why the trigger function returns NULL, read my blog post on the topic. The WHEN clause in the trigger definition avoids infinite recursion: without it, inserting a row into a partition would call the trigger function again.
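As a quick illustration of the flow (timestamps chosen arbitrarily): the first row for a given day creates the prospective partition, inserts the row there, and notifies the listener; subsequent rows for that day are simply redirected:

-- creates table "tab_2025-06-01", inserts the row into it
-- and sends: NOTIFY tab, '2025-06-01'
INSERT INTO tab (ts, data) VALUES ('2025-06-01 10:30:00', 'some data');

-- goes straight into the prospective partition, no NOTIFY
INSERT INTO tab (ts, data) VALUES ('2025-06-01 11:00:00', 'more data');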
The listener
For the listener, we need a PostgreSQL client API that supports asynchronous notification, so I went with C:
#include <libpq-fe.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/select.h>
#include <errno.h>

static int attach_partition(PGconn *conn, char *table, char *datestr)
{
    PGresult *res;
    char *esc_tab, *esc_part, *esc_date;
    /* identifiers are at most 63 bytes long */
    char stmt[400], part[64];

    /* escape table and partition name */
    esc_tab = PQescapeIdentifier(conn, table, 63);
    snprintf(part, sizeof(part), "%s_%s", table, datestr);
    esc_part = PQescapeIdentifier(conn, part, 63);
    esc_date = PQescapeLiteral(conn, datestr, 10);

    /* construct statement */
    sprintf(stmt,
            "ALTER TABLE %s ATTACH PARTITION %s FOR VALUES IN (%s)",
            esc_tab, esc_part, esc_date);
    free(esc_tab);
    free(esc_part);
    free(esc_date);

    /* execute statement */
    if ((res = PQexec(conn, stmt)) == NULL)
    {
        fprintf(stderr, "Out of memory sending statement\n");
        return 0;
    }
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "Error attaching partition: %s\n",
                PQresultErrorMessage(res));
        return 0;
    }

    PQclear(res);
    return 1;
}

int main(int argc, char **argv)
{
    PGconn *conn;
    int sock;
    PGresult *res;

    /* connect to PostgreSQL with default parameters */
    if ((conn = PQconnectdb("application_name=listener")) == NULL)
    {
        fprintf(stderr, "Out of memory connecting to PostgreSQL\n");
        return 1;
    }
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "Error connecting to PostgreSQL: %s\n",
                PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* get network socket for the connection */
    if ((sock = PQsocket(conn)) < 0)
    {
        fprintf(stderr, "Error getting connection network socket\n");
        PQfinish(conn);
        return 1;
    }

    /* listen on a channel */
    if ((res = PQexec(conn, "LISTEN tab")) == NULL)
    {
        fprintf(stderr, "Out of memory sending LISTEN\n");
        PQfinish(conn);
        return 1;
    }
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "Error listening on channel: %s\n",
                PQresultErrorMessage(res));
        PQfinish(conn);
        return 1;
    }
    PQclear(res);

    while (1)
    {
        fd_set sockets;
        PGnotify *note;

        /* block waiting for activity on the network socket */
        FD_ZERO(&sockets);
        FD_SET(sock, &sockets);
        errno = 0;
        if (select(sock + 1, &sockets, NULL, NULL, NULL) < 0)
        {
            perror("Error waiting for notification");
            PQfinish(conn);
            return 1;
        }

        /* consume the input */
        if (PQconsumeInput(conn) == 0)
        {
            fprintf(stderr, "Error receiving data: %s\n",
                    PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* process all pending notifications */
        while ((note = PQnotifies(conn)) != NULL)
        {
            if (!attach_partition(conn, note->relname, note->extra))
            {
                PQfinish(conn);
                return 1;
            }

            PQfreemem(note);
        }
    }
}
Discussion of performance and robustness
The sample code above should be free from race conditions. However, performance will not be good. Let's compare the speed of 100,000 inserts with and without the trigger:
test=# TRUNCATE tab;
TRUNCATE TABLE
test=# \timing
Timing is on.
test=# INSERT INTO tab (ts, data)
       SELECT clock_timestamp(), 'something'
       FROM generate_series(1, 100000);
INSERT 0 0
Time: 3354.275 ms (00:03.354)
test=# ALTER TABLE tab DISABLE TRIGGER part_trig;
ALTER TABLE
test=# TRUNCATE tab;
TRUNCATE TABLE
Time: 20.005 ms
test=# INSERT INTO tab (ts, data)
       SELECT clock_timestamp(), 'something'
       FROM generate_series(1, 100000);
INSERT 0 100000
Time: 120.869 ms
Although the code executes correctly, the trigger adds a significant overhead of about 32 microseconds per row: the whole INSERT takes about 28 times as long as it does without the trigger. That factor is only this high because the table has no indexes. We could reduce the overhead by writing the trigger function in C as well, but that would add even more complexity.
Conclusion
Working around the lack of time-triggered partition creation is not hard, but creating partitions on demand when rows are inserted is harder. We found a solution using triggers, LISTEN and NOTIFY, but it was complicated and its performance impact was severe. It would be good to get better support for this from PostgreSQL core!
About Enteros
Enteros offers a patented database performance management SaaS platform. It proactively identifies root causes of complex business-impacting database scalability and performance issues across a growing number of clouds, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.