Preamble
We came to the conclusion that we shouldn’t worry too much about trigger overhead and should just create those triggers for the most common “audit fields” type of use case. But under what circumstances should one begin to worry a little?
I then created some more complex trigger use cases and again measured transaction latencies on them over a long period of time to gain additional insights. Please continue reading to learn more about the tests that were performed, or you can skip to the article’s final results table.
Default pgbench vs audit triggers for all updated tables
The triggers will just set the last modification timestamp to the current time and the username to the current user, if not already specified in the incoming row.
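A minimal sketch of such a trigger in PL/pgSQL could look like the one below; the column names last_modified_on and last_modified_by, as well as the function and trigger names, are assumptions for illustration (only last_modified_on shows up later in the “C” excerpt). The same function would be attached to each of the updated tables, shown here for pgbench_accounts only:

CREATE OR REPLACE FUNCTION trg_audit_fields() RETURNS trigger AS $$
BEGIN
    -- fill the audit columns only when the incoming row leaves them blank
    IF NEW.last_modified_on IS NULL THEN
        NEW.last_modified_on := now();
    END IF;
    IF NEW.last_modified_by IS NULL THEN
        NEW.last_modified_by := current_user;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER pgbench_accounts_audit
    BEFORE INSERT OR UPDATE ON pgbench_accounts
    FOR EACH ROW EXECUTE PROCEDURE trg_audit_fields();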
Single row update use case
In practice, multi-statement transactions involve a lot of network communication. To get rid of that issue, the next test consisted of a single update to the pgbench_accounts table (again with two audit columns added to the schema). The same update was then run with a PL/pgSQL auditing trigger enabled, which sets the username and modification timestamp if they are left blank.
Results: 4% penalty for the trigger version, 0.390ms vs. 0.405ms. Already visible, but easily dismissed, in my opinion.
/* script file used for pgbench */
\set aid random(1, 100000 * :scale)
\set delta random(-5000, 5000)
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
Single row update with a trigger written in C
But what if we actually touch a dozen tables per transaction (adding up to roughly a 60% hit) and the above 4% per-table degradation is no longer acceptable? Can we shave off a few microseconds?
You could try writing the triggers in C, the language Postgres itself is written in! It can help optimize triggers just as it helps with regular functions. Does “C” sound intimidating? Sure, it won’t all be fun and games, but the Postgres source code contains a good number of examples to get you started.
After some experimenting (I’m more of a Python / Go guy), I arrived at the following figures: 0.401 ms for the “C” version vs. 0.405 ms for the PL/pgSQL trigger, only about 1% faster! In short, for such basic trigger functionality it is not time well spent. You might be wondering why there isn’t a bigger speedup over an interpreted PL language. Yes, PL/pgSQL is somewhat interpreted, but it has the useful property that prepared statements and their execution plans remain cached within a single session, so I’m confident that we would see very different numbers if we ran pgbench in “re-connect” mode.
...
// audit field #1 - last_modified_on
attnum = SPI_fnumber(tupdesc, "last_modified_on");

if (attnum <= 0)
    ereport(ERROR,
            (errcode(ERRCODE_TRIGGERED_ACTION_EXCEPTION),
             errmsg("relation \"%d\" has no attribute \"%s\"", rel->rd_id, "last_modified_on")));

valbuf = (char*)SPI_getvalue(rettuple, tupdesc, attnum);
if (valbuf == NULL)
{
    newval = GetCurrentTimestamp();
    rettuple = heap_modify_tuple_by_cols(rettuple, tupdesc,
                                         1, &attnum, &newval, &newnull);
}
...
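Wiring such a C function into the database is then plain SQL. A sketch, assuming the compiled shared library is called audit_triggers and exports a symbol named trg_audit_fields_c (both names are my assumptions, not taken from the test setup):

CREATE FUNCTION trg_audit_fields_c() RETURNS trigger
    AS 'audit_triggers', 'trg_audit_fields_c' LANGUAGE C;

CREATE TRIGGER pgbench_accounts_audit_c
    BEFORE INSERT OR UPDATE ON pgbench_accounts
    FOR EACH ROW EXECUTE PROCEDURE trg_audit_fields_c();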
Single row update with a trigger doing “logging insert”
At this point, things start to become less comparable because we are introducing new data that wasn’t written at all in the “un-triggered” version. From the trigger, I was essentially performing the same action as the insert phase (into pgbench_history) of the default pgbench transaction. Despite the slowdown, it is most likely still faster than issuing that insert from the user transaction, because we save a couple of network round trips plus the parsing: in our default pgbench case, statements are always re-parsed from text, whereas PL/pgSQL code is parsed and planned only once per session (think “prepared statements”). Note that pgbench also supports prepared statements on the client side via the “protocol” parameter set to “prepared”, which is mostly used to test maximum IO throughput.
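A sketch of what such a trigger could look like, under the assumption that it writes roughly the same columns pgbench itself inserts into pgbench_history; the teller id is not visible inside an accounts trigger, so it stays NULL here, and the function and trigger names are my own:

CREATE OR REPLACE FUNCTION trg_log_accounts() RETURNS trigger AS $$
BEGIN
    -- mirror the pgbench_history insert done by the default pgbench transaction
    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
    VALUES (NULL, NEW.bid, NEW.aid, NEW.abalance - OLD.abalance, now());
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER pgbench_accounts_log
    AFTER UPDATE ON pgbench_accounts
    FOR EACH ROW EXECUTE PROCEDURE trg_log_accounts();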
Results: 0.436 ms vs. 0.390 ms, a 12% difference. Not bad, given that we doubled the amount of data written!
Default pgbench vs 3 “logging insert” triggers
All updated tables now get a logging entry, including pgbench_accounts, whose update is already logged by an insert into pgbench_history as part of the regular transaction, so the amount of data written effectively doubles. Results: 1.173 ms vs. 1.285 ms, roughly a 10% penalty. Again a very tolerable cost for nearly doubling the dataset! This once again demonstrates that transaction mechanics, communication latency, and the expensive but necessary fsync at commit time have more of an impact than the extra data itself (provided, of course, that there aren’t many indexes on the data).
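The triggers for the other two tables look analogous. A sketch for pgbench_branches (pgbench_tellers would follow the same pattern), again with only the columns known inside the trigger filled in and with names of my own choosing:

CREATE OR REPLACE FUNCTION trg_log_branches() RETURNS trigger AS $$
BEGIN
    INSERT INTO pgbench_history (bid, delta, mtime)
    VALUES (NEW.bid, NEW.bbalance - OLD.bbalance, now());
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER pgbench_branches_log
    AFTER UPDATE ON pgbench_branches
    FOR EACH ROW EXECUTE PROCEDURE trg_log_branches();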
Summary table
Use Case | Latencies (ms) | Penalty per TX (%)
pgbench default vs. audit triggers on all 3 updated tables | 1.173 vs 1.178 | 0.4%
Single pgbench_accounts update vs. 1 audit trigger (PL/pgSQL) | 0.390 vs 0.405 | 3.9%
Single pgbench_accounts update vs. 1 audit trigger written in “C” | 0.390 vs 0.401 | 2.8%
Single pgbench_accounts update vs. 1 “insert logging” trigger | 0.390 vs 0.436 | 11.8%
pgbench default vs. 3 “insert logging” triggers on the updated tables | 1.173 vs 1.285 | 9.6%
Bonus track – trigger trivia!
- Did you know that DDL triggers (called event triggers in Postgres) can be created to capture, reject, and log structural changes for all types of database objects? The most common use case might be catching full table rewrites during business hours (see the sketch after this list).
- There are also “statement level” triggers that are executed only once per SQL statement; they are actually the default if you don’t specify the level. In Postgres 10 they were expanded with the “transition tables” feature, allowing you to inspect all rows changed by the statement to potentially do some summary aggregations or validation.
- When there are many triggers on a table, they are executed in alphabetical order of their names, and in the case of BEFORE and INSTEAD OF triggers, the possibly modified row returned by each trigger becomes the input for the next one.
- Row-level BEFORE triggers are much cheaper than AFTER triggers when updating a large number of rows, because they fire immediately rather than at the end of the statement, at which point Postgres needs to temporarily store the row state information. The situation can usually be improved with some sane WHEN conditions in the trigger declarations, though.
- But of course I’d advise you to keep triggers as simple as possible. For instance, it is possible to create an insert trigger that inserts again into the same table that fired it. Will there then be an infinite loop, eating up all your disk space? Yes, there would be… if max_stack_depth didn’t kick in.
- You aren’t limited to the most well-known trigger languages, PL/pgSQL and the aforementioned “C”, when writing triggers; at the very least PL/Python and PL/Perl also support triggers, and there may be others.
- Triggers on partitioned tables will be supported in Postgres 11, making it possible to declare them only once rather than separately for each sub-partition.
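To illustrate the DDL trigger item above: a minimal sketch of an event trigger that rejects full table rewrites during business hours (the function name, trigger name, and the 9-to-17 window are arbitrary examples, not something from the tests):

CREATE OR REPLACE FUNCTION no_rewrites_in_business_hours() RETURNS event_trigger AS $$
BEGIN
    -- this function is attached to the table_rewrite event only
    IF extract(hour FROM now()) BETWEEN 9 AND 17 THEN
        RAISE EXCEPTION 'rewrite of table % is not allowed during business hours',
            pg_event_trigger_table_rewrite_oid()::regclass;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER no_business_hours_rewrites
    ON table_rewrite
    EXECUTE PROCEDURE no_rewrites_in_business_hours();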
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.