Preamble
Sequences are a core feature of SQL. Some users, however, are tempted to use sequences to generate invoice numbers. That is risky and should be avoided. The central question is: why? What is wrong with using database-side sequences to assign unique invoice numbers to customers? Let's dive in and find out.
Getting started with CREATE SEQUENCE
Before trying to determine the best way to handle invoice numbers, it makes sense to look at CREATE SEQUENCE itself. The syntax specification is fairly simple and allows for quite a few options:
test=# \h CREATE SEQUENCE
Command:     CREATE SEQUENCE
Description: define a new sequence generator
Syntax:
CREATE [ { TEMPORARY | TEMP } | UNLOGGED ] SEQUENCE [ IF NOT EXISTS ] name
    [ AS data_type ]
    [ INCREMENT [ BY ] increment ]
    [ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ]
    [ START [ WITH ] start ] [ CACHE cache ] [ [ NO ] CYCLE ]
    [ OWNED BY { table_name.column_name | NONE } ]

URL: https://www.postgresql.org/docs/15/sql-createsequence.html
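To see a few of those clauses in action, here is a brief sketch (the sequence name seq_demo is made up for illustration) that starts at 1000, counts upward in steps of 10, has an upper bound, and caches values per session:

```sql
-- Hypothetical sequence using some of the optional clauses:
-- start at 1000, increment by 10, never exceed 1050,
-- and preallocate up to 20 values per session for speed.
CREATE SEQUENCE seq_demo
    AS integer
    INCREMENT BY 10
    MINVALUE 1000
    MAXVALUE 1050
    START WITH 1000
    CACHE 20
    NO CYCLE;

SELECT nextval('seq_demo');  -- 1000
SELECT nextval('seq_demo');  -- 1010
```

With NO CYCLE, calling nextval after 1050 has been handed out raises an error instead of wrapping around to MINVALUE.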
Let's create a simple sequence without any fancy parameters. Then we can use the nextval function to advance the sequence and return its new value:
test=# CREATE SEQUENCE seq_a;
CREATE SEQUENCE
test=# SELECT nextval('seq_a');
 nextval
---------
       1
(1 row)

test=# SELECT nextval('seq_a');
 nextval
---------
       2
(1 row)
Note that the sequence produces an increasing number on every call. The currval function can be used to check which value has most recently been consumed:
test=# SELECT currval('seq_a');
 currval
---------
       2
(1 row)
currval returns the value most recently generated by YOUR session. PostgreSQL guarantees that it is the last value consumed by your session — not the last value handed out by the sequence anywhere in the database. This is crucial, because it means we can never accidentally read somebody else's value.
The behavior just described becomes obvious if we reconnect and call currval again:
test=# SELECT currval('seq_a');
ERROR:  currval of sequence "seq_a" is not yet defined in this session
As you can see, an error is raised because this session has not yet generated any values. This is logical, because knowing the global last value of the sequence would be pointless anyway.
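If you do need to peek at a sequence's global state — for monitoring rather than for generating IDs — you can select from the sequence relation itself. A short sketch:

```sql
-- Inspect the sequence's state directly.
-- is_called = false means last_value has not been handed out yet.
SELECT last_value, is_called FROM seq_a;
```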
Sequences and transactions
So far, a sequence appears to be a reasonable way to handle invoice IDs. Let's examine what a sequence does in the event of a failed transaction, though:
test=# SELECT nextval('seq_a');
 nextval
---------
       3
(1 row)

test=# BEGIN;
BEGIN
test=*# SELECT nextval('seq_a');
 nextval
---------
       4
(1 row)

test=*# ROLLBACK;
ROLLBACK
test=# SELECT nextval('seq_a');
 nextval
---------
       5
(1 row)
Be aware that 5, not 4, comes after the ROLLBACK statement. Most people expect the sequence to roll back as well, but that is not the case. A sequence only guarantees that the numbers it produces are ascending; it can and will leave gaps, which makes it unsuitable for producing a gap-free list of numbers.
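Rollbacks are not the only source of gaps. A sketch, using a made-up sequence name: if a sequence is created with a CACHE clause, each session preallocates a block of values, and any values still cached when a session disconnects (or the server crashes) are simply lost:

```sql
-- seq_cached is a hypothetical name for illustration.
CREATE SEQUENCE seq_cached CACHE 20;

-- Session 1 calls nextval, which preallocates values 1..20:
SELECT nextval('seq_cached');   -- returns 1

-- If session 1 now disconnects, values 2..20 are lost forever.
-- The next session's nextval starts at 21: a gap of 19 numbers.
```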
You can read this article to have a clearer understanding of how sequences and transactions interact.
Invoice IDs done the wrong way
Since sequences are not an option, let's try a variety of other strategies:
test=# CREATE TABLE t_invoice (
    inv_id   int PRIMARY KEY,
    whatever text
);
CREATE TABLE
test=# INSERT INTO t_invoice VALUES (1, 'whatever invoice data');
INSERT 0 1
Our invoice table contains an invoice ID and a few other fields that are not relevant to the problem we are trying to solve here.
Keep in mind that invoice IDs must be strictly unique, ascending numbers without any gaps (by law). That is why many people attempt a straightforward approach:
test=# INSERT INTO t_invoice
       SELECT max(inv_id) + 1, 'more invoice data'
       FROM   t_invoice;
INSERT 0 1
Why is this approach flawed? One word describes the issue: concurrency. What happens if two sessions run the same operation at the same time? Both will read the same maximum, so max(inv_id) + 1 will be identical — and a primary key violation will logically follow.
One solution to this problem is to build a retry loop into the application: if a key violation occurs, it simply runs the query again. This is fine if there aren't too many invoices. But if you need to handle thousands of invoices per second, a retry loop is not a good idea, because too many key collisions will occur.
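As a sketch of what such a retry loop could look like, here is a hypothetical PL/pgSQL helper (the function name and column list are assumptions, matching the t_invoice table above) that catches the unique_violation error and tries again:

```sql
-- Hypothetical helper: insert an invoice, retrying on key collisions.
CREATE FUNCTION insert_invoice(p_data text) RETURNS int AS
$$
DECLARE
    v_id int;
BEGIN
    LOOP
        BEGIN
            SELECT COALESCE(max(inv_id), 0) + 1 INTO v_id FROM t_invoice;
            INSERT INTO t_invoice VALUES (v_id, p_data);
            RETURN v_id;            -- success: hand back the new ID
        EXCEPTION WHEN unique_violation THEN
            NULL;                   -- somebody else won the race; retry
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

Under heavy load this loop spins frequently, which is exactly why the approach does not scale.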
Using brute force locking
How can we resolve such conflicts? Locking the table is one option. The catch is that we need a lock that blocks both reads and writes: we must make sure that only one transaction at a time can compute the previous maximum value. Let's see if we can make this work:
test=# BEGIN;
BEGIN
test=*# LOCK TABLE t_invoice IN ACCESS EXCLUSIVE MODE;
LOCK TABLE
test=*# INSERT INTO t_invoice
        SELECT max(inv_id) + 1, 'high value client'
        FROM   t_invoice;
INSERT 0 1
test=*# COMMIT;
COMMIT
Technically, this is correct, but locking the entire table hurts, because it affects every other transaction. Imagine a reporting job that wants to summarize yesterday's sales: it has to read the table, but it cannot, because invoice creation keeps brutally locking it. This is clearly not an option.
Using SERIALIZABLE transactions
The alternative to a table lock is to use a SERIALIZABLE transaction:
test=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN
test=*# INSERT INTO t_invoice
        SELECT max(inv_id) + 1, 'fancy support client'
        FROM   t_invoice;
INSERT 0 1
test=*# COMMIT;
COMMIT
The appeal of a serializable transaction is that you do not have to care about locking and concurrency at all — the PostgreSQL core sorts things out for you. Keep in mind that serializable does carry some overhead, but it also solves a few common problems. Generally speaking, it is preferable to accept somewhat worse single-thread performance and beat the brute-force table lock through much higher concurrency on the system. Also keep in mind that it is waiting on locks that really slows execution down.
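One caveat the application must handle: instead of waiting, a serializable transaction may abort with a serialization failure (SQLSTATE 40001), and the client is expected to simply retry it. A sketch of what that looks like:

```sql
-- If two serializable transactions insert concurrently, one of them
-- may fail roughly like this and must be retried by the application:
--   ERROR:  could not serialize access due to read/write dependencies
--           among transactions
--   (SQLSTATE 40001)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO t_invoice
SELECT max(inv_id) + 1, 'retried client'
FROM   t_invoice;
COMMIT;
```

Because the failed transaction rolled back completely, retrying it re-reads max(inv_id) and produces the next gap-free number.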
Using an ID table to minimize locking
If serializable is not what you want, there is yet another solution: add a separate table that holds only the most recently issued invoice ID.
test=# CREATE TABLE t_invoice_id AS SELECT 1 AS id;
SELECT 1
Here is the interesting part: to fetch a new ID, we run an UPDATE statement on this new table. The UPDATE takes a row-level lock, so only one transaction at a time can generate a number — which keeps the numbers unique and ascending — while no lock is placed on the invoice table itself. And because the counter update is transactional, a rollback returns the number, leaving no gap. In effect, we centralize the locking on this single counter row in a cheap and controlled way.
The invoice table can then be filled as follows:
test=# WITH x AS (UPDATE t_invoice_id SET id = id + 1 RETURNING *)
       INSERT INTO t_invoice
       SELECT x.id, 'cool client' FROM x;
INSERT 0 1
A CTE (Common Table Expression) is an elegant way to perform the whole operation in a single statement.
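Why is this safe? A two-session sketch (the client names are made up) shows the row lock at work:

```sql
-- Session 1:
BEGIN;
WITH x AS (UPDATE t_invoice_id SET id = id + 1 RETURNING *)
INSERT INTO t_invoice
SELECT x.id, 'client one' FROM x;
-- Session 2 now runs the same statement: it blocks on the row lock
-- held by session 1 on the single row in t_invoice_id...

-- Session 1:
COMMIT;
-- ...session 2 resumes and receives the next number. Had session 1
-- rolled back instead, the counter update would have been undone and
-- session 2 would get the very same number: no gap either way.
```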
About Enteros
Enteros offers a patented database performance management SaaS platform. It proactively identifies root causes of complex business-impacting database scalability and performance issues across a growing number of clouds, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.