Preamble
Customers occasionally ask me which option is best for automatically generated primary keys. I’ll discuss the options and offer suggestions in this article.
Every table needs a primary key. In a relational database, it is crucial to be able to identify an individual table row. If you wonder why, search the internet for one of the countless questions asking for help with deleting duplicate entries from a table.
You would be wise to choose a primary key that is both unique and immutable for the lifetime of a table row. That is because primary keys are frequently referenced by foreign key constraints, and changing a primary key that is used elsewhere is difficult and time-consuming.
Occasionally a table has a natural primary key, such as a citizen’s social security number. In most cases, however, no such attribute exists, and you have to generate an artificial primary key. Some people even argue that you should use an artificial key when a natural one exists, but I won’t get into that “holy war.”
Techniques for auto-generated primary keys in PostgreSQL
There are two fundamental methods:
Key generation using a sequence
A sequence is a database object whose sole function is to generate unique numbers. It does this by increasing a built-in counter.
Sequences are designed to work well with multiple concurrent users, and they will never give out the same number twice. Still, fetching values from a sequence in many concurrent SQL statements could become a bottleneck, so there is the CACHE option, which makes the sequence hand out several values at once to database sessions.
Sequences don’t follow the usual transactional rules: when a transaction is rolled back, the sequence counter is not reset. That is not a bug, because it is necessary for good performance, but it means that sequences can leave gaps. If you need a gap-free series of numbers, a sequence is not the right tool; you will have to resort to less efficient and more complicated techniques.
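A quick way to see this behavior for yourself (the sequence name demo_seq is made up for the example):

```sql
CREATE SEQUENCE demo_seq;

BEGIN;
SELECT nextval('demo_seq');   -- returns 1
ROLLBACK;

-- the rollback did not reset the counter, so 1 is gone for good
SELECT nextval('demo_seq');   -- returns 2
```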
To fetch the next value from a sequence, you use the nextval function like this:

```sql
SELECT nextval('sequence_name');
```
For more information on how to manipulate sequences, consult the documentation.
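As a small sketch of such manipulation, continuing with the hypothetical demo_seq from above:

```sql
-- show the last value this session obtained (requires a prior nextval)
SELECT currval('demo_seq');

-- reposition the counter so that the next nextval returns 101
SELECT setval('demo_seq', 100);

-- change sequence options after the fact
ALTER SEQUENCE demo_seq CACHE 20;
```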
Generating UUIDs
UUID stands for “universally unique identifier”: a 128-bit number generated by an algorithm that guarantees uniqueness. Several such algorithms are in wide use. PostgreSQL has several functions that generate UUIDs:
- The uuid-ossp extension offers functions to generate UUIDs. Note that because of the hyphen in the name, you have to quote the name of the extension (CREATE EXTENSION "uuid-ossp";).
- From PostgreSQL v13 on, you can use the core function gen_random_uuid() to generate version-4 (random) UUIDs (see the sketch below).
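A minimal sketch of both options; uuid_generate_v4() is one of the functions provided by uuid-ossp:

```sql
-- with the extension (also works on versions before v13)
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
SELECT uuid_generate_v4();

-- with the core function (PostgreSQL v13 and later)
SELECT gen_random_uuid();
```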
Note that you should always use the PostgreSQL data type uuid for UUIDs. Don’t try to convert them to strings or numeric: you will waste space and lose performance.
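If you want to verify the space difference yourself, pg_column_size shows it directly (the UUID literal is arbitrary):

```sql
SELECT pg_column_size('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'::uuid) AS uuid_bytes,
       pg_column_size('a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'::text) AS text_bytes;
-- 16 bytes as uuid versus 37 bytes as a string
```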
Defining auto-generated primary keys
A column with automatically generated values can be defined in one of four ways.
Using the DEFAULT clause

You can use this method with sequences and UUIDs. Here are some examples:
```sql
CREATE TABLE has_integer_pkey (
   id bigint DEFAULT nextval('integer_id_seq') PRIMARY KEY,
   ...
);

CREATE TABLE has_uuid_pkey (
   id uuid DEFAULT gen_random_uuid() PRIMARY KEY,
   ...
);
```
PostgreSQL uses the DEFAULT value whenever the INSERT statement doesn’t explicitly specify that column.
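To illustrate, here is a hypothetical complete table (the name items and its columns are made up) that shows the DEFAULT kicking in:

```sql
CREATE TABLE items (
   id   uuid DEFAULT gen_random_uuid() PRIMARY KEY,
   name text NOT NULL
);

-- "id" is not mentioned, so PostgreSQL fills it from the DEFAULT expression
INSERT INTO items (name) VALUES ('widget') RETURNING id;
```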
Using the serial and bigserial pseudo-types

This method is a shortcut for defining a sequence and setting a DEFAULT clause as above. With this technique, a table is defined as follows:
```sql
CREATE TABLE uses_serial (
   id bigserial PRIMARY KEY,
   ...
);
```
That amounts to the following:
```sql
CREATE TABLE uses_serial (
   id bigint PRIMARY KEY,
   ...
);

CREATE SEQUENCE uses_serial_id_seq OWNED BY uses_serial.id;

ALTER TABLE uses_serial
   ALTER id SET DEFAULT nextval('uses_serial_id_seq');
```
The “OWNED BY” clause adds a dependency between the column and the sequence, so that dropping the column automatically drops the sequence.
Using serial will create an integer column, while bigserial will create a bigint column.
Using identity columns
This is another way to use a sequence, because PostgreSQL creates identity columns with a sequence “behind the scenes.”
```sql
CREATE TABLE uses_identity (
   id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
   ...
);
```
There is also GENERATED BY DEFAULT AS IDENTITY, which is the same, except that you won’t get an error message if you try to explicitly insert a value for the column (much like with a DEFAULT clause). See below for more!
You can specify sequence options for identity columns:
```sql
CREATE TABLE uses_identity (
   id bigint GENERATED ALWAYS AS IDENTITY
      (MINVALUE 0 START WITH 0 CACHE 20) PRIMARY KEY,
   ...
);
```
Using BEFORE INSERT triggers

This is similar to DEFAULT values, but it allows you to unconditionally override a value inserted by the user with a generated value. The big disadvantage of a trigger is the performance impact.
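A minimal sketch of such a trigger, with made-up names (uses_trigger, trigger_id_seq, set_pkey):

```sql
CREATE SEQUENCE trigger_id_seq;

CREATE TABLE uses_trigger (
   id      bigint PRIMARY KEY,
   payload text
);

CREATE FUNCTION set_pkey() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   -- overwrite whatever value the user supplied for "id"
   NEW.id := nextval('trigger_id_seq');
   RETURN NEW;
END;$$;

-- on PostgreSQL v10 and older, use EXECUTE PROCEDURE instead
CREATE TRIGGER gen_pkey
   BEFORE INSERT ON uses_trigger
   FOR EACH ROW EXECUTE FUNCTION set_pkey();
```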
Should I use integer (serial) or bigint (bigserial) for my auto-generated primary key?
You should always use bigint.

True, an integer occupies four bytes, while a bigint needs eight. But:
- If you have a small table, where integer would suffice, the four wasted bytes won’t matter much. Also, not every table that you designed to be small will remain small!
- If you have a big table, you might exceed the maximum for integer, which is 2147483647. Note that this could happen even if your table contains fewer rows than that: you might delete rows, and some sequence values can get “lost” by transactions that are rolled back.
Now it is quite complicated to change the primary key column from integer to bigint in a big table inside an active database without causing excessive downtime, so you should save yourself that pain.

With bigint, you are certain to never exceed the maximum of 9223372036854775807: even if you insert 10000 rows per second without any pause, you have almost 30 million years before you reach the limit.
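You can check that arithmetic directly in SQL:

```sql
SELECT 9223372036854775807 / 10000.0   -- seconds until overflow at 10000 rows/s
       / 86400 / 365.25                -- per day, per year
       AS years_until_overflow;
-- roughly 29.2 million years
```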
Should I use bigserial or an identity column for my auto-generated primary key?
You should use an identity column, unless you have to support old PostgreSQL versions.
Identity columns were introduced in PostgreSQL v10, and they have two advantages over bigserial:
- They comply with the SQL standard, while bigserial is proprietary PostgreSQL syntax. This will make your code more portable.
- If you use GENERATED ALWAYS AS IDENTITY, you will get an error message if you try to override the generated value by explicitly inserting a number. This avoids the common problem that manually entered values will conflict with generated values later on, causing surprising application errors (see the sketch below).
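As a sketch, using the uses_identity table from above (the exact error wording may vary between versions):

```sql
INSERT INTO uses_identity (id) VALUES (42);
-- ERROR:  cannot insert a non-DEFAULT value into column "id"
-- HINT:  Use OVERRIDING SYSTEM VALUE to override.

-- if you really mean it, an explicit override is still possible:
INSERT INTO uses_identity (id) OVERRIDING SYSTEM VALUE VALUES (42);
```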
So unless you have to support PostgreSQL v9.6 or below, there is no reason to use bigserial.
Should I use bigint or uuid for an auto-generated primary key?
Use a sequence unless you use database sharding or have another reason to make primary keys in a “decentralized” way (outside of a single database).
Real differences
The advantages of bigint are clear:

- bigint uses only eight bytes, while uuid uses 16.
- It is less expensive to fetch a value from a sequence than to calculate a UUID.
One drawback of a sequence is that it is a single object in a single database. So you can’t use a sequence if you use sharding, that is, if you distribute your data across several databases. UUIDs are an obvious solution in this scenario. (You could use sequences defined with an INCREMENT greater than 1 and different START values, as in the sketch below, but that might lead to problems when you add additional shards.)
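A minimal sketch of that per-shard workaround for two shards (shard_seq is a made-up name):

```sql
-- on shard 0: yields 1, 3, 5, ...
CREATE SEQUENCE shard_seq INCREMENT 2 START 1;

-- on shard 1: yields 2, 4, 6, ...
CREATE SEQUENCE shard_seq INCREMENT 2 START 2;

-- adding a third shard later would force you to renumber everything,
-- which is exactly the problem mentioned above
```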
Of course, you will also want to use UUIDs if the primary key is generated by an application that is distributed across multiple application servers, rather than by the database itself.
Imaginary differences
Some argue that UUIDs are superior because they distribute writes across multiple pages of the primary key index. This is supposed to reduce contention and result in a more balanced, or less fragmented, index. The first claim is true, but it can actually work against you: it requires the whole index to be cached to perform well. The second claim is definitely wrong, since B-tree indexes are always balanced. Moreover, a change in PostgreSQL v11 made sure that monotonically increasing values fill an index more efficiently than random inserts ever could (although subsequent deletes will cause fragmentation). In short, any such advantages are either marginal or nonexistent, and they are more than outweighed by the fact that uuid uses twice as much storage, which makes the index bigger, causing more writes and occupying more of your cache.
Aspects of security
It has been said that sequence-generated primary keys can leak information, because they allow you to deduce the approximate order in which rows were added to a table (see, for example, the comments below). That is true, even though I have a hard time imagining a situation in which this would really be a security risk. If it worries you, use UUIDs and put your worries to rest.
Benchmarking bigint versus uuid
My co-worker Kaarel ran a small performance test a while ago and found that uuid was slower than bigint when it came to bigger joins.
I decided to run a small insert-only benchmark with these two tables:
```sql
CREATE UNLOGGED TABLE test_bigint (
   id bigint GENERATED ALWAYS AS IDENTITY (CACHE 200) PRIMARY KEY
);

CREATE UNLOGGED TABLE test_uuid (
   id uuid DEFAULT gen_random_uuid() PRIMARY KEY
);
```
I performed the benchmark on my laptop (SSD, 8 cores) with a pgbench custom script that had 6 concurrent clients repeatedly run transactions of 1000 prepared INSERT statements for five minutes:
```sql
INSERT INTO test_bigint /* or test_uuid */ DEFAULT VALUES;
```
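For reference, a plausible shape for such a custom script (the file name bench.sql is an assumption; the pgbench call could then be something like pgbench -n -M prepared -c 6 -T 300 -f bench.sql):

```sql
-- bench.sql: one transaction of 1000 inserts (abbreviated here)
BEGIN;
INSERT INTO test_bigint DEFAULT VALUES;
-- ... the INSERT repeated for a total of 1000 times ...
COMMIT;
```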
Performance comparison of bigint versus uuid auto-generated primary keys:

|                      | bigint     | uuid       |
|----------------------|------------|------------|
| inserts per second   | 107090     | 74947      |
| index growth per row | 30.5 bytes | 41.7 bytes |
Using bigint clearly wins, but the difference is not spectacular.
Conclusion
As auto-generated primary keys, you can use both UUIDs and numbers generated from a sequence. Use identity columns unless you have to generate primary keys outside a single database, and make sure all your primary key columns are of type bigint.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.