Preamble
When an application is in production, its data structures occasionally need to change: columns are added, removed, or modified. Such changes should not be made carelessly, though; there are important considerations to keep in mind.
The main problem with DDLs is that they sometimes hold locks for a very long time, which can have serious repercussions if PostgreSQL is running critical production systems. I hope this blog helps readers manage DDLs more carefully.
Creating some demo data
To show how things work, some typical demo data is needed. The script below simply generates 10 million integer values:
test=# CREATE TABLE data (id int);
CREATE TABLE
test=# INSERT INTO data SELECT * FROM generate_series(1, 10000000);
INSERT 0 10000000
Be aware of locking: Indexing
Index creation is one of the most common sources of trouble with DDLs. When you create an index in PostgreSQL, writes are normally blocked while the index is being built. This can be very harmful when dealing with very large tables: a large index simply takes time to build, and concurrent writes suffer, as illustrated in the following example:
| Connection 1 | Connection 2 |
|---|---|
| BEGIN; | |
| CREATE UNIQUE INDEX idx_id ON data (id); | BEGIN; |
| -- running | INSERT INTO data VALUES (0); |
| -- running | -- waiting |
| COMMIT; | -- will proceed |
| | COMMIT; |
If the table is very large, the INSERT may have to wait an inordinate amount of time. This is especially problematic when there are many concurrent requests: your web server may experience connection problems or simply run out of connections. In other words, a fix is required.
Note that PostgreSQL only blocks writes while the index is being built; reads are still possible. This is crucial to keep in mind.
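If you are unsure whether a DDL is the culprit, you can check which sessions are blocked directly from SQL. The following is a small diagnostic sketch (not part of the demo) using pg_stat_activity and the pg_blocking_pids() function available since PostgreSQL 9.6:

-- list sessions that are currently blocked, together with the PIDs blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       state,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;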
CREATE INDEX CONCURRENTLY is the best option if you cannot afford to lock a table while creating an index. It takes longer than a normal CREATE INDEX, but it allows concurrent writes and avoids locking the table heavily. Note that CREATE INDEX CONCURRENTLY is not guaranteed to succeed; if it fails, you may end up with an invalid index that you have to drop by hand. However, if you are indexing a table of 1 TB or more while constantly under load, which is often the case, there is no way around concurrent indexing.
Here’s an illustration. As you can see, there are no conflicts and nobody has to wait an unreasonable amount of time:
| Connection 1 | Connection 2 |
|---|---|
| CREATE UNIQUE INDEX CONCURRENTLY idx_id2 ON data (id); | |
| -- running | INSERT INTO data VALUES (-1); |
| -- running | INSERT INTO data VALUES (-2); |
| -- done | INSERT INTO data VALUES (-3); |
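As mentioned above, a failed CREATE INDEX CONCURRENTLY can leave an invalid index behind. A minimal cleanup sketch, reusing the index name from the example above:

-- find indexes left invalid by a failed concurrent build
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;

-- drop the invalid index and retry the concurrent build
DROP INDEX CONCURRENTLY idx_id2;
CREATE UNIQUE INDEX CONCURRENTLY idx_id2 ON data (id);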
Adding columns to tables in PostgreSQL
Adding a new column to a table is probably the most frequent schema change in PostgreSQL. The crucial point is that there are actually two distinct cases:
- A column being added without a default value
- A column with a default value being added
Let's see what those two cases have in store for us:
| Connection 1 | Connection 2 |
|---|---|
| BEGIN; | |
| ALTER TABLE data ADD COLUMN x int; | |
| Time: 1.241 ms | INSERT INTO data VALUES (-4); |
| COMMIT; | -- waits a VERY short time |
| BEGIN; | |
| ALTER TABLE data ADD COLUMN y int DEFAULT 0; | |
| -- takes time | INSERT INTO data VALUES (-5); |
| Time: 11888.54 ms | -- has to wait for a while |
| COMMIT; | -- finally proceeds |
As you can see, without a default value the ALTER TABLE … ADD COLUMN command completes in essentially 1 millisecond, which is perfectly acceptable. The concurrent write does not have to wait long; everything is fine as long as the DDL can briefly obtain the table lock it requires. The reason is that adding a column without a default value does not require rewriting the data files in PostgreSQL; only the metadata in the catalog is changed. Be aware that not all other database engines behave this way, especially some pricey commercial ones.
If you require a default value, the story is slightly different: in this case the value has to be written to disk. We generally advise avoiding operations of this kind, because rewriting large tables is not very appealing.
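If a rewrite is a concern on your PostgreSQL version (recent releases can add a column with a constant default without rewriting the table), a common workaround is to split the work: add the column without a default, set the default for future rows only, and backfill existing rows in small batches. A rough sketch, using a hypothetical column z and batch boundaries chosen purely for illustration:

ALTER TABLE data ADD COLUMN z int;              -- instant: catalog change only
ALTER TABLE data ALTER COLUMN z SET DEFAULT 0;  -- affects future rows only

-- backfill existing rows in chunks to keep each transaction (and its locks) short
UPDATE data SET z = 0 WHERE id BETWEEN 1 AND 1000000 AND z IS NULL;
UPDATE data SET z = 0 WHERE id BETWEEN 1000001 AND 2000000 AND z IS NULL;
-- ... and so on for the remaining ranges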
Making use of tablespaces
The use of tablespaces is a further crucial factor to take into account. Tablespaces are generally a nice way to scale I/O, but moving tables between them does not come for free. Remember that ALTER TABLE … SET TABLESPACE locks the table and blocks writes.
As an illustration:
| Connection 1 | Connection 2 |
|---|---|
| BEGIN; | |
| ALTER TABLE data SET TABLESPACE some_storage; | |
| -- usually time consuming | INSERT INTO data VALUES (-6); |
| -- usually time consuming | -- waiting for a long time |
| COMMIT; | -- finally done |
The good news is that there is a feasible workaround. The tool pg_squeeze is designed to shrink a bloated table, but it is capable of much more, including being “abused” to move a table from one tablespace to another. With pg_squeeze you can elegantly avoid heavy locking in addition to reducing your storage footprint. Check it out.
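As a rough sketch of the idea, and assuming the pg_squeeze extension is installed: its squeeze.squeeze_table() function rebuilds a table while holding only short locks. The parameter list below follows the signature of older pg_squeeze releases and may differ in your version, so treat it as an assumption and check the extension's documentation; the tablespace name some_storage is taken from the example above.

CREATE EXTENSION pg_squeeze;

-- rebuild the table and place the result in another tablespace, holding only short locks
-- (assumed parameters: schema, table, clustering index, table tablespace, index tablespaces)
SELECT squeeze.squeeze_table('public', 'data', NULL, 'some_storage', NULL);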
Adding foreign keys
Foreign keys and constraints are another important factor. If you want to add a foreign key or a CHECK constraint, PostgreSQL will check the table's existing content to ensure the key is valid. Once again, those checks lock the table:
test=# ALTER TABLE data ADD CONSTRAINT constname FOREIGN KEY (id) REFERENCES data(id);
It is not possible to add such keys ex post without locking the table. It is therefore crucial to plan ahead and think about what you are doing.
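One common way to keep this lock short is to add the constraint as NOT VALID, which skips the full table scan, and validate it in a second step. VALIDATE CONSTRAINT takes a much weaker lock and allows concurrent writes while it checks the existing rows. A sketch using the same constraint as above:

ALTER TABLE data
    ADD CONSTRAINT constname FOREIGN KEY (id) REFERENCES data (id) NOT VALID;

ALTER TABLE data VALIDATE CONSTRAINT constname;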
Dropping columns in PostgreSQL
Let us move on to something more relaxing: dropping a column. DROP COLUMN in PostgreSQL does not rewrite the table; it merely makes the column invisible. VACUUM handles the cleanup asynchronously, so DROP COLUMN is extremely fast. Keep in mind, however, that DROP COLUMN can only start once there are no concurrent reads on the table:
| Connection 1 | Connection 2 |
|---|---|
| BEGIN; | |
| SELECT count(*) FROM data; | |
| -- takes a while | ALTER TABLE data DROP COLUMN y; |
| -- takes a while | -- has to wait |
| -- takes a while | -- has to wait |
| COMMIT; | -- finally proceeds |
Otherwise, DROP COLUMN returns immediately.
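Since any of the DDLs shown in this blog can end up queuing behind long-running transactions, and thereby block every statement that arrives after them, it can be worth protecting them with a lock timeout. A small sketch; the timeout value is arbitrary:

-- give up quickly if the lock cannot be obtained instead of queuing indefinitely
SET lock_timeout = '2s';
ALTER TABLE data DROP COLUMN y;
-- if this fails with a lock timeout, simply retry later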
There is always more…
There is always more to say. However, the examples provided in this blog cover some of the most common problems people encounter. Please feel free to add your thoughts and provide feedback.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
Are you interested in writing for Enteros’ Blog? Please send us a pitch!