Preamble
After a recent conversation with a DBA who was taking the first steps toward migrating a large database away from Oracle to Postgres, it struck me that there aren’t enough articles on PostgreSQL features and “hacks” for achieving scalability: getting the most out of the hardware to safely accommodate decent multi-terabyte databases. I was very surprised at the level of concern that PostgreSQL is somehow severely limited in matters of scaling, especially given the wide range of options available. Well, maybe that was the case in the past (I started using Postgres in 2011), but as of 2018, things are actually pretty solid. Please read on for some tips on how to manage terabytes with ease.
Standard Postgres facilities
The safest way to scale is to stick with the tried-and-tested out-of-the-box features of Postgres; therefore, I’d first recommend taking a look at the following keywords, with some brief explanations. Maybe that’s all you need.
- Special-purpose, lightweight indexes
For a complex OLTP system supporting hundreds of bizarre queries, it is very common for the indexes to take up even more disk space than the table files holding the data. To improve on that (especially for infrequently used indexes), index sizes can be drastically reduced with partial, BRIN, GIN, or even the somewhat experimental BLOOM indexes. In total there are 7 different index types supported, yet the majority of users only ever use the default B-tree, which is a big mistake in a multi-TB environment.
Partial indexes cover only a subset of the data; for instance, in a sales system we might not be interested in fast access to orders with the status “FINISHED” (some nightly reports typically deal with those, and they can take their time), so why index such rows at all?
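A minimal sketch, with hypothetical table and column names:

```sql
-- Index only the rows the OLTP queries actually touch;
-- finished orders are skipped entirely.
CREATE INDEX orders_unfinished_idx
    ON sales_orders (customer_id)
    WHERE status <> 'FINISHED';
```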
The most well-known non-default index type, GIN, has been around for a very long time (full-text search) and, in short, is ideal for indexing columns with many repeating values, such as statuses of all kinds or the classic Mr./Mrs./Miss. Whereas a default B-tree would store, say, one million leaf entries for the integer “1”, GIN stores every distinct column value only once.
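A sketch, again with hypothetical names; note that for plain scalar columns such as a status field, GIN needs the operator classes from the “btree_gin” contrib module:

```sql
-- btree_gin provides GIN operator classes for plain scalar
-- types such as text.
CREATE EXTENSION IF NOT EXISTS btree_gin;

-- Each distinct status value is stored only once in the index,
-- however many millions of rows carry it.
CREATE INDEX orders_status_gin
    ON sales_orders USING gin (status);
```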
BRIN (block-range index, also known as a min-max index) is a newer and very different index type that is still very effective on ordered values. It has a tiny disk footprint, since it only records the smallest and largest values for each range of rows (a 1 MB section of the table by default). It is ideal, for example, for time series data and other “log” type tables.
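A minimal sketch on a hypothetical append-only log table:

```sql
-- The index stores only min/max timestamps per block range,
-- so it stays tiny even on a multi-TB table.
CREATE INDEX sensor_log_brin
    ON sensor_log USING brin (created_at);
```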
BLOOM may be exotic, but if you can come up with a good use case for it (“bitmap/matrix search”), it can be up to 20 times more efficient than traditional indexing. If this seems too abstract, see here for an example use case.
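A sketch with the “bloom” contrib module, assuming a hypothetical wide table whose queries filter on arbitrary column combinations:

```sql
CREATE EXTENSION IF NOT EXISTS bloom;

-- One small bloom index can serve equality filters on any
-- combination of these columns, instead of five separate B-trees.
-- (The stock module ships operator classes for int4 and text.)
CREATE INDEX features_bloom
    ON features USING bloom (c1, c2, c3, c4, c5);
```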
But why did I rank indexing first on the list, given how unoriginal it is? Because this approach has the big benefit of not requiring any application changes: the DBA can easily apply it behind the scenes as a one-time effort. Perfect.
- Table partitioning
Postgres has had partitioning for at least 15 years, in a “get your hands dirty” sort of way: attaching partitions, adding check constraints, and either inserting rows directly into the correct child tables or routing them there with insert triggers on the parent table all required some low-level management. All of this is past tense now, since declarative partitioning first appeared in Postgres version 10 and only improves with version 11, where overarching indexes that support primary and foreign keys make the functionality feature complete.
Why bother, then? Partitioning has the following advantages: it allows us to cleanly separate “cold” and “hot” data, which gives us some nice options, such as compacting old data maximally with VACUUM FULL or placing it on another medium (see “Tablespaces” below). As a side effect we also get smaller indexes, which take up far less space in “shared_buffers”, leaving more room there for data. The impact is greatest in uniformly-accessed-data scenarios (lookups by name, email, or hash), where every part of a large index must still be traversed, read, and cached even though only a small portion of it is actually in use. As with indexes, when the application’s circumstances are favorable, partitioning can be implemented in the background by DBAs without requiring any code changes.
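For illustration, a minimal declarative-partitioning sketch in Postgres 10+ syntax (table and column names are made up):

```sql
-- The parent table is just a routing shell; inserted rows land
-- in the matching child automatically.
CREATE TABLE measurements (
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE measurements_y2018
    PARTITION OF measurements
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
```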
- Tablespaces
As already mentioned, tablespaces allow tables and indexes to be selectively moved to different disk media. One can accomplish a variety of objectives here: simply saving money by keeping “cold” data on slower/cheaper disk partitions; keeping only the most recent/important data on fast/expensive media; using special compressed file systems for data with a lot of repetition; using network shares; or even using in-memory file systems on remote nodes for massive non-persistent data. Managing tablespaces is actually quite simple; the only problematic part is moving existing tables or indexes during live operation, since that takes a full lock on the relation.
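A short sketch, reusing the hypothetical partition from above (the mount point is also made up):

```sql
-- Park cold data on cheaper disks.
-- NB! SET TABLESPACE holds an exclusive lock while files are copied.
CREATE TABLESPACE archive_space LOCATION '/mnt/slow_disks/pgdata';
ALTER TABLE measurements_y2018 SET TABLESPACE archive_space;
```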
- Maximizing multi-process features
Starting with Postgres 9.6, some common operations on data can be parallelized. The relevant parameter, “max_parallel_workers_per_gather”, is even enabled by default in Postgres 10/11 with a value of 2, so a maximum of two background worker processes are employed. For “Big Data” use cases it might make sense to increase it further, along with some related parameters. And, as with previous major releases, it is reasonable to expect that support for parallelizable operations will keep growing: version 11, for example, adds parallel hash joins, index scans, and UNION-s.
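For example, one might bump the limit for a single heavy analytical session like this (the table name is again hypothetical):

```sql
-- Session-level increase; the Postgres 10/11 default is 2.
SET max_parallel_workers_per_gather = 4;

-- The plan should now show a Gather node with launched workers.
EXPLAIN ANALYZE SELECT count(*) FROM measurements;
```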
- Using replicas to balance query load
We are now moving beyond the “single node”, or “scaling up”, realm, but given today’s very low hardware costs and the availability of helpful Postgres cluster management software (Patroni is our favorite), everyone should be able to implement this. Of course, this type of scaling only helps if you’re mostly just reading data, because currently (and in the coming years) only a single “master” or “primary” node in a cluster can accept writes. Depending on the technical stack you choose, you may also have to deal with some details yourself (connection routing in particular), but Postgres 10 added support for the multi-host use case at the driver level, so much of that work is already done! Additionally, starting with Postgres 9.6, the replicas can be operated in “mirrored” mode, making it irrelevant which node you run your SELECTs on. A friendly reminder, though: this only works well if the read queries are purely OLTP, i.e. extremely fast.
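For illustration, libpq-style multi-host connection URIs (Postgres 10+; hostnames and database name are hypothetical). “target_session_attrs=read-write” makes the driver seek out the primary, while “any” settles for the first reachable node:

```
# Writing sessions must land on the primary.
postgresql://app@node1:5432,node2:5432/mydb?target_session_attrs=read-write

# Read-only sessions can take whichever node answers first.
postgresql://app@node2:5432,node1:5432/mydb?target_session_attrs=any
```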
Approaches with some compromises
We’ve finished the conventional material, but if you’re ready to venture off the beaten path, keep reading.
By tweaking your application a little, and perhaps trying out some extensions with amusing names, you may very well be able to squeeze every last drop of performance out of single-node hardware. What I mean is this:
- Foreign or hybrid tables
What I refer to as “hybrid tables” are actually based on Postgres’ excellent implementation of the SQL/MED standard, better known as Foreign Data Wrappers (FDWs). For read queries they look just like normal Postgres tables, but the data may reside in, or be piped over from, literally anywhere: it may come from Twitter, LDAP, or Amazon S3 (see here for the full list of crazy data sources supported). Probably the most common FDW use is to expose regular (properly formatted) files as tables, for example to expose the server log as a table and facilitate monitoring.
But where is the scaling part, you may wonder? The FDW approach works very well here because it lets you reduce the amount of data with clever file formats or simply with compression, typically shrinking the data size 10x-20x so that it fits on the node! This works great for “cold” data and frees up more disk and cache space for the actual tables holding “hot” data. Since Postgres 10 it is also very simple to set up; there is sample code here.
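A sketch with the stock “file_fdw” contrib module (the file path and column list are hypothetical):

```sql
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;

-- A compressed/archived CSV exposed as a read-only table.
CREATE FOREIGN TABLE orders_archive (
    id         bigint,
    status     text,
    created_at timestamptz
) SERVER file_server
OPTIONS (filename '/mnt/archive/orders_2017.csv', format 'csv');
```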
Another extremely promising use case is the columnar data storage format (ORC); for more information, see the “cstore_fdw” extension project. With tables up to 10 times smaller and queries up to 100% faster, it is especially well suited to helping large Data Warehouses scale.
Given that the Foreign Data Wrapper infrastructure is firmly ingrained in Postgres, why didn’t I put this feature into the “Standard Postgres facilities” section above? The drawback is that its use is somewhat constrained: you typically cannot add indexes or constraints on foreign tables, or change their data via SQL.
- Sharding, also known as foreign table inheritance!
Essentially the same as the previous point, but with table partitioning in play and the child tables living on remote nodes! The data could sit on a nearby Postgres server and be fetched over the network automatically, as required. And the child tables don’t even have to be Postgres tables! Any popular server that performs well for a specific subset of queries could be used, including MySQL, Oracle, MS SQL, and others. How awesome is that? Postgres-to-Postgres setups can be expected to give the best results, though, since only “postgres_fdw” supports all write operations, transactions, and the clever filter push-downs that minimize the amount of data passed over the wire.
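A rough sketch of such a setup in Postgres 11 syntax, reusing the hypothetical “measurements” table from the partitioning sketch above (server address, credentials, and all names are made up, and a matching table must already exist on the remote node):

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Declare the remote shard and how to log into it.
CREATE SERVER shard2 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard2.internal', dbname 'mydb');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard2
    OPTIONS (user 'app', password 'secret');

-- One partition lives remotely; rows routed to it travel over
-- the wire transparently.
CREATE FOREIGN TABLE measurements_y2019
    PARTITION OF measurements
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01')
    SERVER shard2;
```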