Preamble
During a recent PostgreSQL consulting engagement, the subject of connection pooling came up: which strategies and products are widely used and actually work well? The topic itself is broad and mostly familiar to long-time Postgres users, but it is still worthwhile to briefly go over some fundamental ideas and compare the two “near Postgres” products you should be aware of: PgBouncer and pgpool-II.
To recap: connection pools are middleware that speak the database protocol and cache database connections. This saves clients the cost of opening a new connection every time (negotiating the connection, performing authentication, setting client defaults such as encoding or work_mem) and relieves the database server from keeping too much client state in memory. Applications simply connect to the pool as if it were the database itself.
Common approaches to setting up pooling:
- built-in / in-process – using native language libraries, such as Java’s HikariCP or Python’s psycopg2 pool
- application co-location – the pooling server or product sits on the same node as the application; together with the in-process approach, this makes it harder to limit the total number of connections to the database
- DB co-location – pool and database share a machine; normally this undermines high availability, since clients would notice the database disappearing
- independent – the pool runs on a separate machine; the most flexible option, allowing for example transparent switching of the underlying database
- mixed approaches – for example, combining the above with HAProxy to increase availability (a minimal sketch follows this list)
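As an illustration of the mixed approach, here is a minimal sketch that fronts two pool instances with HAProxy in plain TCP mode. The host names, addresses, port and file path are assumptions for the sake of the example, not a tested setup:

cat <<'EOF' | sudo tee -a /etc/haproxy/haproxy.cfg
# forward plain TCP to two pooling nodes, checking that they accept connections
listen postgres_pool
    bind *:5433
    mode tcp
    option tcp-check
    server pool1 10.0.0.11:6432 check
    server pool2 10.0.0.12:6432 check
EOF
sudo systemctl reload haproxy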
The usual suspects – PgBouncer and pgpool-II
Two products stand out when discussing separate pooling servers in the Postgres context: PgBouncer and pgpool-II. Both are written in C and appear to be actively maintained, with pgpool-II seeing more activity thanks to its more ambitious feature set. Besides the source code, packages for the popular Linux distributions are available, and deployment is straightforward: a configuration file and a startup command. You then point your clients at the pool instead of the real database; both products use non-standard ports by default (6432 for PgBouncer, 9999 for pgpool-II). Based on the available online documentation (PgBouncer in its current version 1.7.2 and pgpool-II in version 3.5.4, the latter sadly with some outdated parts), I compiled the following feature outline so you can decide for yourself whether either fits your needs.
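As a minimal illustration (assuming everything runs locally on the default ports and a database named bench exists), the client only needs to change the port it connects to:

# direct connection to Postgres
psql -h localhost -p 5432 bench
# the same database through PgBouncer
psql -h localhost -p 6432 bench
# the same database through pgpool-II
psql -h localhost -p 9999 bench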
pgpool-II features
- connection pooling
- queueing of incoming connections
- load balancing of SELECT statements
* a built-in Postgres query parser with advanced features, including an Oracle-style ‘/*NO LOAD BALANCE*/’ hint (see the sketch after this list)
* different weights for different servers
* function white- and black-listing (since functions are called via SELECT)
* configurable replication delay threshold; when exceeded, the master is used
- statement-level replication – the same query is sent to multiple (master) servers
* comparison of the number of affected rows
* default handling for timestamp columns on inserts – CURRENT_TIMESTAMP, CURRENT_DATE and now() are replaced with constants
* a mechanism (table locking) for guaranteeing identical IDs for inserts into tables with serials
- HA features
* automatic failover (customizable callback scripts)
* graceful adding and removing of replicas
* a watchdog that can move a virtual IP address should pgpool itself fail
* one-click/one-command provisioning of replicas (sample scripts provided)
- query caching
* in-memory or backed by memcached
* time-based and DML-based invalidation schemes
- elaborate pool management options
* basic pool status via standard SHOW commands
* pcp_* command-line utilities for starting, stopping, etc.
* the pgpool_adm extension for SQL-based pool management
* the pgpoolAdmin web interface
- SSL support
- optional authentication and access control layer (pg_hba.conf format)
- online configuration reload for most settings
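To make the load balancing and management features a bit more concrete, here is a minimal sketch, assuming pgpool-II listens on its default port 9999 with the PCP interface on its default port 9898, and that a PCP user (here hypothetically named ‘pgpool’) has been set up; exact pcp_* flags can differ between pgpool versions:

# a plain SELECT may be load-balanced to a replica
psql -h localhost -p 9999 -c "SELECT count(*) FROM pgbench_accounts;" bench
# the Oracle-style hint forces the same query to run on the master
psql -h localhost -p 9999 -c "/*NO LOAD BALANCE*/ SELECT count(*) FROM pgbench_accounts;" bench
# basic pool status via pgpool's extended SHOW commands
psql -h localhost -p 9999 -c "SHOW pool_nodes;" bench
# per-node details via the pcp_* utilities (node 0, default PCP port 9898)
pcp_node_info -h localhost -p 9898 -U pgpool -n 0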
pgpool-II gotchas
- only session-based pooling
- pooling for one cluster only
- no multi-statement queries
- a failover will occur if you stop a backend using pg_terminate_backend()!
- no translations for multi-byte encoding; client must be aware of server encoding
PgBouncer features
- lightweight connection pooling (event-based architecture)
- 3 pooling modes
* session (the default)
* transaction
* statement
- graceful redirection of connections to a new node (non-SSL connections only, *nix only)
- can pool multiple clusters and databases together
- incoming connections can be paused and queued – for example, to restart a database without the clients noticing (see the sketch after this list)
- a simple management interface, available by connecting to the special “pgbouncer” database
* statistics gathering
- SSL support
- optional access control and authentication layer (pg_hba.conf format)
- most settings have an online configuration reload
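A rough sketch of the management interface and the pause/resume trick mentioned above, assuming PgBouncer on its default port 6432 and an admin user (here hypothetically named ‘pgbouncer’) listed in admin_users:

# per-pool client/server connection counts
psql -h localhost -p 6432 -U pgbouncer pgbouncer -c "SHOW POOLS;"
# request and traffic statistics
psql -h localhost -p 6432 -U pgbouncer pgbouncer -c "SHOW STATS;"
# let running queries finish, then hold new clients in the queue
psql -h localhost -p 6432 -U pgbouncer pgbouncer -c "PAUSE;"
# ...restart or otherwise maintain the real database here...
# release the queued clients again
psql -h localhost -p 6432 -U pgbouncer pgbouncer -c "RESUME;"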
PgBouncer gotchas
- no failover automation
- the actual connection limits towards the underlying database (max_client_conn, default_pool_size, max_db_connections, max_user_connections, min_pool_size, reserve_pool_size) are not immediately obvious to the user (see the sketch after this list)
- errors in the connect_query (executed before a connection is handed to a client, e.g. to set encoding or work_mem) are not taken into account
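To make the interplay of those limits more tangible, here is an illustrative pgbouncer.ini excerpt; the numbers are assumptions, not recommendations. Roughly speaking, max_client_conn caps the clients connecting to PgBouncer itself, default_pool_size governs the server connections kept per database/user pair, and max_db_connections / max_user_connections put hard caps on the Postgres side:

[pgbouncer]
; how many clients may connect to PgBouncer itself
max_client_conn = 500
; server connections kept per database/user pair
default_pool_size = 20
; keep at least this many server connections open per pool
min_pool_size = 5
; extra server connections allowed when a pool is exhausted
reserve_pool_size = 5
; hard caps towards the actual database
max_db_connections = 50
max_user_connections = 50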
Testing performance
For skepticism’s sake, I decided to put my colleague Ants’ claim that PgBouncer is significantly faster than pgpool-II to the test with a quick round of benchmarks, running all components on my laptop. Since only the connection overhead was of interest, the test used a small, ~13 MB in-memory “pgbench” dataset in “select-only” mode. The pools were configured without SSL and sized so that the 8 concurrent test connections were always served from the cache, i.e. no connections were re-established during the test. PgBouncer was left in its default “session pooling” mode.
pgbench -i -s 1 bench   # init the bench schema, ~13 MB

for port in 5432 6432 9999 ; do
  for i in {1..3} ; do
    pgbench --select-only --connect -T300 -c8 -j2 -p $port bench
  done
done
A side note: before I could really start testing, I ran into a distro-specific issue where connections started to break after a while, which required tweaking some kernel parameters.
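The post does not name the parameters, but with --connect every pgbench transaction opens a brand-new TCP connection, so a typical culprit is running out of ephemeral ports while old sockets linger in TIME_WAIT. On Linux, settings along these lines (an assumption, not necessarily what was actually changed) would relieve that:

# widen the ephemeral port range available for outgoing connections
sudo sysctl -w net.ipv4.ip_local_port_range="10000 65535"
# allow reuse of sockets still in TIME_WAIT for new outgoing connections
sudo sysctl -w net.ipv4.tcp_tw_reuse=1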
Results were as follows, with the usual YMMV caveat:
- no pooling, average TPS 356
- pgpool-II: average TPS 3939 (roughly an 11x improvement over no pooling)
- PgBouncer: average TPS 6626 (roughly an 18x improvement over no pooling, and about 1.7x faster than pgpool-II)
Summary
PgBouncer and pgpool-II are both well-known, battle-tested products. They offer an easy way to pick some low-hanging performance fruit (the difference is very noticeable for short and simple transactions) and add flexibility to your setup: by hiding the database from direct access, they make minor maintenance much simpler. For the majority of use cases (no replicas, or HA handled by external solutions), however, PgBouncer would be my choice, thanks to its lightweight architecture and superior performance.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.