Preamble
During a recent PostgreSQL consulting engagement, the subject of connection pooling came up: which strategies and products are widely used and actually work well? The topic itself is broad and mostly familiar to long-time Postgres users, but it is still worthwhile to briefly go over some fundamental ideas and compare the two “near Postgres” products you should be aware of: PgBouncer and pgpool-II.
To recap: connection pools are middleware that speak the database protocol and cache database connections. This saves clients the cost of opening a new connection every time (negotiating the connection, performing authentication, setting client defaults such as encoding or work_mem) and relieves the database server from keeping too much client state in memory. Applications simply connect to the pool as if it were the database itself.
Common approaches to setting up pooling:
- built-in / in-process – using native language libraries, such as Java’s HikariCP or Python’s psycopg2 pool
- application co-location – the pooling server or product sits on the same node as the application; together with the in-process approach, this makes it harder to limit the total number of connections to the database
- DB co-location – pool and database share a machine; normally this undermines high availability, since clients would notice the database disappearing
- independent – the pool runs on a separate machine; the most flexible option, allowing for example transparent switching of the underlying database
- mixed approaches – for example, combining the above with HAProxy to increase availability (a minimal sketch follows this list)
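As an illustration of the mixed approach, here is a minimal sketch that fronts two pool instances with HAProxy in plain TCP mode. The host names, addresses, port and file path are assumptions for the sake of the example, not a tested setup:

cat <<'EOF' | sudo tee -a /etc/haproxy/haproxy.cfg
# forward plain TCP to two pooling nodes, checking that they accept connections
listen postgres_pool
    bind *:5433
    mode tcp
    option tcp-check
    server pool1 10.0.0.11:6432 check
    server pool2 10.0.0.12:6432 check
EOF
sudo systemctl reload haproxy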
The usual suspects – PgBouncer and pgpool-II
Two products stand out when discussing separate pooling servers in the Postgres context: PgBouncer and pgpool-II. Both are written in C and appear to be actively maintained, with pgpool-II seeing more activity thanks to its more ambitious feature set. Besides the source code, packages for the popular Linux distributions are available, and deployment is straightforward: a configuration file and a startup command. You then point your clients at the pool instead of the real database; both products use non-standard ports by default (6432 for PgBouncer, 9999 for pgpool-II). Based on the available online documentation (PgBouncer in its current version 1.7.2 and pgpool-II in version 3.5.4, the latter sadly with some outdated parts), I compiled the following feature outline so you can decide for yourself whether either fits your needs.
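As a minimal illustration (assuming everything runs locally on the default ports and a database named bench exists), the client only needs to change the port it connects to:

# direct connection to Postgres
psql -h localhost -p 5432 bench
# the same database through PgBouncer
psql -h localhost -p 6432 bench
# the same database through pgpool-II
psql -h localhost -p 9999 bench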
pgpool-II features
- connection pooling
- queueing of incoming connections
- load balancing of SELECT statements
* a built-in Postgres query parser with advanced features, including an Oracle-style ‘/*NO LOAD BALANCE*/’ hint (see the sketch after this list)
* different weights for different servers
* function white- and black-listing (since functions are called via SELECT)
* configurable replication delay threshold; when exceeded, the master is used
- statement-level replication – the same query is sent to multiple (master) servers
* comparison of the number of affected rows
* default handling for timestamp columns on inserts – CURRENT_TIMESTAMP, CURRENT_DATE and now() are replaced with constants
* a mechanism (table locking) for guaranteeing identical IDs for inserts into tables with serials
- HA features
* automatic failover (customizable callback scripts)
* graceful adding and removing of replicas
* a watchdog that can move a virtual IP address should pgpool itself fail
* one-click/one-command provisioning of replicas (sample scripts provided)
- query caching
* in-memory or backed by memcached
* time-based and DML-based invalidation schemes
- elaborate pool management options
* basic pool status via standard SHOW commands
* pcp_* command-line utilities for starting, stopping, etc.
* the pgpool_adm extension for SQL-based pool management
* the pgpoolAdmin web interface
- SSL support
- optional authentication and access control layer (pg_hba.conf format)
- online configuration reload for most settings
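To make the load balancing and management features a bit more concrete, here is a minimal sketch, assuming pgpool-II listens on its default port 9999 with the PCP interface on its default port 9898, and that a PCP user (here hypothetically named ‘pgpool’) has been set up; exact pcp_* flags can differ between pgpool versions:

# a plain SELECT may be load-balanced to a replica
psql -h localhost -p 9999 -c "SELECT count(*) FROM pgbench_accounts;" bench
# the Oracle-style hint forces the same query to run on the master
psql -h localhost -p 9999 -c "/*NO LOAD BALANCE*/ SELECT count(*) FROM pgbench_accounts;" bench
# basic pool status via pgpool's extended SHOW commands
psql -h localhost -p 9999 -c "SHOW pool_nodes;" bench
# per-node details via the pcp_* utilities (node 0, default PCP port 9898)
pcp_node_info -h localhost -p 9898 -U pgpool -n 0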
pgpool-II gotchas
- only session-based pooling
- pooling for one cluster only
- no multi-statement queries
- a failover will occur if you stop a backend using pg_terminate_backend()!
- no translations for multi-byte encoding; client must be aware of server encoding
PgBouncer features
- lightweight connection pooling (event-based architecture)
- 3 pooling modes
* session (the default)
* transaction
* statement
- graceful redirection of connections to a new node (non-SSL connections only, *nix only)
- can pool multiple clusters and databases together
- incoming connections can be paused and queued – for example, to restart a database without the clients noticing (see the sketch after this list)
- a simple management interface, available by connecting to the special “pgbouncer” database
* statistics gathering
- SSL support
- optional access control and authentication layer (pg_hba.conf format)
- most settings have an online configuration reload
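A rough sketch of the management interface and the pause/resume trick mentioned above, assuming PgBouncer on its default port 6432 and an admin user (here hypothetically named ‘pgbouncer’) listed in admin_users:

# per-pool client/server connection counts
psql -h localhost -p 6432 -U pgbouncer pgbouncer -c "SHOW POOLS;"
# request and traffic statistics
psql -h localhost -p 6432 -U pgbouncer pgbouncer -c "SHOW STATS;"
# let running queries finish, then hold new clients in the queue
psql -h localhost -p 6432 -U pgbouncer pgbouncer -c "PAUSE;"
# ...restart or otherwise maintain the real database here...
# release the queued clients again
psql -h localhost -p 6432 -U pgbouncer pgbouncer -c "RESUME;"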
PgBouncer gotchas
- no failover automation
- the actual connection limits towards the underlying database (max_client_conn, default_pool_size, max_db_connections, max_user_connections, min_pool_size, reserve_pool_size) are not immediately obvious to the user (see the sketch after this list)
- errors in the connect_query (executed before a connection is handed to a client, e.g. to set encoding or work_mem) are not taken into account
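To make the interplay of those limits more tangible, here is an illustrative pgbouncer.ini excerpt; the numbers are assumptions, not recommendations. Roughly speaking, max_client_conn caps the clients connecting to PgBouncer itself, default_pool_size governs the server connections kept per database/user pair, and max_db_connections / max_user_connections put hard caps on the Postgres side:

[pgbouncer]
; how many clients may connect to PgBouncer itself
max_client_conn = 500
; server connections kept per database/user pair
default_pool_size = 20
; keep at least this many server connections open per pool
min_pool_size = 5
; extra server connections allowed when a pool is exhausted
reserve_pool_size = 5
; hard caps towards the actual database
max_db_connections = 50
max_user_connections = 50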
Testing performance
For skepticism’s sake, I decided to put my colleague Ants’ claim that PgBouncer is significantly faster than pgpool-II to the test with a quick round of benchmarks, running all components on my laptop. Since only the connection overhead was of interest, the test used a small, ~13 MB in-memory “pgbench” dataset in “select-only” mode. The pools were configured without SSL and sized so that the 8 concurrent test connections were always served from the cache, i.e. no connections were re-established during the test. PgBouncer was left in its default “session pooling” mode.
pgbench -i -s 1 bench   # init the bench schema, ~13 MB

for port in 5432 6432 9999 ; do
  for i in {1..3} ; do
    pgbench --select-only --connect -T300 -c8 -j2 -p $port bench
  done
done
A side note: before I could really start testing, I ran into a distro-specific issue where connections started to break after a while, which required tweaking some kernel parameters.
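The post does not name the parameters, but with --connect every pgbench transaction opens a brand-new TCP connection, so a typical culprit is running out of ephemeral ports while old sockets linger in TIME_WAIT. On Linux, settings along these lines (an assumption, not necessarily what was actually changed) would relieve that:

# widen the ephemeral port range available for outgoing connections
sudo sysctl -w net.ipv4.ip_local_port_range="10000 65535"
# allow reuse of sockets still in TIME_WAIT for new outgoing connections
sudo sysctl -w net.ipv4.tcp_tw_reuse=1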
Results were as follows, with the usual YMMV caveat:
- no pooling, average TPS 356
- pgpool-II: average TPS 3939 (roughly an 11x improvement over no pooling)
- PgBouncer: average TPS 6626 (roughly an 18x improvement over no pooling, and about 1.7x faster than pgpool-II)
Summary
PgBouncer and pgpool-II are both well-known, battle-tested products. They offer an easy way to pick some low-hanging performance fruit (the difference is very noticeable for short and simple transactions) and add flexibility to your setup: by hiding the database from direct access, they make minor maintenance much simpler. For the majority of use cases (no replicas, or HA handled by external solutions), however, PgBouncer would be my choice, thanks to its lightweight architecture and superior performance.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.