Preamble
“Can and should we run production Postgres workloads in Docker? Does it even work?” is a question we get frequently. The short answer: yes, it will work… if you really want it to, or if it’s all just for fun and games, or for quick tasks like testing.
Containers, often referred to simply as Docker, have undoubtedly been around for a while. (There are other widely used container runtimes, so this is not a proprietary technology per se; but for the sake of typing efficiency, let’s just say Docker.) A growing number of people are “jumping on the container-ship” and either want to try Docker out or have already done so. But containers were originally conceived as a way to ship code: they were meant to make deployment easy and worry-free, with “batteries included”. An image is supposed to “just work” everywhere and be essentially immutable. That way, consistent quality can be assured and easily tested.
All of these are extremely desirable properties for developers. What if, however, you work in the database and data management business? Databases, as we all know, are not stateless at all – they are all about state! They hold the state precisely so that application code can remain largely “dumb” and not have to “worry” about it. Statelessness is what enables quick feature development and deployment, as well as push-button scaling by simply adding more containers.
Should I use Postgres with Docker?
If your hearing is half-decent, you may have detected some apprehensive undertones there, indicating that there are some “buts” – as usual. So why not fully commit to this fantastic modern technology? Particularly given that I already stated it definitely works.
The reason is that there are some factors you should at the very least consider in order to prevent future sweat and expletives. In short, you will only see significant benefits for your production-grade use cases if you’re prepared to:
a) live entirely on a container framework such as Kubernetes or OpenShift;
b) rely on additional third-party software projects that are not directly part of the PostgreSQL Global Development Group;
c) maintain either your own Docker images (including some commonly needed extensions) or some scripts that handle common operational tasks like upgrading between major versions.
To reiterate, containers are a great technology in general, and this type of stuff is interesting and would probably look good on your CV… However, container technologies were not originally designed with persistent use cases in mind. The PostgreSQL project makes it quick and convenient to launch a standard PostgreSQL instance on version X, but that is about all it can do for you in this situation.
A testers’ dream
To avoid sounding too depressing, testing of all kinds, especially integration and smoke testing, is at least one completely legitimate application for Docker and containers.
Because containers are essentially implemented as very light-weight “mini VMs,” you can start and stop them quickly. That, however, assumes that the image has already been downloaded. If not, the initial launch could take up to two minutes, depending on how strong your internet connection is.
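If you want to avoid that first-launch delay, the image can simply be pulled ahead of time:

# Pre-download the image so that later “docker run” calls start in seconds
docker pull postgres:13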
In fact, I frequently use Docker to run all recent Postgres versions (9.0 and up) on my workstation in the background! Even though I don’t use them frequently, they don’t bother me, because they consume very little of my time or resources while they’re “idling”. They are also always available whenever I need to test statistics-fetching queries for our Postgres monitoring tool, pgwatch2. The only slightly annoying thing is that the “in-container” processes show up and “litter” the picture if you also run Postgres on the host machine and look at the process listing to see what it’s doing (e.g. “ps -efH | grep postgres”).
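For illustration only, here is a minimal sketch of how one could keep a few versions idling in the background – the container names and host ports (5411-5413) are arbitrary choices of mine:

# Launch Postgres 11, 12 and 13 side by side on ports 5411, 5412 and 5413
for v in 11 12 13; do
  docker run -d --name pgtest$v -p 54$v:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres:$v
done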
Slonik in a box – a quickstart
So, let’s say I want to launch one of those “all-inclusive” light-weight pre-built database images that everyone is talking about. Where do I begin? Which images should I use?
You can never go wrong with official products, and fortunately the PostgreSQL project offers all modern major versions via the official Docker Hub – going back as far as version 8.4, which was released in 2009! Of course, you also need to be familiar with a bit of “Docker foo.” For a straightforward test run, you typically want something akin to what is shown in the code below.
NB! The Docker runtime / engine must be installed first (if it hasn’t been already). I won’t go into detail on how to do this; it should be a straightforward process if you follow the official documentation line by line.
Also, keep in mind that when we launch images, we must always expose or “remap” the default Postgres port to a free port of our choice. We actually don’t need to worry about how the service is implemented internally because ports are the “service interface” for Docker images, over which all communication typically occurs.
# Note that the first run could take a few minutes due to the image being downloaded…
docker run -d --name pg13 -p 5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres:13

# Connect to the container that’s been started and display the exact server version
psql -U postgres -h localhost -p 5432 -c "show server_version" postgres

         server_version
────────────────────────────────
 13.1 (Debian 13.1-1.pgdg100+1)
(1 row)
Keep in mind that you are not required to use “trust” authentication; instead, you can use the POSTGRES_PASSWORD environment variable to set a password for the default “postgres” superuser.
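For example, a minimal sketch – the container name, host port and password value below are arbitrary placeholders of mine, chosen so they don’t clash with the earlier example:

# Launch a second instance that requires password authentication
docker run -d --name pg13-pw -p 5434:5432 -e POSTGRES_PASSWORD=mysecret postgres:13

# Supply the password on the client side, e.g. via the standard PGPASSWORD environment variable
PGPASSWORD=mysecret psql -U postgres -h localhost -p 5434 -c "select 1" postgres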
Simply dispose of the container and all stored tables, files, etc. once you’ve had enough of Slonik’s services for the time being by using the following code:
# Let’s stop the container / instance
docker stop pg13

# And let’s also throw away any data generated and stored by our instance
docker rm pg13
It couldn’t be any easier!
NB! Keep in mind that I could also have explicitly marked the launched container as “temporary” with the ‘--rm’ flag when launching it, ensuring that any data left over would be automatically deleted upon stopping.
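A minimal illustration of that flag, assuming port 5433 is free on the host (any data is gone as soon as the container stops):

# --rm marks the container as throwaway: stopping it also removes the container and its data
docker run --rm -d --name pg13-tmp -p 5433:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres:13

# No separate "docker rm" is needed afterwards
docker stop pg13-tmp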
Peeking inside the container
Now that we have seen basic container usage in action, complete Docker newbies may be wondering how it actually works. What exactly is going on inside that container box?
We should probably start by defining the two ideas that people frequently initially conflate:
- A Docker image is an immutable software package with “batteries (libraries) included” that you can build yourself, download from a public or private Docker registry, and then “instantiate,” or launch.
- A Docker container: once an image has been launched, we’re dealing with a “live clone” that should properly be called a container, and its files can now be modified – though in theory this freedom should not be overused, or at least not directly, without volumes (see below).
Let’s visualize this to make sense of it:
# Let’s take a look at available Postgres images on my workstation
# that can be used to start a database service (container) in the snappiest way possible
docker images | grep ^postgres | sort -k2 -n

postgres  9.0   cd2eca8588fb  5 years ago    267MB
postgres  9.1   3a9dca7b3f69  4 years ago    261MB
postgres  9.2   18cdbca56093  3 years ago    261MB
postgres  9.4   ed5a45034282  12 months ago  251MB
postgres  9.5   693ab34b0689  2 months ago   197MB
postgres  9.6   ebb1698de735  6 months ago   200MB
postgres  10    3cfd168e7b61  3 months ago   200MB
postgres  11.5  5f1485c70c9a  16 months ago  293MB
postgres  11    e07f0c129d9a  3 months ago   282MB
postgres  12    386fd8c60839  2 months ago   314MB
postgres  13    407cece1abff  14 hours ago   314MB

# List all running containers
docker ps

CONTAINER ID  IMAGE        COMMAND                 CREATED       STATUS       PORTS                   NAMES
042edf790362  postgres:13  "docker-entrypoint.s…"  11 hours ago  Up 11 hours  0.0.0.0:5432->5432/tcp  pg13
Other typical tasks when using Docker include:
* Examining the logs of a particular container, for instance, to learn more about query errors
# Get all log entries since initial launch of the instance
docker logs pg13

# “Tail” the logs limiting the initial output to last 10 minutes
docker logs --since "10m" --follow pg13
* Finding out the container’s IP address
Note that all Docker containers are attached to the default 172.17.0.0/16 subnet, so by default they can communicate with one another. If you don’t like that, you can isolate containers in custom networks – within such a network they can still reach each other simply by container name.
# Simple ‘exec’ into container approach
docker exec -it pg13 hostname -I
172.17.0.2

# A more sophisticated way via the “docker inspect” command
docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' pg13
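As for the custom networks just mentioned – a hypothetical sketch; the network name “pgnet” and the throwaway client container are my own inventions for illustration:

# Create a user-defined network and attach our running container to it
docker network create pgnet
docker network connect pgnet pg13

# Any container on the same network can now reach Postgres simply via the hostname “pg13”
docker run --rm --network pgnet postgres:13 psql -h pg13 -U postgres -c "select 1"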
* Executing custom commands inside the container
Note that this should be a rare occurrence, usually required only for troubleshooting purposes. You should avoid installing new programs and changing files directly, because that goes against the very idea of immutability. Fortunately, for the official Postgres images it is easy to do: we enter the container as “root”, and the Debian package repositories are still connected – something many images remove to avoid all manner of maintenance nightmares.
Here is a demonstration of how to add a third-party extension; by default only the “contrib” extensions that are part of the official Postgres project are provided.
docker exec -it pg13 /bin/bash

# Now we’re inside the container!

# Refresh the available packages listing
apt update

# Let’s install the extension that provides some Oracle compatibility functions...
apt install postgresql-13-orafce

# Let’s exit the container (can also be done with CTRL+D)
exit
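Assuming the package installed cleanly, the extension still has to be created inside the database before its functions become visible – for example:

# Create the freshly installed extension in the default “postgres” database
psql -U postgres -h localhost -p 5432 -c "CREATE EXTENSION orafce" postgres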
* Modifying PostgreSQL’s settings
When testing an application, it’s common to want to know how long the actual queries take – the indispensable “pg_stat_statements” extension lets you gauge performance from the DB engine side. This is fairly simple to set up even without entering the container! From Postgres version 9.5 onward, specifically…
# Connect with our “Dockerized” Postgres instance
psql -h localhost -U postgres

postgres=# ALTER SYSTEM SET shared_preload_libraries TO pg_stat_statements;
ALTER SYSTEM
postgres=# ALTER SYSTEM SET track_io_timing TO on;
ALTER SYSTEM

# Exit psql via typing “exit” or pressing CTRL+D
# and restart the container
docker restart pg13
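After the restart, the extension itself still needs to be created before its statistics view can be queried. A minimal sketch – note that the column is called total_exec_time on Postgres 13, while older versions use total_time:

# Enable the extension and list the most time-consuming statements
psql -h localhost -U postgres -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements" postgres
psql -h localhost -U postgres -c "SELECT query, calls, total_exec_time FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 5" postgres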
Don’t forget about the volumes
“Ideally, very little data is written to a container’s writable layer, and you use Docker volumes to write data,” according to the Docker documentation.
The data layer of containers is peculiar in that it’s not really designed to be heavily modified! Keep in mind that containers ought to be somewhat immutable. Internally it works via “copy-on-write”, implemented by a variety of storage drivers across different versions of the Docker runtime, with further differences introduced by different host OS versions. At the disk access level, this “virtualized” file access layer can make things quite complicated and, more importantly, slow! So it’s best to follow the documentation’s advice and set up volumes for your data from the start.
Oh, but what exactly are volumes? They are directly mounted, persistent OS folders where Docker tries to stay out of the way as much as possible, so that you don’t lose out on file system functionality and features. The latter is not really guaranteed, though, and may vary by platform – especially on Windows (as usual), where one nice issue comes to mind, things can get a little hairy. The most important word here may be “persistent”: volumes remain visible even after a container is deleted, so they can be used to “migrate” from one software version to another.
How should you actually use volumes? Volumes can be used in two different ways: implicitly and explicitly. By the way, you can find the “fine print” here.
Also note that in order to “volumize” a path, we first need to know which paths are accessed directly. How do you discover such paths? You could begin by visiting the Docker Hub’s “postgres” page, or you could look for the “VOLUME” keyword in the Dockerfiles used to build the Postgres images. The latter is available here for Postgres version 13.
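As a quicker alternative to reading the Dockerfile, the declared volumes can also be pulled straight out of the image metadata – a small sketch:

# Show the VOLUME declarations baked into the image
docker image inspect --format '{{.Config.Volumes}}' postgres:13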
# Implicit volumes: Docker will automatically create the left-side folder if it is not already there
docker run -d --name pg13 -p5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust \
  -v /mydatamount/pg-persistent-data:/var/lib/postgresql/data \
  postgres:13

# Explicit volumes: need to be pre-initialized via Docker
docker volume create pg13-data

docker run -d --name pg13 -p5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust \
  -v pg13-data:/var/lib/postgresql/data \
  postgres:13

# Let’s inspect where our persistent data actually “lives”
docker volume inspect pg13-data

# To drop the volume later if the container is not needed anymore, use the following command
docker volume rm pg13-data
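To see the “persistent” part in action, here is a sketch (reusing the pg13-data volume created above) of how the data survives throwing away the container:

# Throw away the container...
docker stop pg13 && docker rm pg13

# ...and start a fresh one on top of the same named volume – the data is still there
docker run -d --name pg13 -p5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust \
  -v pg13-data:/var/lib/postgresql/data \
  postgres:13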
Some drops of tar – big benefits possible, with some drawbacks
To wrap up this post: if you’re a fan of containers in general, feel free to run some PostgreSQL services in them. For larger organizations running hundreds of PostgreSQL services, containers can actually make life much easier and more standardized once everything has been automated. They won’t bite you very often.
However, you should also be mindful of the dangers:
- Docker images and the whole concept of containers are optimized for a blazing-fast and slim startup experience, not for properly separating the data into its own persistence unit by default – without the proper precautions (volumes), your databases may suffer a catastrophe.
- Using containers won’t give you any automatic and magical high-availability capabilities. That usually comes from the container framework – either via simple “stateful sets”, via more sophisticated “operators”, or via cleverly bundled database and “bot” images which rely on a central, highly available consensus database.
- Only when you go “all in” on a container management framework like Kubernetes and choose some “operator” software to handle the details (I believe the most popular ones are the Crunchy Postgres operator and the Zalando operator) will life be comparatively simple.
- For example, a very common task – major version upgrades – is surprisingly out of scope for the default Postgres images! It is also out of scope for some Kubernetes operators, so be prepared to get your hands dirty and build some custom intermediate images, or find third-party ones like Spilo (a crude dump-and-restore sketch follows below).
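As a crude illustration of how one might bridge that gap for smaller databases – a sketch only, assuming containers named pg12 and pg13 are both running and that the downtime of a dump/restore is acceptable:

# Stream a logical dump from the old instance straight into the new one
docker exec pg12 pg_dumpall -U postgres | docker exec -i pg13 psql -U postgres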
TLDR;
I don’t want to sound like a luddite again, but before going “all in” on containers you should be aware of two things. One: to really benefit from production-level database containers, you need to run them on a container automation platform like Kubernetes. Two: the benefits will only be realized if you’re willing to become somewhat reliant on third-party software vendors. Those vendors mostly serve larger “K8s for the win” organizations rather than smaller businesses, and they often bake that way of thinking into their frameworks – which might not be appropriate for how you work.
Furthermore, not all aspects of the typical database lifecycle are sufficiently covered. My advice: be aware that you’re mostly winning in the simplicity of initial deployment and typically also in automatic high-availability (which is great, of course!), but not necessarily in all other aspects of the lifecycle (fast major version upgrades, backups the way you like them, access control, etc.). If it currently works for you “as is”, and you’re not 100% migrating to a container-orchestration framework for all other parts of your software stack, be aware that you’re unlikely to gain much from containerizing your databases.
On the other hand, if you’re familiar with a container framework like Kubernetes and/or anticipate running a slew of database instances, go for it — after thoroughly researching potential issues, of course.
On a positive note – being in contact with a fairly large crowd of DBAs, I can say that many larger organizations, once they have learned to trust containers, do not want to go back to the conventional way of running databases.
That got a bit lengthy, though. Thanks for reading, and if you have any additional thoughts, please share them in the comments section.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.