What are Linux cgroups?
To quote the cgroups manual page:
Control groups, usually referred to as cgroups, are a Linux kernel feature which allow processes to be organized into hierarchical groups whose usage of various types of resources can then be limited and monitored.
cgroups are managed with special commands that start with “cg”, but they can also be managed through a special cgroups file system and through systemd.
Since a running PostgreSQL cluster is a collection of processes, that makes sense.
There are a number of defined subsystems (also called “controllers” in cgroups terminology). Of these, the following are relevant for PostgreSQL:
- memory: useful for limiting the total memory usage
- blkio: useful for limiting the I/O throughput
- cpu: useful for setting upper and lower limits on the CPU time the processes can use
- cpuset: useful for binding the processes to a subset of the available CPU cores
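To check which controllers your kernel actually provides, you can look at /proc/cgroups (this works regardless of the cgroups version); with cgroups v1, each controller is additionally mounted as its own file system:

$ cat /proc/cgroups
$ mount -t cgroup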
Configuring cgroups
During system startup, cgroups are created as defined in the /etc/cgconfig.conf configuration file.
Let’s create a cgroup to build a cage for a PostgreSQL cluster:
group db_cage {
    # user and group "postgres" can manage these cgroups
    perm {
        task {
            uid = postgres;
            gid = postgres;
            fperm = 774;
        }
        admin {
            uid = postgres;
            gid = postgres;
            dperm = 775;
            fperm = 774;
        }
    }

    # limit memory to 1 GB and disable swap
    memory {
        memory.limit_in_bytes = 1G;
        memory.memsw.limit_in_bytes = 1G;
    }

    # limit read and write I/O to 10MB/s each on device 8:0
    blkio {
        blkio.throttle.read_bps_device = "8:0 10485760";
        blkio.throttle.write_bps_device = "8:0 10485760";
    }

    # limit CPU time to 0.25 seconds out of each second
    cpu {
        cpu.cfs_period_us = 1000000;
        cpu.cfs_quota_us = 250000;
    }

    # only CPUs 0-3 and memory node 0 can be used
    cpuset {
        cpuset.cpus = 0-3;
        cpuset.mems = 0;
    }
}
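Once the cgroup exists, its settings can be read back from the cgroup file system. The paths below assume the default cgroups v1 mount points under /sys/fs/cgroup; 1G corresponds to 1073741824 bytes:

$ cat /sys/fs/cgroup/memory/db_cage/memory.limit_in_bytes
1073741824
$ cat /sys/fs/cgroup/cpu/db_cage/cpu.cfs_quota_us
250000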
Run the following command as root to activate it:
# /usr/sbin/cgconfigparser -l /etc/cgconfig.conf -s 1664
To have that done automatically at server start, I tell systemd to enable the cgconfig service:
# systemctl enable cgconfig
# systemctl start cgconfig
Starting PostgreSQL in a cgroup
To start PostgreSQL in the cgroups we defined above, use the cgexec executable (you may have to install an operating system package called libcgroup or libcgroup-tools for that):
$ cgexec -g cpu,memory,blkio:db_cage \
    /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data start
We can verify that PostgreSQL is running in the correct cgroup:
$ head -1 /var/lib/pgsql/10/data/postmaster.pid
16284
$ cat /proc/16284/cgroup | egrep '\b(cpu|blkio|memory)\b'
10:cpu,cpuacct:/db_cage
9:blkio:/db_cage
4:memory:/db_cage
To move an already running process into a cgroup, you can use cgclassify (but then you have to move all running PostgreSQL processes).
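For example, assuming that nothing but PostgreSQL runs under the postgres operating system user, you could move all of its processes in one go (pgrep -u lists the PIDs of all processes belonging to that user):

# cgclassify -g cpu,memory,blkio:db_cage $(pgrep -u postgres)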
Using cgroups with systemd
systemd provides a simpler interface to Linux cgroups, so you don’t have to do any of the above. systemd can create cgroups “on the fly” for the services it starts.
If your PostgreSQL service is called postgresql-10, simply create a file /etc/systemd/system/postgresql-10.service like this:
# include the original service file rather than editing it,
# so that changes don't get lost during an upgrade
.include /usr/lib/systemd/system/postgresql-10.service

[Service]
# limit memory to 1GB
# sets "memory.limit_in_bytes"
MemoryMax=1G

# limit memory + swap space to 1GB
# this should set "memory.memsw.limit_in_bytes", but it only
# works with cgroups v2 ...
# MemorySwapMax=1G

# limit read I/O on block device 8:0 to 10MB per second
# sets "blkio.throttle.read_bps_device"
IOReadBandwidthMax=/dev/block/8:0 10M

# limit write I/O on block device 8:0 to 10MB per second
# sets "blkio.throttle.write_bps_device"
IOWriteBandwidthMax=/dev/block/8:0 10M

# limit CPU time to a quarter of the available time
# sets "cpu.cfs_quota_us"
CPUQuota=25%

# there are no settings to control "cpuset" cgroups
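Note that newer systemd versions no longer support the .include directive; there, the equivalent approach is a drop-in file, which systemctl edit creates for you:

# systemctl edit postgresql-10

This opens an editor on /etc/systemd/system/postgresql-10.service.d/override.conf; paste the [Service] section from above (without the .include line) and save.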
Now you have to tell systemd that you changed the configuration and restart the service:
# systemctl daemon-reload
# systemctl restart postgresql-10
As you can see, not all cgroup settings are available with systemd. As a workaround, you can define cgroups in /etc/cgconfig.conf and use cgexec to start the service.
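One way to wire that up, sketched here under the assumption that your unit file’s start command looks like the one below (check the real one with systemctl cat postgresql-10 first), is a drop-in that clears the packaged ExecStart and wraps it in cgexec, for example to get the cpuset limits that systemd cannot set:

[Service]
# the empty assignment clears the ExecStart from the packaged unit file
ExecStart=
# hypothetical wrapped start command; copy the actual command from
# "systemctl cat postgresql-10" and prefix it with cgexec
ExecStart=/usr/bin/cgexec -g cpuset:db_cage /usr/pgsql-10/bin/postmaster -D ${PGDATA}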
How useful are cgroups for PostgreSQL?
I would say that it depends on the subsystem.
memory
At first glance, it sounds interesting to limit memory usage with cgroups. But there are several drawbacks:
- If PostgreSQL is allowed to use swap space, it will start swapping when the memory quota is exceeded.
- If PostgreSQL is not allowed to use swap space, the Linux OOM killer will kill PostgreSQL when the quota is exceeded (alternatively, you can configure the cgroup so that the process is paused until memory is freed, but that might never happen; see the sketch after this list).
- The memory quota also limits the amount of memory available for the file system cache.
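The “pause instead of kill” behavior from the second point is controlled through the memory controller’s oom_control file in cgroups v1 (path again assuming the default mount point); writing 1 disables the OOM killer for the cgroup, so processes that exceed the quota are frozen until memory is freed:

# echo 1 > /sys/fs/cgroup/memory/db_cage/memory.oom_control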
None of this is very appealing: there is no option to make malloc fail so that PostgreSQL can handle the problem.
I think it is better to limit PostgreSQL’s memory footprint the traditional way, by setting shared_buffers, work_mem and max_connections so that PostgreSQL won’t use too much memory. That also has the advantage that all PostgreSQL clusters on the machine can share the file system cache: clusters that need more of that resource can get it, while no cluster can become completely memory starved (everybody is guaranteed its shared_buffers).
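As a rough sketch of that traditional approach (the values are purely illustrative and have to be sized for your machine and workload):

# excerpt from postgresql.conf - example values, not recommendations
shared_buffers = 1GB     # fixed amount of shared memory for the buffer cache
work_mem = 16MB          # per sort or hash operation, so the effective limit
                         # is roughly work_mem times concurrent operations
max_connections = 100    # caps the number of sessions and thereby the
                         # worst-case number of work_mem allocations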
blkio
I think that cgroups are a very useful way of limiting I/O bandwidth for PostgreSQL.
Perhaps the only drawback is that PostgreSQL cannot use more than its allotted quota even if the I/O system is otherwise idle.
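Whether the limit actually takes effect can be observed in the statistics that the blkio controller exports (cgroups v1 path assumed):

$ cat /sys/fs/cgroup/blkio/db_cage/blkio.throttle.io_service_bytes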
cpu
cgroups are also a good way of limiting CPU usage by a PostgreSQL cluster.
Again, it would be nice if PostgreSQL were allowed to exceed its quota if the CPUs are idle.
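How often the quota throttles the cluster can be read from the controller’s statistics file; nr_throttled counts the CFS periods in which the group ran out of quota, and throttled_time the total time it spent waiting:

$ cat /sys/fs/cgroup/cpu/db_cage/cpu.stat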
cpuset
This is only useful on big machines with a NUMA architecture. On such machines, binding PostgreSQL to the CPUs and memory of one NUMA node will make sure that all memory access is local to that node and consequently fast.
You can thus partition your NUMA machine between several PostgreSQL clusters.
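To find out how many NUMA nodes a machine has, and which CPUs and how much memory belong to each node, you can use numactl (usually in a package of the same name):

$ numactl --hardware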