Preamble
Yes, I am aware that there are already a number of monitoring tools that are made just for PostgreSQL. I knew something was still missing, though. The main problem, in my opinion, was that all the tools either needed a lot of setup and were intrusive (like installing an extension and restarting the server) or were too rigid, gathering and showing data that someone thought was a good idea.
Yes, this Apple-style method often produces visually appealing results, but it makes it very hard to make changes or report bugs when the tool’s maintainers seem to be on vacation sipping margaritas. The need to quickly learn Perl, Python, C, Java, and so on will result in you yanking out a lot of hair. What then needs to be improved? What should a “brave new” monitoring application’s primary characteristics be?
Main concepts of pgwatch2
- 30-second setup
The easiest installations should be fast and easy, and you shouldn’t need to know anything about how Postgres works or any programming language, compiling, or other similar things. Because of these requirements, the choices are very limited, but in the modern world, Docker, which works on almost every platform, is often the only choice. Docker can still be installed from the source code on platforms where it is not an option, but the user will have to take extra steps to do so.
- Free Software
Community support is greatly appreciated!
- Non-invasive
As a remote agent, Pgwatch2 makes regular network queries to read metrics and stores the results, which can then be used to make graphs. It does all of this without making any changes to the database server.
- simple extension without coding
People will want to use your program in situations that you never even considered, creating new metrics as they go. Braking changes will eventually make it into the core of Postgres. And after that, you want to be able to customize the new or modified metric in just a few minutes. In our situation, this entails continuing to define metrics using standard SQL. The major benefit is that everyone who works with databases can understand them to a large extent.
- Data initial
It all comes down to data, as recent events in life have demonstrated on many levels. Everyone can build on it if it is simple to access. In the context of pgwatch2, this means to bet on InfluxDB, which is one of the best-known time series databases. Written in Go, it provides excellent performance, excellent data compression, and simple data discovery. This is especially true after learning InfluxQL, another SQL-like query language. When using pgwatch2, you only need to worry about it if you want to do something with the metrics data.
- Don’t use visuals to reinvent the wheel.
I know from maintaining PgObserver, another popular and nice Postgres monitoring tool, how hard it is to make nice user interfaces for metrics. Especially for backend/database developers Numerous Javascript frameworks, a wide range of hardware, etc. So, for pgwatch2, we decided to leave it to the professionals and stick with Grafana, which is the best dashboarding (free graphing) solution out there. This means that users can easily and intuitively put together the exact graphs, tables, and counters they want to see by using the few predefined dashboards (and data, of course) that we will provide. Wonderful—less work for us! We’ll also offer a simple administrative web interface, but its main focus will be on managing databases and metrics that need to be kept track of.
Quickstart
It couldn’t be any easier to “install” pgwatch2 in its pre-configured, all-inclusive configuration, which should satisfy most users. Assuming, of course, that Docker has already been set up on the computer that will be the monitoring server:
# start the latest docker image in the background docker run -d -p 3000:3000 -p 8080:8080 --name pw2 cybertec/pgwatch2
After that, you can wait a minute for Docker to start up before using Grafana on port 3000 to look at metric graphs with predefined dashboards. Please take a look at the screenshots above if you are unable to try it out right away.
keeping track of your databases
It goes without saying that you should start tracking metrics from your own PostgreSQL database after starting the Docker image. You must add the connection string for your “database under monitoring” to pgwatch2 for a new statistics-gathering process to start internally. For that, there’s a small Web UI on port 8080 (by default, no authentication). You can set up the connection information (host, port, user, and password) through the UI (see the screenshots above) by using the example of an existing entry, which is actually the pgwatch2 configuration database running inside Docker. You ought to be able to see graphs for your own database after five minutes.
Note: Make sure that the monitored database server can be accessed from within Docker (possible problems with pg_hba.conf rules) and that the monitored user or role is allowed to query the pg_stat_* statistic views, which is allowed by default even for non-superusers. “Docker logs” is currently restricted to providing logs for the management process, so it can’t be of much use in this situation. If things still don’t seem to be working, you might want to take a look at the logs that are accessible from the web interface. Keep in mind that 3 months is also the typical data retention period for metrics! However, an administrative command for InfluxDB makes it simple to extend.
Metrics
Almost all of the metrics that the Postgres Statistics Collector subsystem (pg_stat_* views) offers can be collected and stored right out of the box. Users only need to turn on the metrics with the same names for each database, since monitoring everything is usually not necessary. There are some predefined monitoring configurations available (for example, named descriptively “basic” or “exhaustive”), and users can peek into their definitions and choose an appropriate one or create their own configurations consisting of metric names and monitoring intervals in seconds, represented as JSON. This enables choosing the appropriate amount of monitoring detail. The Grafana “Documentation” dashboard also displays a list of the metrics that are available, along with some instructions.
A named SQL query that returns some columns with values is referred to as a “metric” in this context. Text, integers, and floating point numbers are supported, and you can (and should) create your own metrics because it is very simple to do so. All metric queries should also return an “epoch_ns” column (seconds since the Unix epoch in nanoseconds, but this can be copied from existing metrics) to record the precise time of the metrics snapshot. Simply add the query selecting something from the statistics views (or even normal tables, i.e., you can do business process monitoring!) via the UI. By prefixing the column names with “tag_,” columns can optionally be “tagged” or “indexed” in the storage backend (InfluxDB). For such columns, Grafana will also offer nice data discovery features, and queries will run faster on them as well.
Dashboards
In this context, we refer to Grafana dashboards, which are essentially a collection of graphs, counters, and tables on a single page that give you an overview of something based on the data you have. Most importantly, by simply clicking around, you can quickly visually compose dashboards yourself! Even though everything is fairly self-explanatory and you will be given a list of available databases and metrics, etc., it will still take you about an hour of trying and reading to really master it. However, the time you invest will more than pay for itself. Please take note that you must log into Grafana as “admin” in order to create your own dashboard. The README contains the default password, and after logging in, you can adjust any other settings and add new users.
There are currently 5 predefined dashboards available from our side:
- Documentation: A synopsis of the metrics that are available
- Overview of the database’s key performance indicators (growth of the database, write activity, typical query runtimes, cache ratio, etc.).
- Table specifics, including changes in table/index size, sequential/index scans, and cache ratios
- Top-ranked time-consuming/called statements by query_id from Stat_statement details (requires installation of pg_stat_statement extension)
- graphs for all the information that pg_stat_statement provides in a single query detail
Conclusions and a request for feedback
Of course, there are additional internal details to this solution (the metrics gathering daemon implementation, the configuration DB, metric fetching helpers for reading the CPU load and pg_stat_statement data, alerting possibility, security aspects, etc.), and some questions may remain unanswered. However, you can find these details in the source code or the README, and I also have plans to write another blogpost about the specifics soon.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
Are you interested in writing for Enteros’ Blog? Please send us a pitch!
RELATED POSTS
Revolutionizing Healthcare IT: Leveraging Enteros, FinOps, and DevOps Tools for Superior Database Software Management
- 21 November 2024
- Database Performance Management
In the fast-evolving world of finance, where banking and insurance sectors rely on massive data streams for real-time decisions, efficient anomaly man…
Optimizing Real Estate Operations with Enteros: Harnessing Azure Resource Groups and Advanced Database Software
In the fast-evolving world of finance, where banking and insurance sectors rely on massive data streams for real-time decisions, efficient anomaly man…
Revolutionizing Real Estate: Enhancing Database Performance and Cost Efficiency with Enteros and Cloud FinOps
In the fast-evolving world of finance, where banking and insurance sectors rely on massive data streams for real-time decisions, efficient anomaly man…
Enteros in Education: Leveraging AIOps for Advanced Anomaly Management and Optimized Learning Environments
In the fast-evolving world of finance, where banking and insurance sectors rely on massive data streams for real-time decisions, efficient anomaly man…