We are pleased to announce a brand-new major pg_timetable v4 release!
This time, a ton of work was put into it, and the pg_timetable v4 release is incompatible with earlier iterations. Job migration is simple, though, so nobody should be discouraged from updating.
We added extensive documentation and will continue to improve it. Please let us know if there are any areas that should be highlighted or new topics added.
Check out some of the fantastic new features right away! You can see a full list of changes on the official release page, which also has links to issues, commits, and pull requests.
I would like to remind you that pg_timetable is a collaborative effort. Therefore, don’t be afraid to ask any questions, report any bugs, bookmark the pg_timetable project, and let everyone know about it.
pg_timetable v3 versus v4
The earlier version is still regarded as stable. There are no known bugs or serious problems with how it works, so it is safe to keep using it.
However, bug fixes will not be backpatched to v3, and the v3 branch will not receive any new features.
Performance
We made several efforts to improve performance.
First, the database schema was overhauled and made as simple as possible. Because the new pg_timetable has to handle only a small number of system queries, more bandwidth and connections are left for the real workers.
Second, we rewrote the cron-related functions in pure SQL instead of PL/pgSQL. Besides improving performance, this spares us a lot of tedious debugging.
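To illustrate the idea with a generic sketch (this is not pg_timetable's actual internal code): a trivial check written as a pure SQL function can be inlined by the planner, whereas the PL/pgSQL version always pays the cost of the procedural interpreter.

-- Generic illustration only, not pg_timetable's internal code.
-- The PL/pgSQL version is executed by the procedural interpreter on every call:
CREATE FUNCTION cron_field_matches_plpgsql(val int, low int, high int)
RETURNS boolean LANGUAGE plpgsql IMMUTABLE AS $$
BEGIN
    RETURN val BETWEEN low AND high;
END;
$$;

-- The pure SQL version can be inlined into the calling query by the planner:
CREATE FUNCTION cron_field_matches_sql(val int, low int, high int)
RETURNS boolean LANGUAGE sql IMMUTABLE AS $$
    SELECT val BETWEEN low AND high;
$$;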
The extensive logging system makes up the third pillar. Details can be found in the separate section below.
Configuration File Support
Handling configuration only through command-line parameters and environment variables is error-prone and time-consuming.
Common sense suggests using configuration files in such a case. We provide a comprehensive, self-explanatory config.example.yaml file.
Naturally, you can use only a subset of the options, or skip the configuration file entirely. To improve security, it is a good idea to put general options in the configuration file and connection options in environment variables.
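A minimal sketch of that split could look like the following; the --config option and the environment variable name here are illustrative, so check config.example.yaml and the docs for the exact names supported by your version.

# Illustrative sketch only; exact option and variable names are in the docs.
# General options live in the YAML file, the secret stays in the environment.
export PGTT_PASSWORD='********'        # hypothetical variable name
pg_timetable --config=config.yaml      # option name per the v4 documentation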
Built-in CopyFromFile task
One of the most common things database schedulers do is bring data from external sources, such as sensors, logs, official statistics, etc., into the database. For this kind of data, CSV is the de facto standard format. The COPY command in PostgreSQL can import CSV files. However, to use it, the file must be stored in the server’s file system so that the PostgreSQL process can access it. Especially in the cloud era, that is not always the case for production environments.
To get around this restriction in the older version of pg_timetable, we recommended using the psql \copy command, which performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that the local user, not the server, needs access to and privileges on the file, and no SQL superuser privileges are required.
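For reference, such a client-side copy looked roughly like this (the file name and delimiter match the CopyFromFile example further down; connection options are omitted):

psql -d timetable -c "\copy location FROM 'download/orte_ansi.txt' WITH DELIMITER '|' NULL 'NULL>'"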
With v4, you no longer need any extra software to perform a client copy: you can use the new built-in CopyFromFile task. The complete example can be found in the manual. As you can see below, the task accepts its parameters as a JSON object.
{
    "sql": "COPY location FROM STDIN WITH DELIMITER '|' NULL 'NULL>'",
    "filename": "download/orte_ansi.txt"
}
- Here sql is the proper COPY ... FROM STDIN command. One can use any of the standard clauses controlling the file specification, e.g. file encoding, conditions for insert, escaping, quoting, etc.
- And filename is the correct path (absolute or relative to pg_timetable) of the source CSV file.
This task works for all formats supported by PostgreSQL: text, csv and binary.
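As a rough sketch of how such a task could be registered, assuming the simplified v4 schema with the timetable.chain, timetable.task and timetable.parameter tables described in the docs (column names may differ slightly in your version):

-- Rough sketch only; consult the v4 docs for the exact schema of your version.
WITH new_chain AS (
    INSERT INTO timetable.chain (chain_name, run_at, live)
    VALUES ('import-locations', '5 0 * * *', true)        -- every night at 00:05
    RETURNING chain_id
), new_task AS (
    INSERT INTO timetable.task (chain_id, task_order, kind, command)
    SELECT chain_id, 10, 'BUILTIN', 'CopyFromFile' FROM new_chain
    RETURNING task_id
)
INSERT INTO timetable.parameter (task_id, order_id, value)
SELECT task_id, 1,
       '{"sql": "COPY location FROM STDIN WITH DELIMITER ''|'' NULL ''NULL>''", "filename": "download/orte_ansi.txt"}'::jsonb
FROM new_task;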
Improved logging
One of the key architectural decisions made for pg_timetable was the database-driven concept. In other words, scheduler sessions are entirely managed by the database, and all output is saved there as well for later analysis.
File logs
Despite this, people were using output redirection to store logs in files, such as
pg_timetable -c loader -u scheduler -d timetable > pgtt.log
The disadvantage of this method is that the data is stored as plain text, which is difficult for machines to process. In the new version, we have introduced the ability to duplicate logs to the file system using the --log-file command-line parameter. Moreover, the user can choose the exact format of the log file by specifying --log-file-format: json or text.
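For example, reusing the connection options from the redirection example above (values are illustrative):

pg_timetable -c loader -u scheduler -d timetable --log-file=pgtt.log --log-file-format=json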
Database log efficiency
Another issue that required attention was database logging. Busy clients produced a lot of information that was stored with separate INSERT statements, causing intensive use of connections. We have therefore implemented a log hook for the well-known logrus package.
The key benefits are:
- use of the COPY machinery;
- a cache controlled by time and by the number of logged entries;
- the ability to skip entry logging under extremely high load.
I personally encourage you to incorporate our log hook into your projects. Let us know if you think we should make it a standalone project.
Tasks output
pg_timetable has stored the output of program tasks from the very beginning. This is helpful for incident analysis and for debugging after the fact. It wasn’t until Nikolay Samokhvalov made the suggestion that we realized this feature could also be used for the built-in SQL tasks. So now one can have an SQL task with a SELECT statement and see what output is saved upon execution, e.g.
"SELECT * FROM timetable.active_session"); add_job --------- 1 (1 row) ... my_database=> FROM timetable.execution_log, SELECT chain_id, finished, and output; chain_id | finished | output ----------+-------------------------------+---------- 1 | 2021-08-12 15:40:46.064795+02 | SELECT 2 (1 row)
The SQL task gives back the tag of the command that was run and the number of rows that were returned or changed. We don’t store the result set because commands can be as simple or as complicated as you want. Users should be cautious and explicitly store any important information, such as
my_database=> CREATE TABLE history_session (...);
CREATE TABLE
my_database=> SELECT timetable.add_job('dump-sessions', '* * * * *',
                  'INSERT INTO history_session SELECT now(), * FROM timetable.active_session');
 add_job
---------
       3
(1 row)
...
my_database=> SELECT * FROM history_session;
              ts               | client_pid | client_name | server_pid
-------------------------------+------------+-------------+------------
 2021-08-12 15:52:02.378981+02 |      10440 | loader-foo  |      15676
 2021-08-12 15:52:02.378981+02 |      10440 | loader-foo  |      16436
 2021-08-12 15:52:02.378981+02 |      10440 | loader-foo  |      14892
 2021-08-12 15:52:02.378981+02 |      10440 | loader-foo  |      25480
 2021-08-12 15:53:02.446911+02 |      19664 | writer-boo  |      21876
 2021-08-12 15:53:02.446911+02 |      19664 | writer-boo  |      11852
 2021-08-12 15:53:02.446911+02 |      19664 | writer-boo  |      10052
 2021-08-12 15:53:02.446911+02 |      19664 | writer-boo  |      27396
(8 rows)

my_database=> SELECT chain_id, finished, output FROM timetable.execution_log;
 chain_id |           finished            |   output
----------+-------------------------------+------------
        1 | 2021-08-12 15:40:46.064795+02 | SELECT 2
        1 | 2021-08-12 15:52:02.380711+02 | SELECT 4
        3 | 2021-08-12 15:52:02.382296+02 | INSERT 0 4
        1 | 2021-08-12 15:52:07.34872+02  | SELECT 3
        3 | 2021-08-12 15:52:07.448914+02 | INSERT 0 4
(5 rows)
Resource Choices
Our scheduler should be as adaptable as possible. We have therefore added the resource management options category:
--cron-workers: number of concurrent workers for scheduled chains (16 by default);
--interval-workers: number of concurrent workers for interval chains (16 by default);
--chain-timeout: abort any chain that takes longer than the specified number of milliseconds;
--task-timeout: abort any task in a chain that takes longer than the specified number of milliseconds.
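For example, a client with more cron workers and tighter timeouts could be started like this (values are illustrative):

pg_timetable -c loader -u scheduler -d timetable --cron-workers=32 --interval-workers=8 --chain-timeout=600000 --task-timeout=60000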
Finally
This was the first piece in a series on the new capabilities of pg_timetable v4. The best features will be emphasized, so keep checking this space.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
Are you interested in writing for Enteros’ Blog? Please send us a pitch!