Preamble
A lot has been written about timeseries analysis and about handling temporal data in general. Countless papers have been submitted and published online explaining different tactics, yet the genuinely useful techniques are often buried behind flashy marketing material that says little and works even less.
At the same time, as more businesses adopt modern analytical methods and learn how valuable their data really is, timeseries analysis keeps growing in importance.
In many cases, basic timeseries analysis is more than adequate, and no additional tooling is required to get things moving quickly and correctly. A lot can be done with timeseries, and in this article I'd like to discuss a straightforward yet effective concept that can be used to find trends or well-known patterns in the data.
Loading test data into PostgreSQL
I first put together a simple table with some sample data in our PostgreSQL database to show how this could be done:
CREATE TABLE t_timeseries (
    id    serial,
    data  numeric
);

COPY t_timeseries FROM stdin DELIMITER ',';
1,11
2,14
3,16
4,9
5,12
6,13
7,14
8,9
9,15
10,9
\.
So, can we identify a particular trend in this data? The question in this case is whether we can find a time frame in which the value increased consistently (for instance, "three times in a row"). More importantly, the goal is to develop an approach that also enables more in-depth analysis later on.
Encoding timeseries as strings
One relatively simple option is to encode the timeseries as a string. The benefit is that standard string-processing techniques can then be applied directly to those strings. When looking for trends, the key ingredient is the difference between the current value and the previous one. Thankfully, PostgreSQL (and SQL in general) offers a simple way to calculate it:
test=# SELECT *, data - lag(data, 1) OVER (ORDER BY id) AS diff
       FROM t_timeseries;
 id | data | diff
----+------+------
  1 |   11 |
  2 |   14 |    3
  3 |   16 |    2
  4 |    9 |   -7
  5 |   12 |    3
  6 |   13 |    1
  7 |   14 |    1
  8 |    9 |   -5
  9 |   15 |    6
 10 |    9 |   -6
(10 rows)
The lag function shifts the data by one row, following the order specified in the OVER clause. That makes it easy to calculate how much each row differs from the previous one.
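Note that the first row has no predecessor, so its diff is NULL. If that is undesirable, lag() also accepts an optional third argument that supplies a default value; the variant below (a small aside of mine, not part of the original example) makes the first difference come out as 0 instead:

SELECT *, data - lag(data, 1, data) OVER (ORDER BY id) AS diff
FROM   t_timeseries;   -- the default falls back to the current row's value, so the first diff is 0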
After this brief introduction, it's time to concentrate on the key maneuver. The idea is to take the differences produced by the previous SQL statement and encode each one as a single character: "u" (for "up") if the difference is greater than zero, and "d" (for "down") otherwise. Why is that useful? Because every movement is now represented by a single character, the series becomes very easy to process later on. After the encoding step we apply a sliding window: the characters of five periods (the two earlier rows, the current row, and the two later rows) are combined into a single string.
Here is what happens:
test=# SELECT *,
              string_agg(CASE WHEN diff > 0
                              THEN 'u'::text
                              ELSE 'd'::text
                         END, '') OVER (ORDER BY id
                                        ROWS BETWEEN 2 PRECEDING
                                                 AND 2 FOLLOWING) AS encoded
       FROM  (SELECT *, data - lag(data, 1) OVER (ORDER BY id) AS diff
              FROM   t_timeseries
             ) AS x;
 id | data | diff | encoded
----+------+------+---------
  1 |   11 |      | duu
  2 |   14 |    3 | duud
  3 |   16 |    2 | duudu
  4 |    9 |   -7 | uuduu
  5 |   12 |    3 | uduuu
  6 |   13 |    1 | duuud
  7 |   14 |    1 | uuudu
  8 |    9 |   -5 | uudud
  9 |   15 |    6 | udud
 10 |    9 |   -6 | dud
(10 rows)
As you can see, the encoded string contains only three characters at the beginning. Since there are no values before the first rows, their windows can only look at the present and what follows; the same thing happens at the end of the series. Apart from that, the query returns the original data together with an encoded string for every row.
For simplicity's sake, let's turn the query into a view:
CREATE VIEW v AS
    SELECT *,
           string_agg(CASE WHEN diff > 0
                           THEN 'u'::text
                           ELSE 'd'::text
                      END, '') OVER (ORDER BY id
                                     ROWS BETWEEN 2 PRECEDING
                                              AND 2 FOLLOWING) AS encoded
    FROM  (SELECT *, data - lag(data, 1) OVER (ORDER BY id) AS diff
           FROM   t_timeseries
          ) AS x;
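As a side note: if the shorter strings at the beginning and end of the series are unwanted, the view makes it easy to keep only complete five-row windows. This little filter is my own addition, not part of the original example:

SELECT *
FROM   v
WHERE  length(encoded) = 5;   -- keep only rows whose window covers all five periods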
Keep in mind that this encoder is fairly basic and usually not sufficient for real-world use. If you intend to perform serious timeseries analysis in PostgreSQL, the encoder (the "timeseries codec") may have to be far more sophisticated. The point here is simply to give you an idea of what can be done with a fairly simple technique on a standard PostgreSQL database.
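To illustrate what a slightly richer codec could look like, here is a sketch of a three-symbol variant of the view above. It is my own variation, not part of the original example, and the view name v_three_state is purely hypothetical: flat periods get their own symbol "f", which already makes plateaus searchable.

CREATE VIEW v_three_state AS
    SELECT *,
           string_agg(CASE WHEN diff > 0 THEN 'u'::text
                           WHEN diff < 0 THEN 'd'::text
                           ELSE 'f'::text   -- unchanged values (and the first row, whose diff is NULL)
                      END, '') OVER (ORDER BY id
                                     ROWS BETWEEN 2 PRECEDING
                                              AND 2 FOLLOWING) AS encoded
    FROM  (SELECT *, data - lag(data, 1) OVER (ORDER BY id) AS diff
           FROM   t_timeseries
          ) AS x;

A real codec might go further and also bucket the size of each move (small versus large changes) or encode several columns at once.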
Analyzing an encoded string
Now that the data has been encoded, it is easy to analyze with standard PostgreSQL tools. The following straightforward query finds all areas of the data where the value increased at least three times in a row:
test=# SELECT * FROM v WHERE encoded LIKE '%uuu%';
 id | data | diff | encoded
----+------+------+---------
  5 |   12 |    3 | uduuu
  6 |   13 |    1 | duuud
  7 |   14 |    1 | uuudu
(3 rows)
To find more complex patterns, you will typically want a more sophisticated search, and regular expressions can be very helpful here. It can also make sense to define a "distance" function and use a KNN-style search to find areas of your timeseries that are merely similar to what you are looking for. Basically, with a little creativity you can accomplish a lot with very little effort.
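As a sketch of both ideas (my own addition): the regular expression below expresses "at least three increases in a row" directly, and the levenshtein() function from the fuzzystrmatch extension serves as a very crude stand-in for a real distance function. The reference pattern 'uudud' is an arbitrary example:

-- The same search as above, written as a regular expression:
SELECT * FROM v WHERE encoded ~ 'u{3,}';

-- Rank windows by edit distance to a reference pattern
-- (assumes the fuzzystrmatch extension is available):
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

SELECT *, levenshtein(encoded, 'uudud') AS distance
FROM   v
ORDER BY distance
LIMIT  3;

For larger datasets, the pg_trgm extension and its trigram distance operator can give this kind of nearest-neighbor search proper index support.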
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.