Turn raw data into fixed time intervals for metric analysis and visualization.
If you are working with time-series data, you need a way to be able to easily manipulate, query and visualize that data. What you may or may not already know is that TimescaleDB provides a number of time-oriented functions that aren't found in traditional relational databases.
These functions are meant to provide two key benefits: improved ease of use for time-series analytics, and improved performance. The functions live alongside full SQL and can be viewed as extensions of the SQL language. Since a vast amount of developers today already know SQL, the learning curve is greatly reduced.
Two critical TimescaleDB time-series functions are:
time_bucket_gapfill(). Time_bucket() is used for aggregating arbitrarily-sized time periods and gapfill() is important when your time buckets have missing data or gaps, which is a very common occurrence when capturing 1000s of time series readings per second. Together, both of these are essential for analyzing and visualizing time series data.
In this blog we’ll discuss both of these capabilities, and show them in action using Grafana.
TL;DR Check out this Grafana dashboard that is visualizing data from TimescaleDB (both hosted in Timescale Cloud) showing time bucketing and gapfilling in action.
Background on time_bucket()
time_bucket() is a more powerful version of the standard PostgreSQL
date_trunc “truncates” a TIMESTAMP or an INTERVAL value based on a specified date part (e.g. hour, week or month) and returns the truncated timestamp or interval.
date_trunc can aggregate by 1 second, 1 hour, 1 day or 1 week. However, users often want to see aggregates by 5 minutes or 4 hours, etc. This can get pretty complicated in SQL, but
time_bucket() makes it easy.
Time bucketing allows for arbitrary time intervals (e.g., 5 minutes, 6 hours, etc.), as well as flexible groupings and offsets, instead of just second, minute, hour, and so on.
In addition to allowing more flexible time-series queries,
time_bucket() also allows you to write these queries in a simpler way. In fact, it knows to infer the time range directly from the WHERE clause, which greatly simplifies the query syntax.
Here it is in action:
SELECT time_bucket('5 minutes', time) AS five_min, avg(cpu) FROM metrics GROUP BY five_min ORDER BY five_min DESC LIMIT 12;
By the way, this feature has actually been a core function of TimescaleDB since the first release in April 2017, and our users love it!
When to use time_bucket()
As you can imagine, time bucketing can be helpful for a number of scenarios. When it comes to creating dashboards or visualizations of time-series data, many rely on this function to turn their raw observations into fixed time intervals.
When graphing time-series data using a solution such as Grafana, aggregations can help identify trends over time by grouping raw data into higher level aggregates. For example, you might want to average monthly raw data daily to achieve a smoother trend line or count the number of occurrences of non-numeric data.
Building on top of time_bucket() with time_bucket_gapfill()
One issue users often encounter when working with time-series data is recording measurements at irregular or mismatched intervals. This creates a time bucket interval with missing data or gaps. Fortunately, TimescaleDB has a function called
time_bucket_gapfill() that allows you to aggregate your data into continuous time intervals. You can choose two different ways to fill in gaps:
locf() which carries the last known value in the time range forward or
interpolate() which does a linear interpolation between gaps.
Time_bucket_gapfill was introduced in TimescaleDB 1.2 as a Community feature. To learn more about gapfill, check out this blog post.
Get hands-on with a sample application
Now that you know what
time_bucket_gapfill() do, it’s time to get hands-on with a sample application! We’ve built a simple Python application that pulls data from the Open AQ Platform API (an open API air quality data). Essentially, the application reads the API to get air quality values for all cities in Great Britain over period of time, parses the results, and stores all measurements collected from air quality sensors in TimescaleDB.
All the code for this application can be found on GitHub https://github.com/timescale/examples/tree/master/air-quality.
We also connected a Grafana instance to the TimescaleDB database that stores these results, and have written a step by step tutorial of how to use
time_bucket_gapfill() to visualize the data.
time_bucket() function is just one example of how TimescaleDB offers optimized SQL functions for working with time-series data. To learn more about other TimescaleDB functions, check out our docs.
Like this post and interested in learning more? Sign up for our mailing list below or follow us on Twitter.