How we made DISTINCT queries up to 8000x faster on PostgreSQL
TimescaleDB 2.2.1, released today, includes “TimescaleDB Skip Scan”, our implementation of a technique to vastly improve query times for DISTINCT queries. Skip Scan works on PostgreSQL tables, TimescaleDB hypertables, and TimescaleDB distributed hypertables (which scale across machines in a multi-node cluster), achieving 26x-8000x faster queries in our tests.
Shout out to the engineers who worked on this feature: Sven Klemm, Joshua Lockerman, Nikhil Sontakke, and the entire team of reviewers and testers!
Finding the most recent data quickly
How many times in the last week have you used an application or queried data looking for the most recent value of something? Maybe you recently reviewed the latest stock or cryptocurrency price for each of your investments. Or, maybe you checked the latest On Base Percentage for all of your Fantasy Baseball team players.
Or maybe you work with IoT devices, or monitor your infrastructure with tools like Prometheus, and have graphs and alarms that repeatedly query the most recent values for every device or service.
Whether you're looking at stocks or trying to find a piece of network equipment that's failing, there's a good chance you've queried for data like this in the last couple of days:
- What was the last reading of each item (stock, IoT device, etc.) and what time did it occur?
- What devices reported an hour ago but haven't sent data in the last 10 minutes?
- Retrieve a paged list of the most recent readings for all items of client X
As we collect and analyze more data than ever before - be it from monitoring our applications, IoT devices, financial markets, or a myriad of other scenarios - it is critical that we are able to compute answers to our questions quickly, especially when powering an application or dashboard.
Why DISTINCT queries are slow on PostgreSQL
PostgreSQL is an amazing database, but it can struggle with certain types of queries, especially as tables approach tens and hundreds of millions of rows (or more). DISTINCT queries are an example of this.
Why are DISTINCT queries slow on PostgreSQL when they seem to ask an "easy" question? It turns out that PostgreSQL currently lacks the ability to efficiently pull a list of unique values from an ordered index. Even when you have an index that matches the exact order and columns for these "last point" queries, PostgreSQL is still forced to scan the entire index to find all unique values. As a table grows (and they grow quickly with time-series data), this operation keeps getting slower.
Other databases like MySQL, Oracle, and DB2 implement a feature called "Loose indexscan" or "Index Skip Scan" or “Skip Scan” to speed up the performance of queries like this.
When a database has a feature like "Skip Scan", it can incrementally jump from one ordered value to the next without reading all of the rows in between. Without support for this feature, the database engine has to scan the entire ordered index and then deduplicate at the end – which is a much slower process.
Since 2018, there have been plans to support something similar in PostgreSQL. (Note: We weren’t able to use this implementation directly, due to some limitations of what is possible within the Postgres extension framework.)
Unfortunately, this patch wasn't included in the CommitFest for PostgreSQL 14, which means that it won't be included until PostgreSQL 15 at the earliest (i.e., no sooner than Fall 2022, at least 1.5 years from now).
We don’t want our users to have to wait that long.
Skip Scan in TimescaleDB 2.2.1, available today
Today, via TimescaleDB 2.2.1, we are releasing TimescaleDB Skip Scan, a custom query planner node that makes ordered DISTINCT queries blazing fast in PostgreSQL 🔥.
As you'll see in the benchmarks below, some queries performed more than 8000x better than before – and many of the SQL queries your applications and analytics tools use could also see dramatic improvements with this new feature.
This feature works in both TimescaleDB hypertables, TimescaleDB distributed hypertables, and normal PostgreSQL tables.
This means that with TimescaleDB, not only will your time-series DISTINCT queries be faster, but any other related queries you may have on normal Postgres tables will also be faster.
This is because TimescaleDB is not just a time-series database. It’s a relational database: specifically, a relational database for time-series. Developers who use TimescaleDB get the benefit of a purpose-built time-series database, plus a classic relational (Postgres) database, all in one, with full SQL support.
(We are also open to upstreaming this implementation to PostgreSQL, but would actually prefer that the other patch, which is implemented at a lower level, makes it into PostgreSQL 15. So if we can help, please let us know.)
And to be clear, we love PostgreSQL. We employ engineers who contribute to PostgreSQL. We contribute to the ecosystem around PostgreSQL. PostgreSQL is the world’s fastest growing database, and we are excited to support it alongside thousands of other users and contributors.
We constantly seek to advance the state of the art with databases, and features like Skip Scan are only our latest contribution to the industry. Skip Scan makes TimescaleDB better but also makes PostgreSQL a better, more competitive database overall, especially relative to MySQL, Oracle, DB2, and others.
What is Skip Scan and how can it make such a huge difference? Read on to learn how it works, how we benchmarked performance across varying cardinality and simulated scenarios, and how to get started.
What about RECURSIVE CTEs?
At this point, an experienced PostgreSQL user might point out that it is already possible to get reasonably fast DISTINCT queries via RECURSIVE CTEs.
From the PostgreSQL Wiki, using a RECURSIVE CTE can get you good results, but writing these kinds of queries can often feel cumbersome and unintuitive, especially for developers new to PostgreSQL:
WITH RECURSIVE cte AS ( (SELECT tags_id FROM cpu ORDER BY tags_id, time DESC LIMIT 1) UNION ALL SELECT ( SELECT tags_id FROM cpu WHERE tags_id > t.tags_id ORDER BY tags_id, time DESC LIMIT 1 ) FROM cte t WHERE t.tags_id IS NOT NULL ) SELECT * FROM cte LIMIT 50;
But even if writing a RECURSIVE CTE like this in day-to-day querying felt natural to you, there's a bigger problem. Most application developers, ORMs, and charting tools like Grafana or Tableau will still use the simpler, straight-forward form:
SELECT DISTINCT ON (tags_id) * FROM cpu WHERE tags_id >=1 ORDER BY tags_id, time DESC LIMIT 50;
In PostgreSQL, without a "skip scan" node, this query will perform the much slower Index Only Scan, causing your applications and graphing tools to feel clunky and slow.
Surely there's a better way, right?
Skip Scan is the way
Skip Scan is an optimization for queries of the form of SELECT DISTINCT ON (column). Conceptually, a Skip Scan is a regular IndexScan but that “skips” across an index looking for the next value that is greater than the current value:
With Skip Scan in TimescaleDB/PostgreSQL, query planning and execution can now utilize a new node (displayed as
(SkipScan) in the EXPLAIN output) to quickly return distinct items from a properly ordered index. Rather than having to scan the entire index with an Index Only Scan, Skip Scan incrementally searches for each successive item in the ordered index. As it locates one item, the
(SkipScan) node quickly restarts the search for the next item. This proves to be a much more efficient way of finding distinct items in an ordered index. (See GitHub for more details.)
Benchmarking TimescaleDB Skip Scan vs. a normal PostgreSQL Index Scan
In every example query, TimescaleDB with Skip Scan improved query response times by at least 26x.
But, the real surprise is how much of a difference it makes at lower cardinalities with lots of data - almost 8500x faster to retrieve all columns for the most recent reading of each device. That's fast!
In our tests, Skip Scan is also consistently faster - by 80x or more - in our 4,000-device benchmarks. (This level of cardinality is typical for many users of TimescaleDB.)
Before we share the full results, here is how our benchmark was set up.
Benchmark set up
To perform our benchmarks, we installed TimescaleDB on a DigitalOcean Droplet using the following specifications. PostgreSQL and TimescaleDB were installed from packages and we applied the recommended tuning from
- 8 Intel vCPUs
- 16GB of RAM
- 320GB NVMe SSD
- Ubuntu 20.04 LTS
- Postgres 12.6
- TimescaleDB 2.2 (First release with Skip Scan. TimescaleDB 2.2.1 primarily adds distributed hypertable support and some bug fixes.)
To demonstrate the performance impact of Skip Scan on varying degrees of cardinality, we benchmarked 3 separate datasets of varying size. To generate our datasets, we used the 'cpu-only' use case in the Time Series Benchmark Suite (TSBS), which creates 10 metrics, every 10 seconds, for each device (identified by the
tag_id in our benchmark queries).
|Dataset 1||Dataset 2||Dataset 3|
|100 devices||4000 devices||10,000 devices|
|4 months of data||4 days of data||36 hours of data|
|~103,000,000 rows||~103,000,000 rows||~144,000,000 rows|
Additional data preparation
In real life, not all device data is up-to-date, because devices go offline and internet connections get interrupted. Therefore, to simulate a more realistic scenario, (i.e., that some devices had stopped reporting for a period of time), we deleted rows for random devices over each of the following periods.
|Dataset 1||Dataset 2||Dataset 3|
|5 random devices over:||100 random devices over:||250 random devices over:|
|30 minutes||1 hour||10 minutes|
|36 hours||12 hours||1 hour|
|7 days||36 hours||12 hours|
|1 month||3 days||24 hours|
To delete the data, we utilized the
tablesample function of Postgres. This SELECT feature allows you to return a random sample of rows from a table, based on a percentage of the total rows. In the example below, we randomly sample 10% of the rows (
bernoulli(10) ) and then take the first 10 (
limit 10 ).
DELETE FROM cpu WHERE tags_id IN (SELECT id FROM tags tablesample bernoulli(10) LIMIT 10) AND time >= now() - INTERVAL '30 minutes';
From there, we ran each of the benchmarking queries multiple times to accommodate for caching, with and without Skip Scan enabled.
As mentioned earlier, the following two indexes were present on the hypertable for all queries.
"cpu_tags_id_time_idx" btree (tags_id, "time" DESC) "cpu_time_idx" btree ("time" DESC)
Here are the results:
About the queries benchmarked
For this test, we benchmarked 5 types of common queries:
Scenario #1: What was the last reported time of each device in a paged list?
SELECT DISTINCT ON (tags_id) tags_id, time FROM cpu ORDER BY tags_id, time DESC LIMIT 10 OFFSET 50;
Scenario #2: What was the time and most recently reported set of values for each device in a paged list?
SELECT DISTINCT ON (tags_id) * FROM cpu ORDER BY tags_id, time DESC LIMIT 10 OFFSET 50;
Scenario #3: What is the most recent point for all reporting devices in the last 5 minutes?
Note: This query is similar to the Prometheus "Instant Query" and can significantly speed up queries on distinct series.
In some initial testing on one Promscale instance with 3.5 million rows, this query improved by 9x with Skip Scan enabled.
SELECT DISTINCT ON (tags_id) * FROM cpu WHERE time >= now() - INTERVAL '5 minutes' ORDER BY tags_id, time DESC;
Scenario #4: Which devices reported at some time today, but not within the last hour?
WITH older AS ( SELECT DISTINCT ON (tags_id) tags_id FROM cpu WHERE time > now() - INTERVAL '24 hours' ) SELECT * FROM older o WHERE NOT EXISTS ( SELECT 1 FROM cpu WHERE cpu.tags_id = o.tags_id AND time > now() - INTERVAL '1 hour' );
Scenario #5: Which devices reported yesterday, but not in the last 24 hours?
WITH older AS ( SELECT DISTINCT ON (tags_id) tags_id FROM cpu WHERE time > now() - INTERVAL '48 hours' AND time < now() - INTERVAL '24 hours' ) SELECT * FROM older o WHERE NOT EXISTS ( SELECT 1 FROM cpu WHERE cpu.tags_id = o.tags_id AND time > now() - INTERVAL '24 hour' );
How will your application improve?
But, Skip Scan isn’t a theoretical improvement reserved for benchmarking blog posts 😉 – it has real-world implications, and many applications we use rely on getting this data as fast as possible.
Think about the applications you use (or develop) every day. Do they retrieve paged lists of unique items from database tables to fill dropdown options (or grids of data)?
At a few thousand items, the query latency might not be very noticeable. But, as your data grows and you have millions of rows of data and tens of thousands of distinct items, that dropdown menu might take seconds - or minutes - to populate.
Skip Scan can reduce that to tens of milliseconds!
Even better, Skip Scan also provides a fast, efficient way of answering the question that so many people with time-series data ask every day:
"What was the last time and value recorded for each of my [devices / users / services / crypto and stock investments / etc]?"
As long as there is an index on "device_id" and "time" descending, Skip Scan will retrieve the data using a query like this much more efficiently.
SELECT DISTINCT ON (device_id) * FROM cpu ORDER BY device_id, time DESC;
With Skip Scan, your application and dashboards that rely on these types of queries will now load a whole lot faster 🚀 (see below).
How to use Skip Scan on TimescaleDB
How do you get started? Upgrade to TimescaleDB 2.2.1 and set up your schema and indexing as described below. You should start to see immediate speed improvements in many of your DISTINCT queries.
To ensure that a (SkipScan) node can be chosen for your query plan:
First, the query must use the
DISTINCT keyword on a single column. The benchmarking queries above will give you some examples to draw from.
Second, there must be an index that contains the
DISTINCT column first, and any other
ORDER BY columns. Specifically:
- the index needs to be a
- The index needs to match the
ORDER BYin your query
DISTINCTcolumn must either be the first column of the index, or any leading column(s) must be used as constraints in your query
In practice, this means that if we use the the questions from the beginning of this blog post ("retrieve a list of unique IDs in order" and "retrieve the last reading of each ID"), we would need at least one index like this (but if you're using a TimescaleDB hypertable, this likely already exists):
"cpu_tags_id_time_idx" btree (tags_id, "time" DESC)
With that index in place, you should start to see immediate benefit if your queries look similar to the benchmarking examples below. When Skip Scan is chosen for your query, the
EXPLAIN ANALYZE output will show one or more
Custom Scan (SkipScan) nodes similar to this:
-> Unique -> Merge Append Sort Key: _hyper_8_79_chunk.tags_id, _hyper_8_79_chunk."time" DESC -> Custom Scan (SkipScan) on _hyper_8_79_chunk -> Index Only Scan using _hyper_8_79_chunk_cpu_tags_id_time_idx on _hyper_8_79_chunk Index Cond: (tags_id > NULL::integer) -> Custom Scan (SkipScan) on _hyper_8_80_chunk -> Index Only Scan using _hyper_8_80_chunk_cpu_tags_id_time_idx on _hyper_8_80_chunk Index Cond: (tags_id > NULL::integer) ...
Increasingly, we see applications and community driven examples of PostgreSQL being used to store and query high-volume, time-series, ordered data.
Whether it's IoT device monitoring or high-frequency trading, data is being ingested quickly – and users need their "what was the last thing that happened?" queries to return quickly too. In this area, the built-in PostgreSQL planner isn't currently optimized for this kind of query. That’s why we built Skip Scan.
After using TSBS to generate >100 million rows of data and simulate varying ranges of "last reported time" for all devices, we saw that Skip Scan can have a dramatic impact on ordered DISTINCT query performance (up to 8500x in some cases!). Notably, when an application needs to show the most recent value for all devices in a paged interface, our query benchmarks show that Skip Scan improves the responsiveness of the application in these scenarios.
In addition to the query speed improvements, introducing Skip Scan in TimescaleDB 2.2.1 also highlights the power of PostgreSQL’s extensible architecture.
As users try Skip Scan, we hope that the performance results that they experience solves the query problems they have now and can be used to support additional Skip Scan development and implementations.
And the fact that we - and any community member - can solve query pain points using the tooling provided by PostgreSQL extensions demonstrates the versatility of the ecosystem, which in our opinion is one of the best parts of PostgreSQL.
Learn more and get started
If you’re new to TimescaleDB, create a free account to get started with a fully-managed TimescaleDB instance (100% free for 30 days).
If you are an existing user:
- Timescale Cloud: TimescaleDB 2.2.1 is now the default for all new services on Timescale Cloud, and any of your existing services will be automatically upgraded during your next maintenance window.
- Managed Service for TimescaleDB: Skip Scan for hypertables and PostgreSQL tables is available today via TimescaleDB 2.2. Support for distributed hypertables is coming in the next few weeks via TimescaleDB 2.2.1.
- Self-managed TimescaleDB: Here are upgrade instructions.
Join our Slack community to share your results, ask questions, get advice, and connect with other developers (I, as well as our co-founders, engineers, and passionate community members are active on all channels).