How we leveraged the foundation of PostgreSQL and extended its capabilities to improve performance.

TimescaleDB is an open-source time-series database that leverages PostgreSQL and extends it to scale for time-series workloads. In order to achieve these performance characteristics, TimescaleDB partitions data into chunks, where each chunk has a start and end time. By breaking data up into chunks, we support faster inserts since recent chunks fit in memory. On the querying side, we leverage the information we know about these chunks to optimize query performance.

In this post, my primary focus is on the work we’ve been doing to optimize query performance. Specifically, since we know how data is partitioned, we can use that knowledge to intelligently reduce the number of chunks we query. If a query has constraints on partitioning columns, we can use those constraints to only target chunks that could return results according to those constraints. By reducing the number of chunks accessed, you reduce compute cycles and disk access, all of which ultimately speed up your queries.

Before we jump into the optimizations that we’ve built in TimescaleDB, I’d like to share some more about PostgreSQL’s internals to provide context on where our constraint exclusion optimizations fit in.

For more information on TimescaleDB’s internals, visit our docs.

What really happens when you execute a query in PostgreSQL

After an application program establishes a connection to a PostgreSQL server, the program transmits a query to the server and waits for a result. The parser validates the query against the static grammar definition and produces a parse tree. It then performs semantic analysis and checks the catalog for referenced objects and produces a query tree. Next, the rewriter takes the query tree and looks for any rules to apply. For example, if a query is selecting from a VIEW, the rewriter will rewrite the query to select from the underlying tables (not the VIEW itself).

The query planner generates different paths that would all produce the desired query result and assigns a cost to each one. The cheapest path is turned into a plan that the executor can use. The executor makes use of the storage system while scanning relations, performs sorts and joins, evaluates qualifications, and finally creates the result.

Every PostgreSQL function has a volatility classification which can be defined as a promise to the optimizer about the behavior of the function. Here are the three categories:

  • Immutable: cannot modify the database, guarantees to return the same results given the same arguments forever, can be evaluated safely at plan time, e.g. length()
  • Stable: cannot modify the database, guaranteed to return the same results given the same arguments within the same query, can be evaluated safely during execution time, e.g. now()
  • Volatile: can modify the database, able to return different results on successive calls with the same arguments, e.g. random()

Since operators are implemented as functions, expressions inherit their volatility category from the functions used in the expression. These expressions come into play when looking at the types of constraints that are applied during each stage of constraint exclusion. Volatility also determines whether an expression may be used in an index scan - only immutable and stable expressions may be used as condition in an index scan.

PostgreSQL only does constraint exclusion during plan time for normal tables. Runtime constraint exclusion is limited to tables using native partitioning in PG11+. With TimescaleDB 1.4, we made several planner improvements to allow users to benefit from constraint exclusion, even for queries that cannot leverage PostgreSQL’s built-in constraint exclusion that occurs during the planning stage.

How we implemented constraint exclusion at execution time

PostgreSQL represents how plans should be executed using “nodes”. There are a variety of different nodes that may appear in an EXPLAIN output, but we’ll focus specifically on append nodes, which essentially combine the results from multiple sources into a single result.

PostgreSQL has two standard Appends that are commonly used:

  • Append: appends results of child nodes to return a unioned result
  • MergeAppend: merge output of child nodes by sort key; all child nodes must be sorted by that same sort key

In order to apply constraint exclusion at execution time, we added some TimescaleDB custom nodes:

  • ConstraintAwareAppend: implements constraint exclusion at executor startup; usually sits on top of an Append or MergeAppend node
  • ChunkAppend: implements executor startup and runtime exclusion, and Ordered Append; within TimescaleDB it replaces Append and sometimes MergeAppend

Stage 1: Planner exclusion

We discussed earlier about how PostgreSQL’s constraint exclusion can only be applied during plan time, but regardless, it can still result in significant performance improvements. The first stage of constraint exclusion is planner exclusion which prevents unneeded chunks from entering the query plan. This stage is useful for queries with constraints on partitioning columns and applies to any immutable expressions.  

For example, say we want to perform the following query:

SELECT * FROM hypertable WHERE time < ‘2000-01-03’

Without optimization, here is the plan:

dev=# EXPLAIN (ANALYZE,COSTS OFF,BUFFERS,TIMING OFF,SUMMARY OFF)
SELECT * FROM metrics WHERE time < '2000-01-03';
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Append (actual rows=2880 loops=1)
   Buffers: shared hit=39
   ->  Seq Scan on metrics (actual rows=0 loops=1)
         Filter: ("time" < '2000-01-03 00:00:00+01'::timestamp with time zone)
   ->  Index Scan using _hyper_1_1_chunk_metrics_time_idx on _hyper_1_1_chunk (actual rows=2880 loops=1)
         Index Cond: ("time" < '2000-01-03 00:00:00+01'::timestamp with time zone)
         Buffers: shared hit=31
   ->  Index Scan using _hyper_1_2_chunk_metrics_time_idx on _hyper_1_2_chunk (actual rows=0 loops=1)
         Index Cond: ("time" < '2000-01-03 00:00:00+01'::timestamp with time zone)
         Buffers: shared hit=2
   ->  Index Scan using _hyper_1_3_chunk_metrics_time_idx on _hyper_1_3_chunk (actual rows=0 loops=1)
         Index Cond: ("time" < '2000-01-03 00:00:00+01'::timestamp with time zone)
         Buffers: shared hit=2
   ->  Index Scan using _hyper_1_4_chunk_metrics_time_idx on _hyper_1_4_chunk (actual rows=0 loops=1)
         Index Cond: ("time" < '2000-01-03 00:00:00+01'::timestamp with time zone)
         Buffers: shared hit=2
   ->  Index Scan using _hyper_1_5_chunk_metrics_time_idx on _hyper_1_5_chunk (actual rows=0 loops=1)
         Index Cond: ("time" < '2000-01-03 00:00:00+01'::timestamp with time zone)
         Buffers: shared hit=2

With optimization, here is the plan:

dev=# EXPLAIN (ANALYZE,COSTS OFF,BUFFERS,TIMING OFF,SUMMARY OFF)
SELECT * FROM metrics WHERE time < '2000-01-03';
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Append (actual rows=2880 loops=1)
   Buffers: shared hit=31
   ->  Index Scan using _hyper_1_1_chunk_metrics_time_idx on _hyper_1_1_chunk (actual rows=2880 loops=1)
         Index Cond: ("time" < '2000-01-03 00:00:00+01'::timestamp with time zone)
         Buffers: shared hit=31

As you can see, by implementing constraint exclusion during the plan phase, we were able to prevent unneeded chunks from entering our query plan.

Stage 2: Executor startup exclusion

The second stage happens during the query executor initialization and removes unneeded chunks before execution. This stage is useful for queries with constraints on partitioning columns and applies to any stable expressions.

For example, say we want to perform the following query:

SELECT * FROM hypertable WHERE time < now()

With optimization, here is the plan:

dev=# EXPLAIN (ANALYZE,COSTS OFF,BUFFERS,TIMING OFF,SUMMARY OFF)
SELECT * FROM metrics WHERE time < now() - interval '19years 5month 28days';
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Custom Scan (ChunkAppend) on metrics (actual rows=2389 loops=1)
   Chunks excluded during startup: 4
   Buffers: shared hit=26
   ->  Index Scan using _hyper_1_1_chunk_metrics_time_idx on _hyper_1_1_chunk (actual rows=2389 loops=1)
         Index Cond: ("time" < (now() - '19 years 5 mons 28 days'::interval))
         Buffers: shared hit=26

While this plan is similar to previous query, the now() function makes the expression no longer accessible to plan time constraint exclusion. For executor startup exclusion, TimescaleDB uses our custom scan node, ChunkAppend, to remove hypertable chunks that are not needed due to constraints during executor initialization.

Stage 3: Executor runtime exclusion

The final stage occurs during execution and is a new feature in TimescaleDB 1.4. It prevents unneeded chunks from being hit during execution and applies to any references to runtime values. This function is particularly useful for Nested Loop Joins, LATERAL Join, and subqueries.

For example, say we want to perform the following query:

SELECT * FROM hypertable WHERE time = (SELECT max(time) FROM hypertable)

dev=# EXPLAIN (ANALYZE,COSTS OFF,BUFFERS,TIMING OFF,SUMMARY OFF)
SELECT * FROM metrics WHERE time = (SELECT max(time) FROM metrics);
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Custom Scan (ChunkAppend) on metrics (actual rows=1 loops=1)
   Chunks excluded during runtime: 4
   Buffers: shared hit=18
   InitPlan 2 (returns $1)
     ->  Result (actual rows=1 loops=1)
           Buffers: shared hit=3
           InitPlan 1 (returns $0)
             ->  Limit (actual rows=1 loops=1)
                   Buffers: shared hit=3
                   ->  Custom Scan (ChunkAppend) on metrics metrics_1 (actual rows=1 loops=1)
                         Order: metrics_1."time" DESC
                         Buffers: shared hit=3
                         ->  Index Only Scan Backward using _hyper_1_5_chunk_metrics_time_idx on _hyper_1_5_chunk _hyper_1_5_chunk_1 (actual rows=1 loops=1)
                               Index Cond: ("time" IS NOT NULL)
                               Heap Fetches: 1
                               Buffers: shared hit=3
                         ->  Index Only Scan Backward using _hyper_1_4_chunk_metrics_time_idx on _hyper_1_4_chunk _hyper_1_4_chunk_1 (never executed)
                               Index Cond: ("time" IS NOT NULL)
                               Heap Fetches: 0
                         ->  Index Only Scan Backward using _hyper_1_3_chunk_metrics_time_idx on _hyper_1_3_chunk _hyper_1_3_chunk_1 (never executed)
                               Index Cond: ("time" IS NOT NULL)
                               Heap Fetches: 0
                         ->  Index Only Scan Backward using _hyper_1_2_chunk_metrics_time_idx on _hyper_1_2_chunk _hyper_1_2_chunk_1 (never executed)
                               Index Cond: ("time" IS NOT NULL)
                               Heap Fetches: 0
                         ->  Index Only Scan Backward using _hyper_1_1_chunk_metrics_time_idx on _hyper_1_1_chunk _hyper_1_1_chunk_1 (never executed)
                               Index Cond: ("time" IS NOT NULL)
                               Heap Fetches: 0
   ->  Index Scan using _hyper_1_1_chunk_metrics_time_idx on _hyper_1_1_chunk (never executed)
         Index Cond: ("time" = $1)
   ->  Index Scan using _hyper_1_2_chunk_metrics_time_idx on _hyper_1_2_chunk (never executed)
         Index Cond: ("time" = $1)
   ->  Index Scan using _hyper_1_3_chunk_metrics_time_idx on _hyper_1_3_chunk (never executed)
         Index Cond: ("time" = $1)
   ->  Index Scan using _hyper_1_4_chunk_metrics_time_idx on _hyper_1_4_chunk (never executed)
         Index Cond: ("time" = $1)
   ->  Index Scan using _hyper_1_5_chunk_metrics_time_idx on _hyper_1_5_chunk (actual rows=1 loops=1)
         Index Cond: ("time" = $1)
         Buffers: shared hit=3

You’ll notice that chunks that we determine do not need to be accessed during runtime are marked as (never executed) in the EXPLAIN plan.

Next Steps

If you are interested in learning more about TimescaleDB 1.4, visit the blog post or GitHub page.

If you are just getting started with TimescaleDB, visit the installation guide. We also recently announced the launch of Timescale Cloud and encourage you to check it out.

Also, keep your eyes peeled for a follow-up blog post where we’ll discuss another interesting optimization we’ve implemented: OrderedAppend.