Overheard @ Office Hours
A selection of FAQs from Timescale Office Hours. During our last session we discussed chunk size, indexing, space partitioning, and more.
Last week, we had a very engaged group join for office hours. There were TimescaleDB users from a variety of industries including fintech, IoT, and manufacturing.
As a reminder, we run open office hours a few times a month to give users an opportunity to ask any questions they might have about TimescaleDB. If you are interested in joining to discuss your use case with our engineers and learn more from other community members, you can sign up for our next session here.
Below, we will share a selection of questions from our last office hours.
Can you insert a time value that’s not current? As in, we want to insert data and record it as a previous point in time.
Yes, this is fully supported by TimescaleDB. If a new chunk needs to be created under the hood, it will be. However, there can be some performance implications. For example, if you are inserting mostly recent data, Postgres can keep the relevant chunk and its indexes in memory, which speeds things up; if your inserts are scattered randomly in time, they will touch many chunks and be less efficient (but they will still work).
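As a sketch of what backfilling looks like, a past timestamp is just an ordinary insert; the table and column names below (`conditions`, `time`, `device_id`, `temperature`) are hypothetical, not from the discussion:

```sql
-- Hypothetical hypertable; names are illustrative.
CREATE TABLE conditions (
  time        TIMESTAMPTZ NOT NULL,
  device_id   TEXT        NOT NULL,
  temperature DOUBLE PRECISION
);
SELECT create_hypertable('conditions', 'time');

-- Backfill a reading from last month: TimescaleDB routes the row to the
-- matching chunk, creating that chunk first if it does not exist yet.
INSERT INTO conditions (time, device_id, temperature)
VALUES (now() - INTERVAL '30 days', 'device-42', 21.5);
```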
In terms of query efficiency, does it make sense to have data duplicated across the table?
With partial indexes, having duplicated data could help you avoid performance issues. However, using separate tables is probably the way to go.
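For reference, a partial index in Postgres covers only the rows matching a WHERE clause, which keeps the index small when queries focus on a subset of the data. The schema below is a hypothetical illustration:

```sql
-- Index only the rows a hot query path actually touches, e.g. error events.
-- Table and column names are hypothetical.
CREATE INDEX events_error_time_idx
  ON events (time DESC)
  WHERE status = 'error';
```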
We are querying up to 6 months of data from 2-10 GBs of data per day. What is the best chunk size interval and how should we decide it? Can you change it for existing chunks?
For sizing, it’s generally based on memory. If most of your incoming data is recent, you want the entire chunk and its indexes to fit in the Postgres shared buffers, so a chunk, including its indexes, should be roughly 20-25% of main memory. We have a function called chunk_relation_size (docs) which will give you the full size of a chunk including all of its indexes. It’s generally not a problem to go a little over or under the recommended size, but you don’t want a chunk that takes up all your memory and forces its indexes to swap in and out of RAM.
For new chunks, you can change the chunk time interval. For historic chunks, you can’t change the size without dumping and re-inserting the data.
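Assuming a hypothetical hypertable named `conditions`, inspecting chunk sizes and adjusting the interval for future chunks might look like the sketch below (in TimescaleDB 2.x the sizing function is `chunks_detailed_size`; `chunk_relation_size` is the older 1.x name):

```sql
-- Per-chunk sizes, split into table, index, and total bytes (TimescaleDB 2.x).
SELECT chunk_name, table_bytes, index_bytes, total_bytes
FROM chunks_detailed_size('conditions')
ORDER BY chunk_name;

-- Shrink the interval for chunks created from now on;
-- existing chunks keep their original boundaries.
SELECT set_chunk_time_interval('conditions', INTERVAL '1 day');
```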
Does indexing affect the database storage size?
Yes. Indexes can be huge. The characteristics of indexes in TimescaleDB are pretty similar to those in Postgres. But, the main difference is that TimescaleDB doesn’t create global indexes across all the chunks. Instead, the index is local to that chunk and that’s how we are able to scale. Within TimescaleDB, there’s not a restriction on indexes or how they behave. (Learn more about using composite indexes to speed up time-series queries here.)
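As an illustration, a composite index is declared exactly as in vanilla Postgres; TimescaleDB then builds a local copy of it on each chunk. The table and column names here are hypothetical:

```sql
-- One statement; TimescaleDB creates a local index on every chunk,
-- including chunks created later.
CREATE INDEX conditions_device_time_idx
  ON conditions (device_id, time DESC);
```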
How do you do HA, and what is your recommended option?
HA in Postgres and TimescaleDB is pretty similar since we inherit many characteristics from Postgres. Postgres has the option to do hot standbys or synchronous replicas, and you can run Patroni, which watches the health of your databases to determine which node is the leader. (More on this topic here.) Internally, we use Kubernetes to manage all of our HA. We plan to share more on our methodology for doing this in the upcoming months.
I’m collecting lots of different types of data from different devices and want to have one table for each device. Is it possible to have 50K+ tables in Postgres?
It is possible. However, the challenge here is table management. We’d suggest using a naming convention or metadata tables.
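One hedged sketch of the metadata-table approach: rather than relying on table names alone, keep a registry that maps each device to its table (all names below are hypothetical):

```sql
-- Registry mapping each device to its per-device table.
CREATE TABLE device_catalog (
  device_id  TEXT PRIMARY KEY,
  table_name TEXT NOT NULL,     -- e.g. follows a convention like 'readings_<device_id>'
  created_at TIMESTAMPTZ DEFAULT now()
);
```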
I’m looking to use space partitioning as a way to achieve multi-tenancy. Will this speed up performance?
While you can use space partitioning for this use case, you won’t see direct performance benefits. Space partitioning is generally used to get parallelism across multiple disks. On the same topic, TimescaleDB has a reordering policy (docs) that takes your chunks and rewrites them in the order of an index. This can make your queries a lot faster if you typically query on a specific index.
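Assuming the same hypothetical `conditions` hypertable and index from earlier, setting up space partitioning and a reorder policy might look like this sketch:

```sql
-- Hash-partition on device_id across 4 space partitions
-- (mainly useful for parallel I/O across multiple disks).
SELECT add_dimension('conditions', 'device_id', number_partitions => 4);

-- Background job that rewrites completed chunks in (device_id, time) order,
-- so per-device range scans read mostly sequential pages.
-- 'conditions_device_time_idx' is a hypothetical index name.
SELECT add_reorder_policy('conditions', 'conditions_device_time_idx');
```

Note that space partitioning is easiest to set up when the hypertable is created, before data is loaded.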