Save money and improve performance by carefully managing your time-series data

Time-series data is relentless. It grows at an ever-increasing rate and quickly becomes expensive and unwieldy to store and query. Managing your time-series data is essential in building high-performance and cost-effective applications. Timescale provides the tools you need to maintain storage and query performance - without requiring you to delete all of your data - and keep tight control over your costs.

So, what does it mean to manage your time-series data? In many ways, it's similar to what application developers call Application Lifecycle Management, the continuous process of planning, developing, testing, and improving software applications. Having a continuous feedback loop in place allows a team to adjust requirements and deliver exceptional application experiences month after month.

The same is true of data management. Too often, companies choose the easiest path to launch a product, acquire and store data, and start to produce charts and actionable information... only to find that performance crumbles and costs skyrocket after the beta phase is over. If your database solution doesn't provide the tools to help you proactively manage the entire lifecycle, from the moment data is ingested to when data must be archived or deleted for good, it's going to be hard to play catch-up when the deluge of time-series data pours in.

There are five steps to “data lifecycle management”:

  1. Ingest and store data efficiently
  2. Query recent raw data regularly
  3. Create aggregated historical rollups
  4. Archive/data tier older raw data
  5. Drop raw data after some predefined interval passes
[Diagram: the five steps of data lifecycle management]
Managing the time-series data lifecycle will enable you to reduce costs and improve performance.

Time-series data adds another layer of complexity. Time-series data comes at you fast, sometimes generating millions of data points per second. In order to measure everything that matters, you need to capture all of the data you possibly can. But, storing and maintaining that data at scale can be difficult, whether it’s for cost, query performance, or a myriad of other reasons.

As just one example, in a recent benchmark, we attempted to fit as much data as possible in a single 1TB instance. Our core instance was an 8vCPU with 32GB of RAM and a monthly cost of approximately $1500 to run. We generated a dataset with 100,000 devices, each producing 10 readings every minute, amounting to 144,000,000 rows of data per day totaling 27.5GB of disk. As you can tell, in this case, the disk would be 20% full within a week and we would have to start dropping data after about a month. Fortunately, we turned on compression, enabling us to save 90% on our storage without sacrificing (and in some cases, improving!) query performance.

Without features like native compression in TimescaleDB, the total cost to store 6 months worth of data would likely be around $20,000. With the ability to thoughtfully manage your time-series data in TimescaleDB, that cost can be reduced to $7,000 or less. That's nearly a 60% savings in cost over the entire six months!

Managing your time-series data should be an essential consideration when using a time-series database. But, where do you start? How do you effectively manage your data? What tools do you have at your disposal?

Fortunately, TimescaleDB offers a number of built-in features beyond compression to manage your time-series data. (Side note: while developers may come to TimescaleDB because it’s built on PostgreSQL, so their SQL skills and favorite tools “just work” or because a feature like continuous aggregates solved a problem, we often hear that they stay because we’ve engineered TimescaleDB to ingest and query data efficiently at massive scale – and help developers continue to manage it over time.)

Read on for more information on what managing time-series data entails, how time-series data poses a unique challenge to managing your data, and some of the features in TimescaleDB to help you more effectively manage your data. We use a fictitious scenario to demonstrate how to put everything together and include example queries and advice to help you apply each capability to suit your needs.

If you’d like to get started with TimescaleDB right away, you can try TimescaleDB for free with our hosted service, or download it and run it locally or in your own cloud infrastructure.

From there, join our Slack community to ask questions and meet other time-series data enthusiasts around the world (you’ll find me, Timescale engineers, and our amazing community members active in all channels).


Data lifecycle management

Managing data at scale is a complex task, particularly when you don't know what to expect with regard to long-term cost and performance. Finding a database that can help you build a complete solution to manage the data that matters most can certainly be challenging.

Any solution, whether SQL or NoSQL, needs to effectively provide tooling to manage each phase of the lifecycle:

  1. Ingest and store data efficiently. When it comes to time-series data, storing data as quickly as you can and as efficiently as possible is key to success. Not all databases can effectively give you both.
  2. Query recent data regularly. Once your data is stored, you need to query it efficiently to satisfy the business requirements of your application or monitoring solution. All database technologies have some form of a query planner that translates your query into “instructions”: fetch the data, aggregate it, and return it in the form requested. And regardless of technology, queries always work best when the data you’re working with fits into fast, physical memory.
  3. Create aggregated historical rollups. As you ingest more and more time-series data, it’s common to want to analyze your data over longer - and longer - ranges of time (e.g., to assess trends, plan future resource needs, etc.). In most cases, this becomes untenable as raw data increases in cardinality and size, because a query may need to process tens or hundreds of millions of rows.
  4. Archive/data tier older raw data. One of the hardest parts of managing massive amounts of data is knowing how to save costs as time goes by without sacrificing the performance of near-time data or access to historical data. Typically, you can accomplish this with data storage tiering and possibly physically archiving raw data for some period of time (even if it's not immediately accessible at query time).
  5. Drop raw data after some predefined interval passes. Sooner or later, data in its raw form becomes less and less valuable to look at in the context of what's happening today. It also slows down queries and increases storage costs. Eventually, you need an effective way to drop the raw data from your database. Once again, not all databases are created equal for such an essential task.

It’s about time: time-series data management

Data lifecycle management is even more important, and more difficult, when it comes to time-series data.

Time-series data is a sequence of data points collected over time intervals, giving us the ability to track changes over time. Time-series data can track changes over milliseconds, days, or even years. As we’ve discussed before, time-series data has widespread applicability in many problem domains and industries. In all these applications, time is a primary axis on which data is stored.

Time-series data has a way of overwhelming you quickly. When it comes to the first two aspects of managing your time-series data (creating and querying raw data), we've shown often that TimescaleDB excels at ingesting data quickly and providing the features to query raw data efficiently. (See our benchmarks for reference: TimescaleDB vs. Amazon Timestream: 6,000x higher inserts, 5-175x faster queries, 150x-220x cheaper; TimescaleDB vs. InfluxDB: purpose-built differently for time-series data; TimescaleDB vs. MongoDB: 260% higher insert performance, up to 54x faster queries, and simpler implementation.)

Thus, in order to maintain efficient query performance and keep tight control over costs, you need a time-series database with complete data management capabilities.

The TimescaleDB tool belt

When we built TimescaleDB, we knew that ingesting and storing data at petabyte scale was only one critical part of an overall time-series solution. For all the reasons described above, we knew that a category-defining database for time-series data needed category-defining features for managing time-series data. From the architecture of TimescaleDB itself to numerous features purpose-built for managing time-series data, we’ve made sure that everything you need is in one (easy-to-use) product.

Hypertables and chunks

Hypertables provide the core foundation of the TimescaleDB architecture, and, thus, unsurprisingly enable much of the functionality for time-series data management. What are hypertables? From a user's perspective, TimescaleDB presents what looks like a single, ordinary table through a feature called hypertables. A hypertable is the primary point of interaction with your data, as it provides the standard table abstraction that you can query via standard SQL.

Virtually all user interactions with TimescaleDB are with hypertables. Inserting, updating, or deleting data, querying data via SELECTs, altering tables, adding new columns or indexes, JOINs with other tables or hypertables, and so forth can (and should) all be executed on the hypertable.

However, hypertables are actually an abstraction, a virtual view over many individual tables, called chunks, that store the data.

Hypertables are particularly well suited for time-series data management because they already efficiently partition all incoming time-series data into smaller, more manageable sets of data. The smaller chunks of data that are created have special time-based properties that allow TimescaleDB to understand the range of data that exists in each chunk and effectively maintain each individual chunk, rather than one ever-growing table. As you'll see, this small detail provides some huge benefits when working with time-series data.

This all translates into fast data ingestion that can scale to millions of rows of data per second. (Relatedly, optimizing ingest rate is critical to many common PostgreSQL use cases – but it’s even more essential for time-series scenarios, where you retain all data vs. overwrite past values. See a few of our favorite ways to speed up PostgreSQL insert performance.)
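As a sketch of how this looks in practice (the table and column names here are illustrative, not from a specific application), converting a regular PostgreSQL table into a hypertable takes a single additional call:

```sql
-- A regular PostgreSQL table for device metrics (hypothetical schema)
CREATE TABLE conditions (
  time        TIMESTAMPTZ NOT NULL,
  device_id   TEXT        NOT NULL,
  temperature DOUBLE PRECISION,
  humidity    DOUBLE PRECISION
);

-- Convert it into a hypertable partitioned on the time column;
-- TimescaleDB transparently creates and manages the underlying chunks.
SELECT create_hypertable('conditions', 'time',
    chunk_time_interval => INTERVAL '7 days');
```

From this point on, all inserts and queries target `conditions` exactly as they would a normal table, while TimescaleDB routes rows to the appropriate chunk behind the scenes.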

Compression

Once an application has created a suitable schema and is ingesting large quantities of time-series data, the next hurdle most users face is how to store as much raw data as possible, while still executing fast queries on data spanning long time intervals. Many databases, both relational and NoSQL, provide some form of compression at the disk or document level. This saves physical space, which allows you to store more data, but querying the data still requires retrieving and scanning all rows (or properties). That is, the data itself hasn't been stored more efficiently, so as time passes and you query longer periods of data, already-slow queries typically get even slower.

In contrast, TimescaleDB provides two benefits when native columnar compression is applied to a hypertable.

First, TimescaleDB modifies compressed chunks so that data is now stored by columns and not rows. Doing this can often save you 94-97% or more on disk space by deploying best-in-class algorithms for compressing various types of data, including Gorilla compression for floats, delta-of-delta, and more. TimescaleDB automatically chooses the algorithm that is the best fit for your data.

Second, because the data is now stored as individual columns it can increase the query performance for historical aggregate data. This is possible because of how we store the compressed data. For each segmentby column in your hypertable, TimescaleDB will (effectively) group up to 1,000 rows of data at a time, for each column, as an array. This allows TimescaleDB to efficiently retrieve individual columns of data for long time periods.

As an example, let's assume that the cpu table below is partitioned into chunks every 7 days (what TimescaleDB calls the chunk_time_interval of a chunk). If we store one reading every minute for 1,000 devices, a chunk would store approximately 10,000 rows per device over those 7 days (and ~10 million rows in total per chunk).

Without compression, the example query below would have to scan ~30 million rows of data (which would likely retrieve all columns) to attain the average cpu usage.

SELECT time_bucket(INTERVAL '1 hour', time) AS bucket,
   avg(cpu_usage) AS cpu_avg
FROM cpu
WHERE time <= now() - INTERVAL '7 days'
     AND time > now() - INTERVAL '1 month'
GROUP BY bucket;

After compression, however, the same query would only have to scan ~30 thousand rows of data (0.1% of the rows scanned in the uncompressed query), for just one column. This is significantly less work and can dramatically improve query responses on some queries.

With native compression, you can store more data for longer periods at a lower cost, providing you greater flexibility on how you maintain your data. And, for some of your queries, the columnar format could increase the performance of your queries on older data.
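To see what compression buys you on a given hypertable, you can enable it and then compare before/after sizes. A minimal sketch for the cpu table above (the segmentby column name is an assumption, and `hypertable_compression_stats` is the informational function available in TimescaleDB 2.x):

```sql
-- Enable native compression, segmenting compressed batches by device
ALTER TABLE cpu SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'device_id'
);

-- Automatically compress chunks once they are older than 7 days
SELECT add_compression_policy('cpu', INTERVAL '7 days');

-- After chunks have been compressed, inspect the savings
SELECT pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after
FROM hypertable_compression_stats('cpu');
```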


Continuous aggregates

Performing analysis on time-series data often involves using aggregate functions to observe trends over time - functions like SUM(), AVG(), MIN(), MAX(), etc. These functions become slower over time as more and more data is aggregated, simply because more data exists. Regardless of how big and fast your server may be, reading tens of millions of records to find the average hourly CPU usage over a month will still take time and resources.

One potential solution is the materialized view feature of PostgreSQL, which persists the results of the query to disk as if it were a regular table. This is better than a regular view in PostgreSQL that executes the underlying query every time the view is referenced (which doesn't reduce any server load over time). The main limitation of materialized views is that they require additional development work to automatically stay up-to-date (e.g., utilizing triggers and stored procedures).

In contrast, TimescaleDB provides a feature called continuous aggregates that enhances materialized views so that the aggregated data is intelligently kept up-to-date, refreshing only the portions of the view where underlying data has changed, on a schedule of your choice. If values were updated or deleted, only the time-range that was modified would be recomputed. This is a powerful feature that can improve the performance of historical queries. Additionally, because the data is persisted to disk, it can be used as a long-term store of the aggregated data. Even when older raw data is eventually dropped from the hypertable, the continuous aggregate retains the historical aggregation of that data.

Therefore, continuous aggregates provide multiple benefits in the context of your data lifecycle. They provide an efficient way to query aggregated data (reducing query load on the raw data), and they provide long-term storage of aggregated data, decoupled from any modifications to the underlying raw data.

Data tiering

Even with features like native compression and continuous aggregates, there are still times when your application needs access to raw data for very long periods of time, even if that means queries on older data take longer. In most databases or data lakes, this capability is known as data tiering.

With data tiering you can move data to cheaper (and often slower) storage once it has reached some age threshold. The data isn't specifically archived at this point, but in many cases, this is seen as the first step towards archiving raw data.

For this functionality, TimescaleDB works in tandem with the PostgreSQL feature known as tablespaces. Tablespaces allow you to specify different locations on disk where PostgreSQL can store the data files that make up your database. When used with different classes of storage, administrators can implement policies that move older data onto cheaper (but slower) disks to save money, although queries on older data will likely be slower.

With TimescaleDB, data tiering can be achieved by using tablespaces in conjunction with the move_chunk function to move data and indexes over time.
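A minimal sketch of what this might look like (the tablespace path and chunk name are hypothetical; check the move_chunk documentation for your TimescaleDB version and license, as the exact parameters have changed across releases):

```sql
-- A tablespace backed by cheaper, slower storage (path is hypothetical)
CREATE TABLESPACE history LOCATION '/mnt/slow_disk/postgresql';

-- Move one older chunk, along with its indexes, onto that tablespace
SELECT move_chunk(
  chunk => '_timescaledb_internal._hyper_1_4_chunk',
  destination_tablespace => 'history',
  index_destination_tablespace => 'history'
);
```

Queries against the hypertable continue to work unchanged; chunks on the slower tablespace simply take longer to read.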

Data retention

Once you've used compression to store more data for longer periods of time, created continuous aggregates to improve query performance, and maybe even instituted some data tiering for your growing time-series dataset, a well-planned strategy for managing time-series data requires that you consider when it's appropriate to physically drop raw data from your database. At some point, querying older raw data on a regular basis will take additional resources and provide decreasing value with time-series data.

TimescaleDB provides built-in functionality to set up data retention policies. With a straightforward API, users can set up a policy that will drop chunks from a hypertable once the data inside of that chunk reaches a certain age. Because TimescaleDB knows the minimum and maximum timestamp for every chunk, it can efficiently find chunks that can be safely dropped from the database.

Dropping chunks as the primary function of data retention also helps overcome common problems in PostgreSQL (and other relational databases), when the alternative is issuing a DELETE for some range of data. Deleting rows of data is slow, prone to locking issues, negatively impacts index maintenance, and requires more work through the use of VACUUM. With hypertable chunks, however, all indexes are local to that child table, so no index maintenance is needed and the operation is as fast as dropping a regular PostgreSQL table - nearly instantaneous.
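You can see this in action manually before automating it. For a hypothetical hypertable named conditions:

```sql
-- Preview which chunks hold only data older than 6 months
SELECT show_chunks('conditions', older_than => INTERVAL '6 months');

-- Drop them; this is a fast metadata operation, not a row-by-row DELETE
SELECT drop_chunks('conditions', older_than => INTERVAL '6 months');
```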

Setting up a data retention policy is done using the APIs that come with TimescaleDB. If you have a hypertable called conditions and want to drop chunks of data once they are 6 months old, this SQL query will create the automated policy.

SELECT add_retention_policy('conditions',INTERVAL '6 months',true);

Once you execute this command, TimescaleDB will begin checking once a day for chunks that can be dropped.

User-defined actions

The final piece of a solid strategy for managing your time-series data is customization - having the ability to fine-tune existing schedules or craft new actions to deal with your data in a specific way that the built-in TimescaleDB policies don't currently support. If you use vanilla PostgreSQL or another time-series database, job scheduling is not typically available without installing additional extensions.

TimescaleDB, on the other hand, provides direct access to its internal job scheduling framework through a feature called user-defined actions, enabling you to write and implement fully customizable procedures to work with your data on your terms. This unlocks a world of possibilities for using all of the principles of managing time-series data together to craft just the right level of access, compression, downsampling, and archiving of data.

With user-defined actions, you create the stored procedure and accept a generic config parameter that accepts JSONB. Within that config object, you can add any parameters needed for your script to work with.

Let's look at a few examples that we provide in our documentation.

Generic data retention action

With this example, we create a short stored procedure to drop chunks from all hypertables after a configurable interval. This has the advantage of not requiring a data retention policy for each hypertable as shown earlier in this article. In this example, the script sets one parameter from the config object, drop_after, which is the interval to use for dropping chunks.

CREATE OR REPLACE PROCEDURE generic_retention (job_id int, config jsonb)
LANGUAGE PLPGSQL
AS $$
DECLARE
  drop_after interval;
BEGIN
  SELECT jsonb_object_field_text (config, 'drop_after')::interval INTO STRICT drop_after;

  IF drop_after IS NULL THEN
    RAISE EXCEPTION 'Config must have drop_after';
  END IF;

  PERFORM drop_chunks(format('%I.%I', hypertable_schema, hypertable_name)::regclass, older_than => drop_after)
    FROM timescaledb_information.hypertables;
END
$$;
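To run this procedure automatically, register it with the job scheduler. For example, to check once a day with a 12-month threshold (the interval value is illustrative):

```sql
SELECT add_job('generic_retention', INTERVAL '1 day',
    config => '{"drop_after":"12 month"}');
```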

This script could be modified for any number of data retention scenarios specific to your use case. For instance, if you have different schemas for different clients or different types of data, you could provide configuration parameters that specify unique drop_after properties for each schema.

Data tiering action

Another example we provide in our documentation relates to one of the tenets of data lifecycle management we discussed above - data tiering. This sample stored procedure sets three parameters based on the config JSONB input: hypertable, lag, and tablespace (read into the destination variable). These parameters are used to find chunks that are older than a specified interval (lag) in order to move them to the specified tablespace using standard ALTER TABLE syntax.

CREATE OR REPLACE PROCEDURE custom_move_chunks (job_id int, config jsonb)
LANGUAGE PLPGSQL
AS $$
DECLARE
  ht REGCLASS;
  lag interval;
  destination name;
  chunk REGCLASS;
  tmp_name name;
BEGIN
  SELECT jsonb_object_field_text (config, 'hypertable')::regclass INTO STRICT ht;
  SELECT jsonb_object_field_text (config, 'lag')::interval INTO STRICT lag;
  SELECT jsonb_object_field_text (config, 'tablespace') INTO STRICT destination;

  IF ht IS NULL OR lag IS NULL OR destination IS NULL THEN
    RAISE EXCEPTION 'Config must have hypertable, lag and tablespace';
  END IF;

  FOR chunk IN
  SELECT show.oid
  FROM show_chunks(ht, older_than => lag)
  SHOW (oid)
    INNER JOIN pg_class pgc ON pgc.oid = show.oid
    INNER JOIN pg_tablespace pgts ON pgts.oid = pgc.reltablespace
  WHERE pgts.spcname != destination
  LOOP
    RAISE NOTICE 'Moving chunk: %', chunk::text;
    EXECUTE format('ALTER TABLE %s SET TABLESPACE %I;', chunk, destination);
  END LOOP;
END
$$;

Once created, we can schedule it to be run once a day with the following SQL query.

SELECT add_job('custom_move_chunks','1d', config => '{"hypertable":"metrics","lag":"12 month","tablespace":"old_chunks"}');

Again, as you look at this sample stored procedure, there are numerous ways that you could modify this procedure to fit your data tiering requirements.

For instance, you might create a table to log every time a chunk is moved to cheaper storage. This information could be used to bill a customer differently for different tiers of storage, or it might be valuable for customer support if a customer reports slower query results (i.e., is the slower query time because data lives in a “slow” data tier?).
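As a sketch of that idea (the table and column names are hypothetical), you could add an audit table and record each move from inside the loop of custom_move_chunks:

```sql
-- A hypothetical audit table for chunk moves
CREATE TABLE chunk_move_log (
  moved_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
  chunk_name TEXT NOT NULL,
  tablespace NAME NOT NULL
);

-- Inside the loop, after the ALTER TABLE statement, record the move;
-- chunk and destination are the PL/pgSQL variables from the procedure.
INSERT INTO chunk_move_log (chunk_name, tablespace)
VALUES (chunk::text, destination);
```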

With user-defined actions, you create the workflows you need to manage and track changes in your TimescaleDB database.

Putting it all together

Many companies and development teams know that data lifecycle management is something they should take seriously – but often don't know which tools they have at their disposal to effectively manage the overwhelming amount of data or where to start.

As we've seen, TimescaleDB comes with the capabilities you need to manage the full lifecycle of your time-series data. Let's look at a fictitious end-to-end example of what this might look like for a small IoT sensor company that tracks the activity and health of pets: Pet Active.

Pet Active isn't very large at the moment; they currently have just a few thousand sensors in the wild collecting and reporting data to their production database. But, their service is catching on – and they're about to get a big boost from a celebrity whose Yorkshire Terrier was recently rescued because she was wearing one of Pet Active’s devices.

The devices currently report data at intervals ranging from every 60 seconds to every 5 minutes, depending on the connection (LTE or WiFi), battery level, and kind of activity. Most of the time, each device reports 10 separate activity metrics as well as GPS location when available. On average, Pet Active receives about 3 million rows of data per day.

Pet Active makes the following data available to its users through various dashboards, with four tiers of data retention guarantees:

  • Raw data: retained for 60 days
  • Hourly aggregates: retained for 6 months
  • Daily aggregates: retained for 1 year
  • Monthly aggregates: retained for 5 years

Early on, however, the company also decided that even if the raw data wasn't easily accessible after 60 days, they at least wanted to keep an archived backup in AWS S3 so that they could recreate the data if needed and (possibly) use more historical, raw data for analytics down the road.

Now that we know the Pet Active SLA for data durability (i.e., how long data is available at various aggregation timeframes, as noted above), we can create one example setup for an effective plan to manage their time-series data.

Create the raw hypertable

When creating hypertables, our best practice guidelines recommend sizing chunks so that the most recent chunks (the ones you're querying and inserting data into the most) comprise about 25% of your available server memory.

With that guidance in mind, we've determined that our chunk_time_interval should be set to 4 days based on the amount of data Pet Active receives and the size of their database server, specifically memory.

SELECT create_hypertable('pet_stats','time',chunk_time_interval=> INTERVAL '4 days');
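The call above assumes pet_stats already exists as a regular PostgreSQL table. A plausible (entirely hypothetical) schema, based on the metrics described earlier, might look like:

```sql
CREATE TABLE pet_stats (
  time        TIMESTAMPTZ NOT NULL,
  pet_id      TEXT        NOT NULL,
  heart_bpm   DOUBLE PRECISION,
  active_time DOUBLE PRECISION,
  rest_time   DOUBLE PRECISION,
  latitude    DOUBLE PRECISION,  -- GPS location, when available
  longitude   DOUBLE PRECISION
);
```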

Set up a compression policy

Based on Pet Active’s current data ingestion rate, the hypertable might grow by a few hundred megabytes a day. Choosing the interval after which older chunks are compressed is typically determined by a number of factors:

  • How much disk space is available for uncompressed, recent data
  • How often older data needs to be inserted into a hypertable (e.g., from devices that were offline and then send days of data once they reconnect)
  • When queries for older data may benefit from narrow (one or two column) queries over a longer period of time

Taking all of these things into account, we've decided to compress chunks once they are 8 days old, meaning that the two most recent chunks will always be uncompressed. Over time, and as the service ingests more data with a growing user base, we may need to revisit these settings to achieve optimal performance down the road.

ALTER TABLE pet_stats SET (timescaledb.compress, timescaledb.compress_segmentby='pet_id');

SELECT add_compression_policy('pet_stats', INTERVAL '8 days');

Set up a data retention policy


To satisfy the Pet Active data durability SLA, we need to make sure raw hypertable data is retained for at least 60 days. Initially, we'll set up the retention policy for 90 days to give us some breathing room in case we want to do something different with our raw data, like archiving it to AWS S3 as part of a separate process.

SELECT add_retention_policy('pet_stats',INTERVAL '90 days');

Create the aggregate data

As we saw, Pet Active provides different aggregation views to downsample data over time. Below, we'll only create the first continuous aggregate for hourly data as an example, but the other views would be created in a similar way, even if the specific aggregate data returned (the SQL query) is different.

CREATE MATERIALIZED VIEW pet_stats_hourly
WITH (timescaledb.continuous) AS
SELECT pet_id,
       time_bucket(INTERVAL '1 hour', time) AS bucket,
       AVG(heart_bpm) bpm_avg,
       MAX(heart_bpm) bpm_max,
       MIN(heart_bpm) bpm_min,
       SUM(active_time) active_total,
       SUM(rest_time) rest_total
FROM pet_stats
GROUP BY pet_id, bucket;

We also want this continuous aggregate to stay up-to-date over time, so we'll make sure it refreshes every hour and looks for any changes over the last 7 days. Any data that arrives with a timestamp older than 7 days will still be inserted into the hypertable, but since we don't currently have a dashboard metric that shows more than 7 days of one-hour data, we don't need to update the continuous aggregate for data earlier than that.

SELECT add_continuous_aggregate_policy('pet_stats_hourly',
    start_offset => INTERVAL '7 days',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');
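Dashboards can then read from the continuous aggregate instead of the raw hypertable. For example, a (hypothetical) seven-day heart-rate panel for a single pet:

```sql
-- Hourly heart-rate summary for one pet over the last week;
-- this scans the materialized aggregate, not millions of raw rows
SELECT bucket, bpm_avg, bpm_max, bpm_min
FROM pet_stats_hourly
WHERE pet_id = 'rex-001'
  AND bucket > now() - INTERVAL '7 days'
ORDER BY bucket;
```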

Set up a continuous aggregate data retention policy


Just like the underlying hypertable, our Pet Active SLA allows us to drop aggregate data at various intervals. For the 1 hour continuous aggregate that we created above, we want to drop chunks from the continuous aggregate after the data is six months old. The SQL to make this happen is exactly like our example above with the hypertable retention policy, only this time we name the continuous aggregate and update the interval.

SELECT add_retention_policy('pet_stats_hourly',INTERVAL '180 days');

Create a custom action

Finally, Pet Active realized that although customers don't currently have access to raw data older than 60 days, they want to archive all data to Amazon S3 before the data is dropped from the hypertable with the 90-day retention policy. Moving archival data to other storage solutions enables you to save costs while retaining historical information should it be needed later. To do this, we'll create a stored procedure that exports data stored in older chunks and schedule it to run once a week. Once the CSV files are exported, a separate process external to TimescaleDB would actually move the files into S3.

CREATE OR REPLACE PROCEDURE file_archive (job_id int, config jsonb)
LANGUAGE PLPGSQL
AS $proc$
DECLARE
  export_after interval;
  hypertable regclass;
  file_prefix text;
  select_stmt text;
  file_name text;
  export_start timestamp;
  export_end timestamp;
BEGIN
  SELECT jsonb_object_field_text (config, 'export_after')::interval INTO STRICT export_after;
  SELECT jsonb_object_field_text (config, 'hypertable_name')::regclass INTO STRICT hypertable;
  SELECT jsonb_object_field_text (config, 'file_prefix') INTO STRICT file_prefix;

  IF export_after IS NULL THEN
    RAISE EXCEPTION 'Config must have export_after';
  END IF;

  IF file_prefix IS NULL THEN
    RAISE EXCEPTION 'Config must have file_prefix';
  END IF;

  -- Find the oldest chunk past the export threshold and build a file name
  -- from the hypertable name and the chunk's starting date.
  SELECT INTO file_name, export_start, export_end
    file_prefix || '_' || c.hypertable_name || '_' || split_part(c.range_start::text, ' ', 1),
    c.range_start,
    c.range_end
  FROM timescaledb_information.chunks c
    INNER JOIN show_chunks(hypertable, older_than => export_after) AS c1
      ON c.chunk_name = split_part(c1::text, '.', 2)
  ORDER BY c.range_start
  LIMIT 1;

  -- Export only the rows that fall inside this chunk's time range.
  SELECT FORMAT('SELECT * FROM %s WHERE time >= %L AND time < %L',
                hypertable, export_start, export_end) INTO select_stmt;

  EXECUTE('COPY (' || select_stmt || ') TO ' || QUOTE_LITERAL(file_name || '.csv') || ' CSV HEADER');

END
$proc$;
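To schedule the weekly run, register the procedure with add_job. The file prefix path and export threshold below are illustrative (the server process must be able to write to the path, and 60 days matches the point at which customers lose access to raw data):

```sql
SELECT add_job('file_archive', INTERVAL '1 week',
    config => '{"export_after":"60 days","hypertable_name":"pet_stats","file_prefix":"/archive/pet_stats"}');
```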

With this final piece in place, we've set up a full strategy for managing the time-series data generated by all Pet Active IoT devices:

  • Data is compressed after 8 days, reducing costs and improving historical query latency.
  • Continuous aggregates have been created for each downsampled time range to help improve the responsiveness of dashboards, while also setting up retention policies on the aggregate data over longer periods.
  • Historical raw data is exported for archival to AWS S3 on a weekly schedule, using a custom script and user-defined actions.
  • And finally, raw data is permanently dropped from the database after 90 days, enabling us to maintain tight control over our costs.

With everything in place, Pet Active can focus on delivering value to their customers.

Conclusion

Creating and consuming data has never been easier, especially time-series data. Development teams should create a plan to proactively manage the amount of time-series data that is stored, downsampled, archived, and eventually dropped from the database – before the amount of data becomes overwhelming. This will enable you to maintain high performance and better control costs.

With TimescaleDB, the tools you need to create a robust plan are available immediately, out-of-the-box, for free. And if you don’t see the specific functionality or API that your situation requires, you can use user-defined actions to create policies and manage your data exactly as you need.

If you’re new to TimescaleDB, you can try TimescaleDB for free with our hosted service, or download it and run it locally or in your own cloud infrastructure.

If you’re an existing user, you have access to all of the above functionality automatically, assuming you’re running the latest version. Need to update? See our documentation for upgrade instructions.

From there, join our Slack community to ask questions and meet other time-series data enthusiasts around the world (you’ll find me, Timescale engineers, and our amazing community members active in all channels).
