Announcing Automated Disk Management: Safely Managing Your Cloud Database

To close out our final week of “Always Be Launching” month, we are announcing “Automated Full Disk Management” for Timescale Cloud, a new capability to ease operational overhead, protect against unforeseen overages, and keep your database up and running.

Throughout this month, we’ve announced a number of new features that improve Timescale Cloud, our cloud-native database service for time-series, which is designed to give developers a worry-free experience – without sacrificing flexibility and control.

These announcements have included major query performance increases in TimescaleDB (8000x for finding distinct values), significant improvements to TimescaleDB’s best-in-class columnar compression, the ability to scale to larger compute instances and greater storage capacity (supporting 100+ TB of pre-compressed data), a new “operations center” for your database, flexible VPC Peering for greater security, and hardened backup/restore mechanisms for ensuring the reliability of your data. And, since Timescale Cloud is powered by TimescaleDB, you get all of these capabilities and use the PostgreSQL environment you know and love. It’s been quite a month!

To continue with this theme of delivering a worry-free platform for time-series data that gives you control when and where you want, we’re launching support for automated full-disk management on Timescale Cloud.

Automated full-disk management will alert you whenever you approach storage limitations on your account, put your service in a read-only state so that your data is not lost, and give you an opportunity to configure your service so that you can resume collecting everything that matters.

Keep reading for more about how we automated full disk management on Timescale Cloud, the four key components of our approach, and how they work together to give you more peace of mind about your disk usage.

If you’re new to Timescale, create a free account to get started with a fully-managed Timescale Cloud service (100% free for 30 days, no credit card required).

Once you are using TimescaleDB, please join the Timescale community and ask any questions you may have about time-series data, databases, and more.

And, for those who share our mission and want to join our fully remote, global team: we are hiring broadly across many roles.

Finally, special thanks to Ivan Tolstosheyev and the entire Timescale Cloud team behind the development of this capability 🙏.

Full disks are a pain

We’ve all experienced it. You’re trying to take that latest puppy video or photo to share in your group text, but your storage is full. You’re left trying to figure out which caches to clear or which other pictures to delete.

Nobody likes a full disk, including your database and the operating system it uses. You try to insert more data into your database, and the write-ahead log (WAL) it uses to ensure all data is reliably and atomically written has no place to write. Try to add an index, and there’s no place on disk to store the index pages. And, even if you don’t directly write any new data to the database, things are happening (or, perhaps more accurately, not happening) in the background. Temp files can’t be written. File system blocks can’t be allocated. Unexpected things go wrong.

A cloud-native platform like Timescale Cloud circumvents these issues, providing built-in safety mechanisms (or, as we like to say “worry-free”) for your TimescaleDB database. Why should disk resources be finite and fixed, rather than scale as needed over time? Why shouldn’t you have an “escape hatch” to safely recover some space when needed and return user instances to healthy states?

Many of these mechanisms can and should be transparent and “just work” (as one example, Timescale Cloud employs system balloon files for an additional layer of “defense in depth” to full disks). But today, we wanted to share a bit more about new and existing user-facing capabilities for full disk management.

Automating full-disk management

Timescale Cloud's automated full-disk management performs four key tasks in sequential order:

  1. Detect if storage capacity begins to fill up;
  2. Notify users about this growth in a timely manner;
  3. Automatically employ overload protections, switching databases to read-only when full, so they remain available for queries; and
  4. Allow users to easily return their database to a normal state, increasing storage or freeing up their consumption in a few short steps.

Let’s walk through the components that underpin Timescale Cloud’s automated full disk management:

Continuous storage monitoring

Timescale Cloud continuously monitors the health and resource consumption of all database services. This real-time data is always available in the “metrics” tab of your cloud console (and also monitored by our 24/7 operations team). When the database’s storage consumption exceeds certain thresholds of available resources, the platform triggers automated actions. (This includes both user-facing actions and behind-the-scenes ones, as described below.)

An example graph of storage consumption found in the Timescale Cloud console.
Database storage consumption from within your cloud console. ✨ Fun fact: both user- and operations-facing metrics are stored and powered by TimescaleDB.
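
As a rough, do-it-yourself complement to the console graph, you can also ask the database how much space it is using. This is only a sketch: `pg_database_size` and `pg_size_pretty` are standard PostgreSQL functions, `hypertable_size` is available in TimescaleDB 2.x, and `purchases` is the example hypertable used later in this post. The numbers won't necessarily match the console's figure exactly, since the disk also holds things like the WAL.

```sql
-- Total on-disk size of the current database.
SELECT pg_size_pretty(pg_database_size(current_database())) AS database_size;

-- Total size (data, indexes, and TOAST) of one hypertable.
-- 'purchases' is an example name; substitute your own hypertable.
SELECT pg_size_pretty(hypertable_size('purchases')) AS hypertable_size;
```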

Automated user alerting

Timescale Cloud automatically triggers email notifications when your storage exceeds 75%, 85%, and 95% capacity. But, because we’re developers too and know excessive emails are quickly lost or ignored, we’ve added a few parameters to balance signal and noise: each alerting threshold uses low- and high-watermarks, and messages are capped by time, so developers should expect at most one email about storage capacity per 24 hours for each database.

Example email alert notifying you that your disk storage usage has exceeded 75% of capacity. Alerts are also sent when usage reaches 85% and 95%.

Automated overload protection

We also know that sometimes you might not see or react to alerts immediately, especially if you’re performing a large batch insert on a relatively small disk at 3am. So, the platform will automatically place the database in a safe “read-only” state once it reaches 99% full (and alert you via both email and the cloud console).

At this point, you can still query your database, but you cannot insert any new data. With the database still available for reads, the main goal is to decide on next steps: resize storage capacity for continued growth, or shrink data usage.
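
From a client's point of view, this protection looks like PostgreSQL's ordinary read-only transaction behavior. The sketch below assumes the platform applies it through the `default_transaction_read_only` setting (the same setting used for the per-session override later in this post). The `purchases` table is the example used later on, and the column names in the INSERT are hypothetical.

```sql
-- Check whether new transactions in this session default to read-only.
SHOW default_transaction_read_only;

-- Reads keep working as usual.
SELECT count(*) FROM purchases;

-- Writes are rejected. The columns here are hypothetical; PostgreSQL reports:
--   ERROR:  cannot execute INSERT in a read-only transaction
INSERT INTO purchases (time, sku, amount) VALUES (now(), 'example-sku', 1);
```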

User-initiated storage resizing with zero downtime

Timescale Cloud allows you to increase your storage capacity, from 10GB to 10TB, without any downtime. And, because the platform decouples compute and storage, you can incrementally increase just the resource that’s needed; if you need another 500GB, there’s no need to pay for 4 more CPUs just because that’s the next VM instance size available.

Just navigate to your cloud console and select the disk size that works for you. You’ll see side-by-side comparisons and cost calculations as you make adjustments – and, once you’re all set, hit apply, and additional capacity will be allocated to your service in a few seconds. And, we mean zero downtime - even ongoing queries will be unaffected during the resizing. Once your service’s storage has been increased, the database is automatically taken out of read-only protection and you can start writing freely again.

Example of resizing storage from 25GB to 500GB
Resize your storage plan anytime with zero downtime from the “Resources” section in your Timescale Cloud console. Storage and compute plans are independent, allowing you to scale each resource individually according to your budget and needs.

Safe user-initiated storage recovery

To shrink storage consumption, users can turn off read-only mode, then perform any needed actions, e.g., compressing data, deleting rows/tables, or dropping old data via data retention policies. You can turn the entire database back to read-write mode through the cloud console if you want, although this isn’t our recommended approach: if your service has any runaway ingest pipelines and applications that will auto-reconnect, these applications might start immediately re-inserting data once you do so (although the automated overload protections will kick in again shortly after).

Better yet, you can make an individual session read-write, while the database overall remains in read-only mode (this is a built-in TimescaleDB capability, which Timescale Cloud inherits). You can log in to your database and enable compression, set up data retention, or delete rows or tables from within only that session.
As a concrete example, connect to your database via psql and run the following to turn off read-only protection for that specific session.

SET default_transaction_read_only TO off; 

Then from within that same session, you can turn on compression to save 94 - 97% of your storage consumption.

ALTER TABLE purchases SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'sku'
);

SELECT add_compression_policy('purchases', interval '1 day');
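
A compression policy runs as a scheduled background job, so existing chunks are not necessarily compressed the moment the policy is created. If you'd rather not wait, a minimal sketch (using the TimescaleDB 2.x functions `show_chunks`, `compress_chunk`, and `hypertable_compression_stats`, and the same example table) is to compress older chunks by hand from the read-write session and then check the result:

```sql
-- Compress every chunk older than one day; skip chunks already compressed.
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('purchases', older_than => INTERVAL '1 day') AS c;

-- Compare total size before and after compression.
SELECT pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after
FROM hypertable_compression_stats('purchases');
```

The savings you see depend on your data and on the `segmentby` column you chose, so treat the comparison query as the source of truth for your own workload.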

Or you can create a data retention policy to only retain, for example, data for 90 days, which will start working on any old data to free up space.

SELECT add_retention_policy('purchases', interval '90 days');
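
Like compression, a retention policy is applied by a scheduled background job. To free space right away, one option is to drop old chunks manually with `drop_chunks` (TimescaleDB 2.x signature shown). This is a sketch: the 90-day cutoff simply mirrors the policy above, and dropped data cannot be recovered from the table, so preview first.

```sql
-- Preview which chunks fall entirely outside the 90-day window.
SELECT show_chunks('purchases', older_than => INTERVAL '90 days');

-- Drop those chunks.
SELECT drop_chunks('purchases', older_than => INTERVAL '90 days');
```

Dropping whole chunks removes their underlying tables, so the space goes back to the file system right away. A plain DELETE of rows only marks them dead, and that space typically stays allocated to the table (reusable after a vacuum) rather than being returned to the disk.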

And you are done! As soon as the storage consumption drops beneath the appropriate threshold, the platform’s continuous service monitoring will automatically remove the read-only protections so you can start inserting data again.
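
If consumption does not drop as expected, it can help to confirm that the policies were registered and that their jobs are running. A sketch using the `timescaledb_information.jobs` and `timescaledb_information.job_stats` views from TimescaleDB 2.x, again with the example table name:

```sql
-- List the background jobs (including compression and retention policies)
-- attached to the example hypertable.
SELECT job_id, proc_name, schedule_interval, config
FROM timescaledb_information.jobs
WHERE hypertable_name = 'purchases';

-- Check how those jobs have fared so far.
SELECT job_id, last_run_status, total_runs, total_failures
FROM timescaledb_information.job_stats
WHERE hypertable_name = 'purchases';
```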

Towards auto-scaling storage

With automated full-disk management, Timescale Cloud now provides capabilities for monitoring storage consumption, automatically triggering actions when above a certain threshold, and resizing database storage without any downtime.

But our larger goal is to provide full auto-scaling. With today’s launch, the triggered actions are sending email alerts and placing a database into read-only mode, enabling you to resize your instance with a single click. A natural next step is allowing you to opt in to auto-scaling for your service(s), so that triggered actions also include automatically increasing storage when needed.

Of course, control is still critical for cloud platforms, and a big part of our approach. So we’ll continue to notify users as storage fills up or is resized, and plan to allow developers to specify limits to auto-scaling to avoid unexpected costs. And then, if a database service ever hits that preconfigured auto-scale limit, Timescale Cloud’s overload protection will kick in to make sure safe actions can be taken.

And we’re just getting started

We kicked off “Always Be Launching” with our announcement of $40M in new financing. (You can read my co-founder’s post for more details about our new investors and long-term vision for Timescale.)

This is the final post of our #AlwaysBeLaunching month. Throughout twelve announcements this month, our goal was to demonstrate to our customers and the broader industry our commitment to delivering high-quality features and products at a fast pace:

  1. $40M to help developers measure everything that matters (and introducing Launch Month) - In addition to announcing our Series B investment, led by Redpoint Ventures, we kicked off something more ambitious (and possibly more foolish) than anything we’ve done before: a month with 10+ launches of new features to our database, managed service, and observability products.
  2. New documentation - We re-designed our docs to make it faster for you to access what you need and easier to contribute, plus better getting started materials to become a TimescaleDB pro in less time.
  3. TimescaleDB 2.2.1 - We introduced TimescaleDB SkipScan, an optimization that makes DISTINCT queries up to 8000x faster on PostgreSQL.
  4. Timescale Cloud Explorer - We released an easy to use “operations center” for your cloud-native database services, which helps you better understand the state and performance of your database.
  5. State of PostgreSQL 2021 - We surveyed 400+ PostgreSQL developers from all over the world and curated key findings, summaries of developer sentiment, and recommendations for how to make the community even more welcoming and helpful to new developers.
  6. Larger storage plans (up to 10TB) on Timescale Cloud - We listened to user feedback and released new 5-10TB plans, enabling you to store 100+ TB of pre-compressed data on your cloud-native database in Timescale Cloud.
  7. Timescale Cloud VPC Peering - We released a capability that enables you to securely connect your existing AWS infrastructure to Timescale Cloud without ever exposing it to the public Internet, to better ensure the safety and privacy of all your data.
  8. Promscale 0.4 - We improved Promscale, an open-source analytical platform and long-term store for Prometheus data, with better support for Prometheus high-availability, support for multi-tenancy, improved user permissions (using role-based access control), and more.
  9. Larger compute plans (up to 32 CPU / 128 GB RAM) on Timescale Cloud - Once again, we listened to user feedback and launched new compute plans, enabling you to scale to even larger time-series workloads.
  10. How I learned to stop worrying and love PostgreSQL on Kubernetes: continuous backup/restore validation on Timescale Cloud - PostgreSQL contributor Oleksii Kliukin gave us a behind-the-scenes tour of how we automatically test backups to ensure they can be restored when users need them (and how we use PostgreSQL tools and Kubernetes to do it).
  11. TimescaleDB 2.3 - We made our native compression easier for developers to work with by enabling INSERTs on compressed hypertables.
  12. Automated Full Disk Management for Timescale Cloud (today’s post) - We released new capabilities to ease operational overhead, protect against unforeseen overages, and keep your database up and running.

On a personal note, when Ajay and I founded Timescale a few years ago, we were always excited about the possibilities for time-series data, and its need for the right type of database. But what has continuously amazed us is the breadth and variety...and sheer coolness...of its use cases.

From building tiny battery-less sensors that harvest energy from thin air, to collecting data from orbital missions in outer space, conserving museum artifacts, improving air travel and our busy skies, listening to space weather, improving yields in smart agriculture, empowering retail investors with trading bots, and more. Helping developers measure everything that matters to them, from the mundane to the amazing.

We’re even more excited and passionate about the future of Timescale and time-series data than when we first started. Just wait to see what we have planned in the coming months and beyond!

So as our friendly mascot Eon likes to say, it’s time to make that future into today and “Always Be Launching”.

Your next steps

If you’re new to Timescale, create a free account to get started with a fully-managed Timescale Cloud instance (100% free for 30 days). After creating a new database service, start loading data to use TimescaleDB without worrying about limits – because our automated full disk management is here to protect your service.

And once you are using TimescaleDB, please join the TimescaleDB community and ask any questions you may have about time-series data, databases, and more.

And, for those who share our mission and want to join our fully remote team: we are hiring broadly across many roles.

To the stars! 🐯🚀
