Introducing Prom-migrator: A universal, open-source Prometheus data migration tool

We've built a brand-new, 100% free, open-source tool that makes it easy to migrate your Prometheus metrics data to and from various long-term storage systems. Learn how it works and get started.

Prometheus is a metric collection, alerting, and storage system. It is architected to store data locally in its own time-series database (TSDB) and/or to ship data to other long-term storage systems via its remote_write/remote_read APIs. Over time, many long-term storage systems have emerged, and, as a result, users can have data stored in a variety of systems. For example, we recently shipped Promscale, an analytical platform and long-term store for Prometheus, built on top of TimescaleDB. But switching between these storage systems has not been possible because, until now, there has been no universal tool to migrate data between them.

Prom-migrator solves this problem. It is a universal, open-source Prometheus data migration tool that is community-driven and free to use. Prom-migrator moves data from one remote-storage system to another by leveraging Prometheus’ remote storage API. This means it can read data from any storage system that supports the Prometheus remote_read protocol, and write data to any storage system that supports the Prometheus remote_write protocol.

In this post, we'll share why we built Prom-migrator, including the problems it solves, which systems it is compatible with, and how it works. We'll also take you through two short examples to show how you'd use Prom-migrator in two very different scenarios – and, in the process, hopefully inspire you to give it a try yourself.

Why did we build Prom-migrator?

The number of remote-storage systems for Prometheus has grown steadily over time. However, there was no universal tool for data migration, leaving users with few good options if they wanted to switch between different remote-storage systems:

  • They could throw away the old data they had in their previous system. This led to gaps in historical knowledge about a system.
  • They could continue to run both the old and the new system and try to redirect relevant queries to either system. This led to operational and data management headaches.
  • They could continue using the previous system and not switch. This is classic vendor lock-in.

Prom-migrator features

Prom-migrator offers several new features:

  1. Data migration from and to any storage system. This tool is designed to work with any remote storage system, so that users can migrate data across any system in a wide range of scenarios.
  2. Informative outputs during runtime, allowing users to track progress. Prom-migrator keeps users informed about the migration progress, so that users can plan their time accordingly.
  3. Ability to resume migration(s) in case of any unintended shutdowns. Prom-migrator keeps a record of migration progress and, in case of failures or interruptions, automatically resumes the migration where you left off.
  4. Stateless working model. Deployment is easy because the migrator does not need to keep any local state, so there is no need to worry about mounting volumes or attaching persistent storage.

Compatibility between Prom-migrator and other databases

The chart below describes the systems Prom-migrator can work with, both for reading and for writing data. We break out two cases for the migration destination endpoint: what we call “write,” which is migrating data to an empty database, and what we call “backfill,” which is migrating data to a database that already has data newer than what is being migrated.

As you can see, many remote storage systems support migrating into an empty database, but not backfill. This is because these systems expect data to be ingested in (loosely) increasing time order and cannot support out-of-order ingest. (Promscale is the exception here, as it does support backfill.)

Key:

  • Read: data can be read from the system
  • Write: data can be written to an empty database
  • Backfill: data can be written to a database that already contains data newer than what is being inserted

| Storage name | Read | Write | Backfill |
|---|---|---|---|
| Promscale | Yes | Yes | Yes |
| Prometheus | Yes | Yes (work in progress, experimental) | No* |
| Cortex (blocks storage) | Yes | Yes | No* |
| Cortex (chunk storage) | Yes | Yes | No |
| Thanos | Yes (via G-Research’s thanos-remote-read) | Yes | No* |
| M3DB | Yes | Yes | No |
| VictoriaMetrics | No | Yes | Yes |
| InfluxDB v1.8 | Yes | Yes | Yes |

* You may be able to use promtool to backfill data in these systems.

In general, all storage systems that support Prometheus’ remote_write endpoint are supported by Prom-migrator for writing to an empty database. Similarly, storage systems that support Prometheus’ remote_read endpoint are supported for reading data by the migration tool.

How Prom-migrator works

Let’s dive deeper into how Prom-migrator works with your desired storage systems. Here’s a conceptual overview of the process:

Overview of how Prom-migrator works.

Prom-migrator migrates data from one storage system to another. It works iteratively: one time range at a time, it pulls data from the source system using the remote_read endpoint.

Architecture diagram: Prom-migrator remote-read endpoint metric data flow
To start, Prom-migrator pulls data from remote storage systems, using the remote_read endpoint

Then, Prom-migrator pushes the data to the destination system using the remote_write endpoint. It then advances to the next time range and repeats the process until it has covered the entire time range specified by the user. The time range of each individual read is adjusted automatically to bound the overall memory usage of the migrator.

Architecture Diagram: Prom-migrator data flow
Once data is pulled, Prom-migrator then pushes data to the destination’s remote_write endpoint
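To make the read/advance loop concrete, here is a minimal bash sketch of the time-slicing logic only. It does not talk to any storage system: the hypothetical `slice_bytes` function stands in for the size of a remote_read response (we simply assume every second of data costs a fixed `BYTES_PER_SECOND`), and the halve-when-too-big, double-when-small rule is our own illustrative choice, not necessarily the heuristic Prom-migrator uses.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: plans the time slices a migrator would read and
# write. This is NOT Prom-migrator's implementation.

# Hypothetical stand-in for "bytes returned by remote_read for [start, end)".
# Assumption: every second of data costs a fixed number of bytes.
BYTES_PER_SECOND=${BYTES_PER_SECOND:-1000}
slice_bytes() {
  echo $(( ($2 - $1) * BYTES_PER_SECOND ))
}

# plan_slices MINT MAXT INITIAL_STEP MAX_BYTES
# Walks [MINT, MAXT) one slice at a time, halving the slice when a read would
# exceed MAX_BYTES and doubling it when a read used under half the budget.
plan_slices() {
  local mint=$1 maxt=$2 step=$3 max_bytes=$4
  local start=$mint end bytes
  (( step >= 1 )) || return 1      # a zero-width step would never advance
  while (( start < maxt )); do
    end=$(( start + step ))
    if (( end > maxt )); then
      end=$maxt                    # never read past the requested maximum time
    fi
    bytes=$(slice_bytes "$start" "$end")
    if (( bytes > max_bytes && step > 1 )); then
      step=$(( step / 2 ))         # too much data: retry this slice with half the range
      continue
    fi
    echo "$start $end"             # here a migrator would remote_read, then remote_write
    start=$end                     # advance to the next time range
    if (( bytes * 2 < max_bytes )); then
      step=$(( step * 2 ))         # well under budget: widen the next slice
    fi
  done
}
```

For example, `plan_slices 0 100 40 20000` prints the slices `0 20`, `20 40`, `40 60`, `60 80` and `80 100`: the initial 40-second slice would exceed the 20000-byte budget, so the range is halved once and then kept steady. Bounding each read this way is what keeps memory usage flat regardless of how long the overall migration window is.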

Prom-migrator can automatically resume a migration that was previously interrupted. It does this by writing a progress metric to the destination alongside the migrated data.

Each sample of the progress metric records the maximum timestamp written so far. Thus, when a migration resumes, Prom-migrator simply reads the progress metric to find out what was last written and picks up where it left off.

Architecture Diagram: Prom-migrator progress metric data flow
Prom-migrator tracks the maximum time written to the destination storage system in order to provide auto-resume capabilities
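The resume rule itself fits in a few lines. The bash sketch below assumes timestamps are plain Unix seconds and that the hypothetical `resume_start` function is handed the last progress-metric value as a string (empty when no earlier run recorded one); how that value is fetched from the progress-metric URL is deliberately left out.

```shell
#!/usr/bin/env bash
# Illustrative sketch, not Prom-migrator's code.
# resume_start REQUESTED_MINT LAST_PROGRESS
# LAST_PROGRESS is the maximum timestamp recorded by the progress metric,
# or an empty string if no earlier run wrote one (hypothetical interface).
resume_start() {
  local requested_mint=$1 last_progress=$2
  if [[ -n $last_progress ]] && (( last_progress > requested_mint )); then
    # An earlier run got this far: continue from the recorded timestamp.
    # Restarting at (not after) it means the boundary sample may be sent twice.
    echo "$last_progress"
  else
    # Fresh migration, or progress older than the requested start.
    echo "$requested_mint"
  fi
}
```

Restarting at the recorded timestamp rather than one past it is a conservative choice in this sketch: at worst one boundary sample is written again, but no data is skipped.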

For details on how Prom-migrator works, including its design and internals, please refer to the Prom-migrator design doc.

Using Prom-migrator

Let’s examine how you could use Prom-migrator to migrate your existing data in two scenarios:

  1. Migration from Prometheus’ time-series database (TSDB) to Promscale
  2. Migration from one remote storage system to another

Migration from Prometheus’ TSDB to Promscale

In this case, we will show how to migrate data from Prometheus’ built-in time-series database to any remote storage system that supports a Prometheus-compliant remote_write endpoint. You may want to do this when you’re first adding a long-term storage system to your observability stack.

In this scenario, your existing Prometheus data becomes available in the long-term store right away, instead of only the data collected after you connect it. Promscale is the remote-storage solution provided by Timescale, so we will illustrate this example by sending data to TimescaleDB using Promscale.


Download and extract Prometheus

You can download the Prometheus binary from the GitHub releases page of the Prometheus repository. For setup details, see the Prometheus documentation.

# Download the Prometheus release archive
wget https://github.com/prometheus/prometheus/releases/download/v2.24.1/prometheus-2.24.1.linux-amd64.tar.gz

# Extract the archive and switch into the extracted directory
tar -zxvf prometheus-2.24.1.linux-amd64.tar.gz
cd prometheus-2.24.1.linux-amd64

Once you have extracted the downloaded archive, you will find a prometheus.yml file in the extracted directory. This is the configuration file Prometheus reads at startup. To simply run Prometheus, the default configuration needs no changes.

Start Prometheus

./prometheus --config.file=prometheus.yml

Prometheus now serves its remote_read endpoint (/api/v1/read), which we will use as our source of data. Once started, Prometheus scrapes the targets listed in its configuration file and stores the scraped samples only in its local TSDB (as long as no remote_write URL is configured).
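Before pointing Prom-migrator at Prometheus, it can help to wait until Prometheus is actually ready to serve requests. The bash helper below is a minimal sketch: it polls Prometheus’ `/-/ready` management endpoint with curl until the request succeeds; the function name `wait_for_url` and the default retry count are our own choices.

```shell
#!/usr/bin/env bash
# wait_for_url URL [ATTEMPTS]
# Polls URL once per second until curl --fail succeeds (HTTP status below
# 400), giving up after ATTEMPTS tries (default 30). Requires curl.
wait_for_url() {
  local url=$1 attempts=${2:-30} i
  for (( i = 1; i <= attempts; i++ )); do
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    sleep 1
  done
  echo "gave up waiting for $url" >&2
  return 1
}

# Example: block until the local Prometheus reports that it is ready.
# wait_for_url http://localhost:9090/-/ready 60
```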

Setup Promscale

Now, let's set up Promscale. Promscale is a remote read/write storage platform offered by Timescale. It accepts Prometheus data via remote_write and stores it in TimescaleDB. Promscale lets you query and analyze Prometheus data natively with both SQL (PostgreSQL) and PromQL, and it is 100% PromQL compatible according to the PromQL compliance test results from PromLabs.

For other ways to set up Promscale, refer to the installation section of the Promscale documentation. For this example, we will use the binary from the GitHub releases page of Promscale.

# Download the Promscale binary
wget -O promscale https://github.com/timescale/promscale/releases/download/0.1.4/promscale_0.1.4_Linux_x86_64

# Provide execution permissions
chmod +x promscale

After downloading the Promscale binary, start Promscale as described in the Promscale README. Promscale stores its data in TimescaleDB, so it needs a running TimescaleDB instance to connect to. In this example, we run Promscale as a standalone binary on bare metal.

Start Promscale

./promscale -db-name=<db-name> -db-password=<password> -db-ssl-mode=allow

After setting up Prometheus and Promscale, we now have to supply three URLs as inputs to Prom-migrator:

  • remote_read URL
  • remote_write URL
  • progress-metric URL

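For this example, these correspond to:

  • remote_read URL: http://<prometheus_host>:9090/api/v1/read (Prometheus)
  • remote_write URL: http://<promscale_host>:9201/write (Promscale)
  • progress-metric URL: http://<promscale_host>:9201/read (Promscale)

Migrating data from Prometheus to Promscale using Prom-migrator.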

The progress metric is used internally by Prom-migrator to ensure that if our migration is interrupted, it can be automatically resumed from where it left off. Prom-migrator does this by reading the value of the progress-metric it wrote as part of the interrupted migration.

Thus, in order to resume the migration, we need to tell the migrator where to read the value of the progress metric. Since the migrator writes the progress metric to the destination, the URL for fetching it is the remote_read URL of Promscale, which we pass to -progress-metric-url.

Next, let’s prepare the migration itself. Start by downloading the Prom-migrator binary from Promscale’s GitHub releases page and, as with Promscale, make it executable with chmod +x.

Let's do the migration

Any migration requires the following:

  • Minimum time (-mint): the time, in Unix seconds, from which data is fetched.
  • Read URL (-read-url): the URL of the storage system from which data is read.
  • Write URL (-write-url): the URL of the storage system to which data is pushed.

The maximum time is an optional field, as its default value is the current time. In this example, we want to migrate everything up to now, so we leave the maximum time empty. Moreover, since Promscale is the destination, we can use its remote_read URL as the progress-metric-url. For more information about Prom-migrator’s configuration options, please refer to the Prom-migrator docs.
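Since -mint takes Unix seconds, it is handy to convert between human-readable UTC times and epoch values before starting. The two small bash helpers below are a sketch that assumes GNU date (as found on most Linux systems); BSD and macOS date use different flags. The helper names `to_unix` and `from_unix` are our own.

```shell
#!/usr/bin/env bash
# Convert a UTC wall-clock time to Unix seconds (the unit used by -mint),
# and back again to double-check a value. Requires GNU date.
to_unix()   { date -u -d "$1" +%s; }
from_unix() { date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'; }

# The -mint used in this example corresponds to 2020-12-15 07:42:01 UTC.
from_unix 1608018121              # prints: 2020-12-15 07:42:01
to_unix '2020-12-15 07:42:01'     # prints: 1608018121
```

GNU date also accepts relative expressions, so something like `-mint=$(to_unix '30 days ago')` would migrate roughly the last month of data.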

We execute the migration with the following command after ensuring that Prometheus and Promscale are running on the URLs mentioned above.

./prom-migrator -mint=1608018121 -read-url=http://<prometheus_host>:9090/api/v1/read -write-url=http://<promscale_host>:9201/write -progress-metric-url=http://<promscale_host>:9201/read 

The above command migrates data from the Prometheus instance running on port 9090 to the Promscale instance running on port 9201, starting at 1608018121 (Unix seconds, i.e., 2020-12-15 07:42:01 UTC) and ending at the current time. At the same time, it records the progress of the migration so that the process can be resumed after any interruption.

Note: We did not specify maxt (the maximum timestamp up to which migration should be carried out) because maxt defaults to the current time, so all data from mint up to now is migrated.
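Because Prom-migrator resumes from the progress metric, recovering from an interruption amounts to running the same command again. The bash wrapper below is a minimal sketch of automating that; `run_with_retries`, the attempt limit, and the delay are our own illustrative choices, and the example invocation reuses the placeholder hosts from the command above.

```shell
#!/usr/bin/env bash
# run_with_retries MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0 or MAX_ATTEMPTS is reached. Relies on the
# migrator resuming from its progress metric, so each retry continues the
# interrupted migration instead of starting over.
run_with_retries() {
  local max_attempts=$1 delay=$2 attempt
  shift 2
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$@"; then
      return 0
    fi
    if (( attempt < max_attempts )); then
      echo "attempt $attempt/$max_attempts failed; retrying in ${delay}s" >&2
      sleep "$delay"
    fi
  done
  return 1
}

# Example (hosts are placeholders):
# run_with_retries 5 30 ./prom-migrator -mint=1608018121 \
#   -read-url=http://<prometheus_host>:9090/api/v1/read \
#   -write-url=http://<promscale_host>:9201/write \
#   -progress-metric-url=http://<promscale_host>:9201/read
```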

We’ve recorded the following video to walk you through the entire process:

Vineeth takes you through how Prom-migrator works and how to get up and running in under 10 mins 🔥.

Migration from one remote storage to another

In this second scenario, we will discuss how to migrate data from one remote-storage system to another. For this example, we will transfer data from a Cortex instance to Promscale using Prom-migrator.

Cortex supports a Prometheus-compliant remote_read endpoint, so that endpoint serves as the input to -read-url in Prom-migrator. For -write-url, we use Promscale’s Prometheus-compliant remote_write endpoint. We also want to keep track of progress so that we are protected against unintended crashes. Hence, we provide Promscale’s remote_read endpoint as the input to -progress-metric-url: Prom-migrator writes the timestamp of the most recently migrated block to Promscale as the progress metric, and reads it back through this URL when resuming.

Let's break this down step by step.

Getting started

In this example we will assume that Cortex is already running. You can get information on how to set up Cortex from its documentation.

You will also need to download and install Promscale according to the instructions found in the “Setup Promscale” section, above.

Start both Promscale (see “Start Promscale,” above) and Cortex according to their respective instructions.

Download Prom-migrator

With Promscale and Cortex up and running, download Prom-migrator from Promscale’s releases page.

Now we can begin the migration:

./prom-migrator -read-url=http://<cortex_host>:9009/api/prom/api/v1/read -write-url=http://<promscale_host>:9201/write -mint=1609920418 -progress-metric-url=http://<promscale_host>:9201/read

The above command migrates data from Cortex running on port 9009 to Promscale running on port 9201, from timestamp 1609920418 (2021-01-06 08:06:58 UTC) up to now. At the same time, Prom-migrator uses the progress-metric-url to track the progress of the migration, so that the process can be resumed after any interruption.

Note: We did not specify maxt (the maximum timestamp up to which migration should be carried out) because maxt defaults to the current time, so all data from mint up to now is migrated.

When you run the above command, Prom-migrator displays a progress bar for each block it migrates, showing the block’s time range and the percentage of the overall migration completed.
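If you only need part of the history, for instance to migrate one week at a time, you can also pass an explicit maximum time. The bash sketch below builds a bounded window from two UTC dates. It assumes the maximum-time flag is spelled `-maxt`, mirroring `-mint`; that spelling is an assumption, so confirm it with `./prom-migrator -help` for your version. The helper `window_args` is hypothetical and requires GNU date.

```shell
#!/usr/bin/env bash
# window_args START_DATE END_DATE
# Prints "-mint=<unix> -maxt=<unix>" for the UTC window [START_DATE, END_DATE).
# ASSUMPTION: the maximum-time flag is named -maxt (mirroring -mint).
window_args() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  if (( end <= start )); then
    echo "end must be after start" >&2
    return 1
  fi
  printf '%s\n' "-mint=$start -maxt=$end"
}

# Example: migrate only the first week of January 2021 from Cortex.
# The $(...) is left unquoted on purpose so it splits into two flags.
# ./prom-migrator $(window_args 2021-01-01 2021-01-08) \
#   -read-url=http://<cortex_host>:9009/api/prom/api/v1/read \
#   -write-url=http://<promscale_host>:9201/write \
#   -progress-metric-url=http://<promscale_host>:9201/read
```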

Conclusion

Prometheus is an open-source systems monitoring and alerting toolkit that can be used to easily and cost-effectively monitor infrastructure and applications. Over the past few years, Prometheus has emerged as the monitoring solution for modern software systems. The key to Prometheus’ success is its pull-based architecture in combination with service discovery, which can seamlessly monitor modern, dynamic systems in which (micro-)services start up and shut down frequently.

Long-term storage of Prometheus metrics gives you greater insight into your systems. Unfortunately, Prometheus does not itself provide durable, highly-available long-term storage or advanced analytics, but relies on other projects to implement this functionality.

Prom-migrator is a universal, open-source Prometheus data migration tool that's community-driven and free to use. With Prom-migrator, you can move your data from Prometheus to the long-term storage of your choice, or migrate between different long-term storage options.

For more:

Finally, we would love to collaborate with any remote-storage provider to improve Prom-migrator support for their tool.

Please reach out to promscale@timescale.com or simply file issues and PRs in GitHub.
