How a Senior Technologist at animation film studio LAIKA developed a tool to store metrics from Netdata to TimescaleDB
Mahlon Smith has a job many people would envy. He works at LAIKA, the studio behind animation films such as Coraline, ParaNorman, Kubo and the Two Strings, and most recently, Missing Link.
Making animated films requires a broad array of IT infrastructure: render farms, workstations, virtual machines, and more. And these all produce enormous amounts of metrics and time-series data.
Part of Mahlon’s job is to monitor this data to make sure everything runs smoothly. TimescaleDB is one tool Mahlon uses to do this. TimescaleDB allows him to consolidate metrics from all of LAIKA’s host machines and other infrastructure into one central, performant, reliable time-series database.
Netdata is one source of metrics for Mahlon and LAIKA. After seeing a GitHub issue requesting support for TimescaleDB as a Netdata backend, Mahlon decided to scratch his own itch and put together the netdata-tsrelay tool, so that he could consolidate his Netdata metrics with other time-series data in TimescaleDB.
And because of his initiative, Mahlon is now also a Founding Hero in the Timescale Heroes program, Timescale’s program to recognize and promote our most active and awesome community members.
In this post, we will share more about Netdata, LAIKA, and how you can use Mahlon’s tool to write metrics to TimescaleDB from Netdata clients.
What is Netdata?
Netdata is a system for real-time performance and health monitoring of distributed systems. It’s designed to run on all types of systems (physical and virtual servers, containers, IoT devices, etc.) without disrupting their core functions. And it’s quite popular with 41K+ stars on GitHub!
Users report that it’s extremely easy to use and requires minimal setup. Many have also commented that, unlike other monitoring solutions, its modern interface lets them track real-time performance at per-second granularity. It also provides options for alerting and creates beautifully designed graphs.
Relay tool for storing Netdata metrics in TimescaleDB
As mentioned above, in 2017 a user asked Netdata to add support for TimescaleDB as a backend. If you follow the thread, you will see that instead of waiting for the Netdata team, Mahlon took the initiative and created a tool to solve the problem himself.
According to the description, here’s what the tool does:
“This program is designed to accept JSON streams from Netdata clients, and write metrics to a PostgreSQL table - specifically, Timescale backed tables”
The relay tool makes storing Netdata metrics in TimescaleDB possible. This is important because it allows Netdata users to benefit from TimescaleDB’s performance gains, and allows users to combine their time-series data from Netdata with data from other data sources and consolidate it all in one place for analysis.
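To make the idea concrete, here is a minimal Python sketch of the kind of transformation such a relay performs: reading newline-delimited JSON samples and folding them into one row per host and timestamp, ready to be written to a table. This is not Mahlon’s implementation. The field names (hostname, chart_id, id, value, timestamp) are assumed from Netdata’s JSON backend output, and the row layout (time, host, metrics) and the group_samples helper are hypothetical choices made for illustration.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def group_samples(lines):
    """Fold newline-delimited JSON samples into one row per (host, timestamp).

    Assumes each line is a JSON object carrying "hostname", "chart_id",
    "id", "value", and "timestamp" (epoch seconds). These field names are
    an assumption about the incoming stream, not a documented contract.
    """
    rows = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between samples
        sample = json.loads(line)
        key = (sample["hostname"], int(sample["timestamp"]))
        # Build a flat metric name such as "system.cpu.user".
        metric = f'{sample["chart_id"]}.{sample["id"]}'
        rows[key][metric] = sample["value"]

    # One dict per (host, timestamp), shaped like a hypothetical
    # (time, host, metrics) table row.
    return [
        {
            "time": datetime.fromtimestamp(ts, tz=timezone.utc),
            "host": host,
            "metrics": metrics,
        }
        for (host, ts), metrics in sorted(rows.items())
    ]


# Hypothetical sample stream from two hosts.
stream = [
    '{"hostname": "render01", "chart_id": "system.cpu", "id": "user", "value": 12.5, "timestamp": 1567296000}',
    '{"hostname": "render01", "chart_id": "system.cpu", "id": "system", "value": 3.0, "timestamp": 1567296000}',
    '{"hostname": "render02", "chart_id": "system.cpu", "id": "user", "value": 40.0, "timestamp": 1567296000}',
]

rows = group_samples(stream)
for row in rows:
    print(row["time"].isoformat(), row["host"], row["metrics"])
```

Grouping every metric for a host and timestamp into a single row keeps the row count manageable when each host reports many metrics per second; whether that trade-off suits a given deployment depends on how the data will be queried.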
(Note: This summer, the team at Netdata increased the priority of the open issue since several people have requested TimescaleDB support. We will update this post once support is official.)
How LAIKA uses TimescaleDB & Netdata
Since 2005, LAIKA has been the creative force behind animated films such as Coraline, ParaNorman, Kubo and the Two Strings, and most recently, Missing Link.
As an animation studio, LAIKA has render farms, workstations, and virtual machines producing enormous amounts of IT metrics and time-series data. They require a sophisticated database system to handle data processing and storage.
LAIKA is a long-time PostgreSQL user and added TimescaleDB to their infrastructure in 2018 to help manage and store their IT metrics and time-series data. So far, the tool has been in production at LAIKA for over a year and helps them with their use case of time-based logging, where they record over 8 million metrics an hour for Netdata content alone.
Keeping Netdata metrics in TimescaleDB, as opposed to Netdata itself, has three main advantages for LAIKA:
First, it enables LAIKA to consolidate metrics data across all their host machines, as they can store host metrics from all machines in one environment. This allows them to ask questions about their farm as a single resource (i.e., machines in aggregate), rather than as a collection of individual machines.
Secondly, it enables LAIKA to consolidate their Netdata metrics with other sources of time-series data. This allows them to correlate host events to many of their other systems more seamlessly.
Thirdly and finally, it allows LAIKA to view long-term trends in how resources are being consumed. Keeping all their Netdata metrics in TimescaleDB as a long-term store allows that data to be analyzed over several years, which is how long it takes to produce one of their stop-motion feature films. This is not possible using Netdata alone, as it only saves a small buffer of data in memory, meant for short-term analysis.
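As a rough illustration of what asking a question about the farm as a single resource can look like, the Python sketch below averages one metric across all hosts per hour. In TimescaleDB, this kind of rollup would normally be written as a SQL query (for example, with the time_bucket function); the Python version only shows the shape of the computation. The row layout, host names, metric name, and the farm_hourly_average helper are all hypothetical and are not taken from LAIKA’s setup.

```python
from collections import defaultdict
from datetime import datetime, timezone


def farm_hourly_average(rows, metric):
    """Average one metric across all hosts, bucketed by hour."""
    buckets = defaultdict(list)
    for row in rows:
        if metric not in row["metrics"]:
            continue  # this host did not report the metric at this time
        # Truncate the timestamp to the start of its hour.
        hour = row["time"].replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(row["metrics"][metric])
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Hypothetical rows: two hosts reporting CPU usage over two hours.
sample_rows = [
    {"time": datetime(2019, 9, 1, 0, 5, tzinfo=timezone.utc),
     "host": "render01", "metrics": {"system.cpu.user": 10.0}},
    {"time": datetime(2019, 9, 1, 0, 40, tzinfo=timezone.utc),
     "host": "render02", "metrics": {"system.cpu.user": 30.0}},
    {"time": datetime(2019, 9, 1, 1, 10, tzinfo=timezone.utc),
     "host": "render01", "metrics": {"system.cpu.user": 50.0}},
]

averages = farm_hourly_average(sample_rows, "system.cpu.user")
for hour, value in averages.items():
    print(hour.isoformat(), value)
```

Because every host writes to the same store, the same rollup can cover a single afternoon or several years simply by changing the time range of the rows fed in.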
If you are looking to experience the advantages of using TimescaleDB as a backend for your Netdata metrics, we encourage you to check out Mahlon’s tool on our examples page and sign up for Timescale Cloud, where you’ll get $300 in free credits to get started with your implementation.
Finally, if you’re like Mahlon and you’re passionate about Timescale and want to share your story with the world, you might be our next Timescale Hero! Interested in joining the Heroes program? Fill out this form and we’ll reach out to you.