How SolarNetwork Fuels Energy Conservation With TimescaleDB

This is an installment of our “Community Member Spotlight” series, where we invite our customers to share their work, shining a light on their success and inspiring others with new ways to use technology to solve problems.

In this edition, Matt Magoffin, Technical Director at SolarNetwork Foundation, joins us to share the work they’re doing to promote renewable energy and sustainability throughout the world.

SolarNetwork Foundation is a New Zealand-registered charity organisation that helps individuals and organisations track their energy usage – with the ultimate goal of supporting renewable energy uptake and resource conservation. Above all, we believe decisions should be based on measurable facts; outcomes should be measurable and provide feedback into the decision making process. The “facts” in this domain include measurements of energy flows, weather, and markets—across time.

We’re a charity for two primary reasons:

Free, universal access to this information is vital to helping society (as a whole) transition to a climate-friendly energy economy.
Everyone should have access to an easy-to-use technology platform that supports individuals up to large governments and organisations

To the second point, our platform, SolarNetwork, provides an infrastructure service that allows developers and application integrators to use and surface the energy data we collect as they create applications for various audiences and scenarios.

The groups we work with are a testament to this: we have many individual renewable energy enthusiasts who like tracking their own energy use, and some have solar panels and like tracking their own energy production; we partner with schools that install solar panels and then design science curriculum around their energy consumption and production; and we work with sustainability-focused businesses who track their energy use and production – and integrate with automated systems to conserve energy.

About the Team

We started more than 10 years ago, as a research project and hobby, and we run an extremely lean organisation, staffed with just two founding members. John Gorman serves as our Strategic Director, while Matt Magoffin (aka me) is our Technical Director.

We both have backgrounds in electrical and computer engineering, and feel strongly about sustainability and renewable energy (we also both have lovely dogs).

John’s day-to-day activities involve working with people to plan and deploy renewable energy systems, as well as the hardware to integrate with the SolarNetwork platform. I focus on writing code for the SolarNetwork platform and managing the operational side of running it.

About the project

SolarNetwork is primarily written in Java and collects a variety of data:

Electrical energy measurements from power meters, solar panels, etc.
Weather data from public sources or private weather stations
Electricity grid market price data
Electric vehicle telemetry data, such as GPS, battery charge, etc.
Water meter data
Natural gas meter data
Control states, such as switches that can be turned on/off
Security camera images

SolarNetwork has a flexible data schema that doesn’t restrict the type of data it can collect, other than each bit of data must have a unique identifier and a date. Our users decide what data they need to achieve their goals and connect their own data sources, be it simple energy monitoring or full-blown building automation (we also support collecting data like energy market prices and weather via the SolarNetwork API). The general goal is still the same: reduce non-renewable energy use, by way of conservation and renewable generation.

Today, we average around 312K “datum” added each day; “datum” can have any number of metrics associated, so 312K datum include 3M+ different metrics. Since launching in 2000, our customers have deployed 300 nodes, which can collect any number of sources (e.g., power meters), and those nodes currently collect data from 3500+ sources.

What we see really motivates people is when they can view this mostly “invisible” information in clear and simple ways.

We’ve helped a variety of organisations do this, from simple tables:

3 tables displaying household energy use, electricity performance, and live temperatures — Example energy utilisation tables, displaying massive amounts of time-series data in human-readable form

to fully interactive learning tools:

to ad-hoc dashboards:

to Electric Vehicle charging station screens:

Report showing energy savings from one communal electric vehicle charging station

Using TimescaleDB

We first found out about and started using TimescaleDB in 2017.

We already used Postgres as our database, and, at the time, we used an extension called pg_partman to manage our largest tables as partitions. TimescaleDB was appealing to us because it provided a better way to manage the table partitioning and adapt it over time, as our usage grew. We also evaluated other tools like Cassandra, Druid, and CockroachDB – but none seemed like a perfect fit, especially since we already used Postgres.

For reference, our database is about 300GB in size, and we use ZFS with compression enabled, so our actual on-disk size is about ¼ of that. Besides the raw data itself, we maintain hourly, daily, and monthly pre-computed aggregate rows to increase query performance of common queries. Since we store metrics as jsonb, we use the plv8 language extension to perform most processing directly in the database.

In addition, SolarNetwork allows for data to be ingested in real-time and in bulk, with dates at any point in the past or future. Other tools placed restrictions on these ingest patterns that TimescaleDB does not.

As we got up and going, we were very happy with how supportive the Timescale team was (and continues to be!) in answering questions, providing feedback, and helping troubleshoot problems.

Editor’s Note: To reach the Timescale team (and community members like Matt), join our public Slack. Our engineers, co-founders, and many passionate TimescaleDB users are active in all channels.

Current Deployment & Future Plans

Many tools require quite a large initial investment in terms of deployment resources that were cost-prohibitive for us, as a charity organisation. TimescaleDB allowed us to work with the infrastructure we already had at the time we adopted it, and to grow as needed since. We're still able to run TimescaleDB on a single primary machine and secondary replica for most queries. Both machines run FreeBSD and leverage ZFS for its compression and integrity features.

To expand on the above graphic, our databases are PG 9.6 + TimescaleDB; SolarQuery Proxy, TLS Proxy are nginx; SolarIn/MQTT and SolarFlux are VerneMQ; Billing Admin / Billing Service are Kill Bill; and SolarIn, SolarQuery, SolarUser, SolarJobs, SolarSSH, Node Image Maker, and SolarNodes are all Java applications.

The main benefits we’ve had from using TimescaleDB are:

Maintaining ingest flexibility (importing data for any dates)
Simplified maintenance / automated partitioning (I don’t have to worry about customers importing data for old/future dates, which other systems couldn’t handle and we can adjust partition sizes as our usage grows).
Maintaining efficient, fast queries across the TimescaleDB datasets as they grow

To get an idea of the types of data and queries we expose to our customers / users, you can check out our SolarNetwork/solarnetwork-central GitHub repo, where you’ll find all of the open source cloud-hosted SolarNetwork service code (including all the SQL scripts/queries used by the application).

That link will highlight one of our latest query additions (i.e., lines 1638-1800 of the SQL file), which our customers use to get "meter reading" style data (e.g., “tell me the difference of an energy meter reading between the end and start of Month X.”).

The tricky parts of this query were that:

The date ranges are provided in node-local time and nodes can be in different time zones. To solve this, it first groups the requested nodes into time zones, so it can then query on absolute date ranges for each unique zone.
The query must find the "most recent row not after" the provided start/end dates, so that consistent results are reported month over month. For example, if we’re looking at a January time range, the first meter reading might be from Dec 31 23:59, and last at Jan 31 23:58.
We allow "auxiliary" datum records to be added for any source to handle various situations (like when a physical meter is replaced, and its reading drops back to 0). These auxiliary records allow the query to partition the data by time, calculate the difference between each partition, and then sum the results.

The query powers the datum/reading SolarNetwork API found here, and here's an example response object (note the wattHours_end and wattHours_start properties associated with the calculated wattHours difference):

{"created": "2018-07-01 03:59:42.014Z",
"nodeId": 18,
"sourceId": "/GEN/1",
"localDate": "2018-06-30",
"localTime": "23:59",
"wattHours": 10320000,
"wattHours_end": 805703000,
"wattHours_start": 795383000,
"endDate": "2018-08-01 03:58:42.014Z",
"timeZone": "America/New_York",
"localEndDate": "2018-07-31 23:58:42.014"
}

This result shows that SolarNetwork found a starting datum at 2018-07-01 03:59:42.014Z with a wattHour reading of 795383000 and an ending datum at 2018-08-01 03:58:42.014Z with a wattHour reading of 805703000 and calculated the difference as 10320000.

In the future, we would like to adopt more TimescaleDB features, such as data retention and continuous aggregates. We had (and still have) a similar home-grown aggregates feature deployed when we first adopted TimescaleDB, but like having the feature natively supported in the database.

Advice & Resources

The Timescale website and documentation contain a wealth of helpful information about managing large time-series datasets. As noted earlier, the Slack workspace is extremely helpful, and the Timescale team has always been friendly and amazingly responsive to community questions, large and small.

We’d like to thank Matt and the SolarNetwork team for their willingness to share their story and their commitment to sustainability; their work to ensure individuals and organizations are armed with the information they need to understand and conserve energy consumption is yet another example of the amazing power data has to help us understand the world around us.

We’re always keen to share community projects, either via our blog or DataPub, our monthly virtual meetup for open data enthusiasts (everyone’s welcome, so we hope to see you at a future session!).

Do you have a story or project you’d like to share? Send a note to [email protected] or reach out on Slack (@lacey butler), and we’ll go from there :).

Ingest and query in milliseconds, even at terabyte scale.

This post was written by