How I power a (successful) crypto trading bot with TimescaleDB
This is an installment of our “Community Member Spotlight” series, where we invite TimescaleDB community members to share their work, shining a light on their success and inspiring others with new ways to use technology to solve problems.
In this edition, Felipe Queis, a senior full-stack engineer for a Brazilian government traffic institution, joins us to share how he uses TimescaleDB to power his crypto trading bot – and how his side project influenced his team’s decision to adopt TimescaleDB for their work.
My first experience with crypto wasn’t under very good circumstances: a friend who takes care of several servers at his job was infected with ransomware – and this malware was demanding he pay the ransom amount in a cryptocurrency called Monero (XMR).
After this not-so-friendly introduction, I started to study how the technology behind cryptocurrencies works, and I fell in love with it. I was already interested in the stock market, so I joined the familiar (stock market) with the novel (crypto). To test the knowledge I’d learned from my stock market books, I started creating a simple Moving Average Convergence Divergence (MACD) crossover bot.
This worked for a while, but I quickly realized that I should - and could - make the bot a lot better.
Now, the project that I started as a hobby has a capital management system, a combination of technical indicators, and sentiment analysis powered by machine learning. Between 10 March 2020 and 10 July 2020, my bot resulted in a success rate of 61.5%, profit factor of 1.89, and cumulative gross result of approximately 487% (you can see a copy of all of my trades during this period in this Google Sheet report).
I'm 29 years old, and I’ve worked in a traffic governmental institution in São Paulo, Brazil (where I live too) as an senior full-stack developer since 2012.
In my day job, my main task at the moment is processing and storing the stream of information from Object Character Recognition (OCR)-equipped speed cameras that capture data from thousands of vehicles as they travel our state highways. Our data stack uses technologies like Java, Node.js, Kafka, and TimescaleDB.
(For reference, I started using TimescaleDB for my hobby project, and, after experiencing its performance and scale with my bot, I proposed we use it at my organization. We’ve found that it brings together the best of both worlds: time-series in a SQL database and open source).
I started to develop my crypto trading bot in mid- 2017, about six months after my first encounter with the crypto ecosystem – and I’ve continued working on it in my spare time for the last two and a half years.
Editor’s Note: Felipe recently hosted a Reddit AMA (Ask Me Anything) to share how he’s finally “perfected” his model, plus his experiences and advice for aspiring crypto developers and traders.
About the project
I needed a bot that gave me a high-performance, scalable way to calculate technical indicators and process sentiment data in real-time.
To do everything I need in terms of my technical indicators calculation, I collect candlestick chart data and market depth via an always-up websocket connection that tracks every Bitcoin market on the Binance exchange (~215 in total, 182 being tradeable, at this moment).
The machine learning sentiment analysis started as a simple experiment to see if external news affected the market. For example: if a famous person in the crypto ecosystem tweeted that a big exchange was hacked, the price will probably fall and affect the whole market. Likewise, very good news should impact the price in a positive way. I calculated sentiment analysis scores in real-time, as soon as new data was ingested from sources like Twitter, Reddit, RSS feeds, and etc. Then, using these scores, I could determine market conditions at the moment.
Now, I combine these two components with a weighted average, 60% technical indicators and 40% sentiment analysis.
At the beginning, I tried to save the collected data in simple files, but quickly realized that wasn’t a good way to store and process this data. I started looking for an alternative: a performant database.
I went through several databases, and all of them always lacked something I wound up needing to continue my project. I tried MongoDB, InfluxDB, and Druid, but none of them 100% met my needs.
Of the databases I tried, InfluxDB was a good option; however, every query that I tried to run was painful, due to their own query language (InfluxQL).
As soon as my series started to grow exponentially to higher levels, the server didn't have enough memory to handle them all in real-time. This is because the current InfluxDB TSM storage engine requires more and more allocated memory for each series. I have a large number of unique metrics, so the process ran out of available memory quickly.
I handle somewhat large amounts of data every day, especially on days with many market movements.
On average, I’m ingesting around 20k records/market, or 3.6 million total records, per day (20k*182 markets).
This is where TimescaleDB started to shine for me. It gave me fast real-time aggregations, built-in time-series functions, high ingestion rates – and it didn’t require elevated memory usage to do all of this.
Editor’s Note: For more about how Flux compares to SQL and deciding which one is right for you, see our blog post exploring the strengths and weaknesses of each.
To learn more about how TimescaleDB real-time aggregations work (as well as how they compare to vanilla PostgreSQL), see this blog post and mini-tutorial.
In addition to this raw market data, a common use case for me is to analyze the data in different time frames (e.g., 1min, 5min, 1hr, etc.). I maintain these records in a pre-computed aggregate to increase my query performance and allow me to make faster decisions about whether or not to enter a position.
For example, here’s a simple query that I use a lot to follow the performance of my trades on a daily or weekly basis (daily in this case):
SELECT time_group, total_trades, positive_trades, negative_trades, ROUND(100 * (positive_trades / total_trades), 2) AS success_rate, profit as gross_profit, ROUND((profit - (total_trades * 0.15)), 2) AS net_profit FROM ( SELECT time_bucket('1 day', buy_at::TIMESTAMP)::DATE AS time_group, COUNT(*) AS total_trades, SUM(CASE WHEN profit > 0 THEN 1 ELSE 0 END)::NUMERIC AS positive_trades, SUM(CASE WHEN profit <= 0 THEN 1 ELSE 0 END)::NUMERIC AS negative_trades, ROUND(SUM(profit), 2) AS profit FROM trade GROUP BY time_group ORDER BY time_group ) T ORDER BY time_group
And, I often use this function to measure market volatility, decomposing the range of a market pair in a period:
CREATE OR REPLACE FUNCTION tr(_symbol TEXT, _till INTERVAL) RETURNS TABLE(date TIMESTAMP WITHOUT TIME ZONE, result NUMERIC(9,8), percent NUMERIC(9,8)) LANGUAGE plpgsql AS $$ DECLARE BEGIN RETURN QUERY WITH candlestick AS ( SELECT * FROM candlestick c WHERE c.symbol = _symbol AND c.time > NOW() - _till ) SELECT d.time, (GREATEST(a, b, c)) :: NUMERIC(9,8) as result, (GREATEST(a, b, c) / d.close) :: NUMERIC(9,8) as percent FROM ( SELECT today.time, today.close, today.high - today.low as a, COALESCE(ABS(today.high - yesterday.close), 0) b, COALESCE(ABS(today.low - yesterday.close), 0) c FROM candlestick today LEFT JOIN LATERAL ( SELECT yesterday.close FROM candlestick yesterday WHERE yesterday.time < today.time ORDER BY yesterday.time DESC LIMIT 1 ) yesterday ON TRUE WHERE today.time > NOW() - _till) d; END; $$; CREATE OR REPLACE FUNCTION atr(_interval INT, _symbol TEXT, _till INTERVAL) RETURNS TABLE(date TIMESTAMP WITHOUT TIME ZONE, result NUMERIC(9,8), percent NUMERIC(9,8)) LANGUAGE plpgsql AS $$ DECLARE BEGIN RETURN QUERY WITH true_range AS ( SELECT * FROM tr(_symbol, _till) ) SELECT tr.date, avg.sma result, avg.sma_percent percent FROM true_range tr INNER JOIN LATERAL ( SELECT avg(lat.result) sma, avg(lat.percent) sma_percent FROM ( SELECT * FROM true_range inr WHERE inr.date <= tr.date ORDER BY inr.date DESC LIMIT _interval ) lat ) avg ON TRUE WHERE tr.date > NOW() - _till ORDER BY tr.date; END; $$; SELECT * FROM atr(14, 'BNBBTC', '4 HOURS') ORDER BY date
With TimescaleDB, my query response time is in the milliseconds, even with this huge amount of data.
Editor’s Note: To learn more about how TimescaleDB works with cryptocurrency and practice running your own analysis, check out our step-by-step tutorial. We used these instructions to analyze 4100+ cryptocurrencies, see historical trends, and answer questions.
Current Deployment & Future Plans
To develop my bot and all its capabilities, I used Node.js as my main programming language and various libraries: Cote to communicate between all my modules without overengineering, TensorFlow to train and deploy all my machine learning models, and tulind for technical indicator calculation, as well as various others.
I modified some to meet my needs and created some from scratch, including a candlestick recognition pattern, a level calculator for support/resistance, and Fibonacci retracement.
Today, I have a total of 55 markets (which are re-evaluated every month, based on trade simulation performance) that trade simultaneously 24/7; when all my strategy conditions are met, a trade is automatically opened. The bot respects my capital management system, which is basically to limit myself to 10 opened positions and only use 10% of the available capital at a given time. To keep track of the results of an open trade, I use dynamic Trailing Stop Loss and Trailing Take Profit.
The process of re-evaluating a market requires a second instance of my bot that runs in the background and uses my main strategy to simulate trades in all Bitcoin markets. When it detects that a market is doing well, based on the metrics I track, that market enters the main bot instance and starts live trading. The same applies to those that are performing poorly; as soon as the main instance of my bot detects things are going badly, the market is removed from the main instance and the second instance begins tracking it. If it improves, it's added back in.
As every developer likely knows all too well, the process of building a software is to always improve it. Right now, I’m trying to improve my capital management system using Kelly Criterion, as suggested by a user in my Reddit post (thanks, btw :)).
Getting started advice & resources
For my use case, I’ve found TimescaleDB is a powerful and solid choice: it’s fast with reliable ingest rates, efficiently stores and compresses a huge dataset in a way that’s manageable and cost-effective, and gives me real-time aggregation functionality.
The Timescale website, "using TimescaleDB" core documentation , and this blog post about about managing and processing huge time-series datasets is all pretty easy to understand and follow – and the TimescaleDB team is responsive and helpful (and they always show up in community discussions, like mine on Reddit).
It’s been easy and straightforward to scale, without adding any new technologies to the stack. And, as an SQL user, TimescaleDB adds very little maintenance overhead, especially compared to learning or maintaining a new database or language.
We’d like to thank Felipe for sharing his story, as well as for his work to evangelize the power of time-series data to developers everywhere. His success with this project is an amazing example of how we can use data to fuel real-world decisions – and we congratulate him on his success 🎉.
We’re always keen to feature new community projects and stories on our blog. If you have a story or project you’d like to share, reach out on Slack (@lacey butler), and we’ll go from there.
Additionally, if you’re looking for more ways to get involved and show your expertise, check out the Timescale Heroes program.