Cryptocurrencies are fueling a modern day gold rush. Can data help us better understand this evolving market?

Original post by TimescaleDB intern Sarah Pan

[Update: Thank you everyone for making this #1 on Hacker News! Read the HN discussion here.]

Lately it seems like money has been growing on trees.

With trade volumes reaching billions of dollars a day and market caps hitting tens of billions of dollars, it’s no wonder that cryptocurrencies fuel the gold rush of the modern day.

We live in the age of digital currencies, with cryptocurrencies birthed within the decade. Yet already, there are more than a thousand cryptocurrencies in the market and an initial coin offering (ICO) almost daily.

As we embrace this new, proliferous market, it’s important that we try to understand what’s going on. There are many risks to observe at both the micro-level (e.g., personal investments) and macro-level (e.g., prevention of market crashes and major loss of capital). That’s where we come in.

We’re data people. Specifically, we’re the developers of TimescaleDB, a new open source time-series database built up from PostgreSQL. And we thought it would be insightful (and fun) to analyze the cryptocurrency market using PostgreSQL and TimescaleDB (plus R for data visualization).

For this analysis*, we looked at historical OHLCV price data on over 1200 cryptocurrencies (as of 6/26/2017; courtesy of CryptoCompare). While our current dataset represents only a daily record of rates, TimescaleDB scales easily to much finer-grained historical data. With the constant influx of new coins and exchanges, TimescaleDB can provide a reliable foundation for time-series data in the cryptocurrency market.

Here’s what you should take away from this post:

  • Several high-level insights into the cryptocurrency market
  • A better understanding of how TimescaleDB + PostgreSQL make time-series data analysis easier
  • Instructions on how to load this dataset yourself and draw your own insights (and perhaps find your own arbitrage opportunities!)

*Please note that nothing in this analysis should be construed as financial advice. If you’d like to draw your own conclusions, here are instructions to install TimescaleDB and download this cryptocurrency dataset.

So if you had invested $100 in Bitcoin 7 years ago, it would be worth…

Let’s start with some good old-fashioned FOMO. If you know anything about cryptocurrencies, you’ve probably heard of Bitcoin, the “granddaddy” of all cryptocurrencies. Turns out that if you had invested $100 in Bitcoin in July 2010, it would be worth over $5,000,000 today.

Bitcoin has had a pretty nice run since then (although taking a small dip recently):

-- BTC USD prices by two week intervals 
SELECT time_bucket('14 days', time) as period,
       last(closing_price, time) AS last_closing_price
FROM btc_prices
WHERE currency_code = 'USD'
GROUP BY period
ORDER BY period;
BTC closing price in USD over the last seven years

Using PostgreSQL, we’ve queried BTC’s prices at 2-week intervals, analyzing the rates for USD exchanges. (Note that “time_bucket” and “last” in this query are special TimescaleDB time-series data analysis functions not in PostgreSQL.)

But hopefully you didn’t buy in February 2014…

It hasn’t exactly been a smooth ride for BTC. Let’s hone in on the day-by-day volatility of BTC. Here we calculate daily returns using the power of PostgreSQL window functions:

-- Daily BTC returns by day
SELECT time,
       closing_price / lead(closing_price) over prices AS daily_factor
FROM (
   SELECT time,
          closing_price
   FROM btc_prices
   WHERE currency_code = 'USD'
   GROUP BY 1,2
) sub window prices AS (ORDER BY time DESC);
BTC daily return (as a factor of the previous day’s rate) over the last seven years

As a relatively new market, bitcoin prices are notably subject to volatile fluctuations. While a steady increase in price marks the success of BTC, the highest spike occurred in early 2014. If we zoom in on 2014, we notice that the jump occurred specifically between February and March of 2014. For those who invested at the peak of this market, the price soon stabilized, forcing investors who bought then to hold for a long time to see returns.

BTC daily return (as a factor of the previous day’s rate) over 2014

Goodbye China, hello Japan

The cryptocurrency market is global. When looking at trade volumes by currency, we noticed something interesting:

-- BTC trading volumes by currency 
SELECT time_bucket('14 days', time) as period,
       currency_code,
       sum(volume_btc)
FROM btc_prices
GROUP BY currency_code, period
ORDER BY period;
Volume of BTC in different fiat currencies over the last seven years (in 14 day intervals, stacked)

The year 2014 saw a minor jump for Bitcoin rates in China, presumably caused by the devaluation of the yuan and weakening domestic stock market. This was followed by a subsequent boom in 2016 and early 2017, as Chinese volume dominated Bitcoin trades.

Volume of BTC in CNY over the last year (in 14 day intervals)

Within a few months, the volumes dropped dramatically.

Why? This is where we had to step out of the numbers and do some old-fashioned research. (And what we found shows how you can’t just rely on quantitative data when trying to understand this market.)

In early 2017 the People’s Bank of China began reinforcing regulations and legal liabilities for risky cryptocurrency exchanges. By February, two of the largest Chinese exchanges (OKCoin and Huobi.com) had suspended withdrawals and by mid-2017, Chinese transactions had dried up. From there, Japan became the leader in bitcoin transactions by volume, even going so far as to recognizing bitcoin as legal currency in April 2017.

Volume of BTC in all currencies after BTC/CNY dropped in 2017. BTC/JPY is now volume leader. (In 14 day intervals, stacked).

Now, if you had invested $100 in ETH in January 2017…

Don’t worry if you didn’t hop onto the Bitcoin train in 2010. As volatile as Bitcoin has been, some would argue that Ethereum has been a crazier ride (and the recent “correction” shows it). Let’s look at the Ethereum price over time in Bitcoin (as it’s normally quoted):

-- ETH prices in BTC by two week intervals
SELECT time_bucket('14 days', c.time) as period,
       last(c.closing_price, c.time) AS last_closing_price_in_btc
FROM crypto_prices c
WHERE c.currency_code = 'ETH'
GROUP BY period
ORDER BY period;
Last closing price of ETH in BTC over the last three years

But as we know, Bitcoin itself has been fairly volatile, which renders the above chart less useful. So let’s look at ETH prices in fiat currencies, using each day’s BTC to fiat exchange rates. (Taking advantage of Postgres JOINs and some fancy filters):

-- ETH prices in BTC, USD, EUR, and CNY by two week intervals
SELECT time_bucket('14 days', c.time) as period,
       last(c.closing_price, c.time) AS last_closing_price_in_btc,
       last(c.closing_price, c.time) * last(b.closing_price, c.time) filter (WHERE b.currency_code = 'USD') AS last_closing_price_in_usd,
       last(c.closing_price, c.time) * last(b.closing_price, c.time) filter (WHERE b.currency_code = 'EUR') AS last_closing_price_in_eur,
       last(c.closing_price, c.time) * last(b.closing_price, c.time) filter (WHERE b.currency_code = 'CNY') AS last_closing_price_in_cny
FROM crypto_prices c JOIN btc_prices b ON time_bucket('1 day', c.time) = time_bucket('1 day', b.time)
WHERE c.currency_code = 'ETH'
GROUP BY period
ORDER BY period;
Last closing price of ETH in three different fiat currencies over the last three years

In its first year, ETH surpassed any yearly BTC growth rate in all of BTC history — a hefty 530% surge in average closing price from the previous year marked a good start. Cumulatively, the growth has since fallen to 200% going from 2016 to 2017, though still an impressively high rate for any other asset. And within the last half year, ETH prices have increased by 3000%. So, if you had invested $100 in ETH in January 2017 (less than 7 months ago), it would be worth over $3,000 today.

Projecting the price of ETH in these stable currencies (USD, EUR, CNY), it appears that the trend lines remain consistent between the three fiat monies. A clear progression is apparent in the steep uprise within the last six months and trends reflect a massive growth for the coin when quoted in all currencies, except BTC. Relative to the fiat charts, the ETH/BTC chart is quite unstable due to the fluctuating price of BTC over the years. As a result, the representation of ETH by BTC price inflates the variability of ETH. Clearly BTC is still too immature to be considered a base currency.

What about the 1200 other cryptocurrencies?

With that brief examination of BTC and ETH trends, hopefully you have more context into the hectic world of cryptocurrencies. So what do we do with the other 1200 cryptocurrencies?

Well first, let’s use our dataset to trace the lineage of these cryptocurrencies:

(Disclaimer: our dataset represents when we first have recorded data, which may not necessarily correspond to the ICO.)

-- Currencies sorted by first time we have data for them
SELECT ci.currency_code, min(c.time)
FROM currency_info ci JOIN crypto_prices c ON ci.currency_code = c.currency_code
AND c.closing_price > 0
GROUP BY ci.currency_code
ORDER BY min(c.time) DESC;
currency_code |          min
---------------+------------------------
 CIR           | 2017-06-26 20:00:00+00
 MDC           | 2017-06-26 20:00:00+00
 WBTC          | 2017-06-26 20:00:00+00
 NET           | 2017-06-26 20:00:00+00
 NAS2          | 2017-06-26 20:00:00+00
 TPAY          | 2017-06-26 20:00:00+00
 MRSA          | 2017-06-26 20:00:00+00
 XCI           | 2017-06-26 20:00:00+00
 PAY           | 2017-06-19 20:00:00+00
 SNM           | 2017-06-18 20:00:00+00
 LGD           | 2017-06-17 20:00:00+00
 SNT           | 2017-06-16 20:00:00+00
 IOT           | 2017-06-12 20:00:00+00
 QRL           | 2017-06-09 20:00:00+00
 MGO           | 2017-06-09 20:00:00+00
 CFI           | 2017-06-08 20:00:00+00
 VERI          | 2017-06-07 20:00:00+00
 EQT           | 2017-06-07 20:00:00+00
 ZEN           | 2017-06-05 20:00:00+00
 BAT           | 2017-05-31 20:00:00+00

It’s an evolving market. And one with no clear ceiling, as we can see when we query the number of new cryptocurrencies by day. Above are just the most recent 20 records, showing how many new currencies amass each week.

Here’s a look at that same day, but counting the number of new currencies with data each day:

-- Number of new currencies with data sorted by day
SELECT day, COUNT(code)
FROM (
   SELECT min(c.time) AS day, ci.currency_code AS code
   FROM currency_info ci JOIN crypto_prices c ON ci.currency_code = c.currency_code
   AND c.closing_price > 0
   GROUP BY ci.currency_code
   ORDER BY min(c.time)
)a
GROUP BY day 
ORDER BY day DESC;
Number of new cryptocurrencies in the dataset, by day, over the last four years
Number of new cryptocurrencies in the dataset, by day, in 2017
     day           | count
------------------------+-------
 2017-06-26 20:00:00+00 |     8
 2017-06-19 20:00:00+00 |     1
 2017-06-18 20:00:00+00 |     1
 2017-06-17 20:00:00+00 |     1
 2017-06-16 20:00:00+00 |     1
 2017-06-12 20:00:00+00 |     1
 2017-06-09 20:00:00+00 |     2
 2017-06-08 20:00:00+00 |     1
 2017-06-07 20:00:00+00 |     2
 2017-06-05 20:00:00+00 |     1
 2017-05-31 20:00:00+00 |     5
 2017-05-28 20:00:00+00 |    29
 2017-05-27 20:00:00+00 |    13
 2017-05-26 20:00:00+00 |    32
 2017-05-25 20:00:00+00 |   303
 2017-05-16 20:00:00+00 |     1
 2017-05-15 20:00:00+00 |     7

When we query each of the currencies by their first set of data (to track its “age”), it’s clear that the market is not simply for investors, but also for creators of these digital assets. Most recently, a flood of new coins entered our dataset during May 25–28, amounting to over 300 new cryptocurrency records in less than a week. (Of course, these dates reflect when our data source first had price data for the currencies, which may not correspond to the ICO.)

Who’s at the head of the cryptocurrency long tail?

There are so many cryptocurrencies that it becomes hard to separate the good ones from the noise. How do we identify which ones worth focusing on? Here’s one metric: total trade volume over the past week.

-- 1200+ crypto currencies by total transaction volume (in btc) over the last month
SELECT 'BTC' as currency_code,
       sum(b.volume_currency) as total_volume_in_usd
FROM btc_prices b
WHERE b.currency_code = 'USD'
AND now() - date(b.time) < INTERVAL '8 day'
GROUP BY b.currency_code
UNION
SELECT c.currency_code as currency_code,
       sum(c.volume_btc) * avg(b.closing_price) as total_volume_in_usd
FROM crypto_prices c JOIN btc_prices b ON date(c.time) = date(b.time)
WHERE c.volume_btc > 0
AND b.currency_code = 'USD'
AND now() - date(b.time) < INTERVAL '8 day'
AND now() - date(c.time) < INTERVAL '8 day'
GROUP BY c.currency_code
ORDER BY total_volume_in_usd DESC;
currency_code | total_volume_in_usd
---------------+---------------------
 BTC           |       2040879023.54
 ETH           |    1617388472.94011
 LTC           |    287613541.293571
 XRP           |    269417667.514443
 ETC           |    165712729.612886
 ANS           |      126377042.5269
 SC            |    111623857.796786
 DASH          |    86875922.3588143
 ZEC           |    78836728.2129428
 BTS           |    69459051.5958428
Total transaction volume in USD for top 10 currencies over the past week (ranked by volume)

Quick note on what this query is doing: The BTC and crypto-currency data lives in separate tables. So we have to UNION the two queries. Also, as we established earlier, we want to quote volumes in a fiat currency (e.g., USD) and not BTC. So the second half of the query joins with the BTC table to convert BTC to USD.

Top cryptocurrencies (measuring by transaction volume) are (not surprisingly) Bitcoin and Ethereum. But after that, seems like Litecoin (LTC), Ripple (XRP), and Ethereum Classic (ETC) are not far off. As a five-year old coin, Litecoin is almost identical to Bitcoin and is often considered a key player in the market. Meanwhile, Ripple targets a more specific audience as a banking coin in the global commerce arena, also showing promise as a progressively dominant coin. Interestingly in the top 5 for our query, both ETH and ETC make appearances, suggesting a major shift towards Ethereum in the market.

What are the most profitable cryptocurrencies?

Another way to sift through the long tail of cryptocurrencies is by profitability (e.g., as measured by total daily return). Our data contains a set of prices for over 1200 cryptocurrencies. If we hone in on the highest increase in rate by day, we can see which cryptocurrencies lead the daily market.

-- Top crypto by daily return, by day
SELECT  time,
        last(currency_code,daily_factor),
        max(daily_factor)
FROM (
    SELECT currency_code,
    time,
    closing_price,
    lead(closing_price) over (partition BY currency_code ORDER BY time DESC) AS prev_day_closing_price,
    closing_price / lead(closing_price) over (partition BY currency_code ORDER BY time DESC) AS daily_factor
FROM crypto_prices) q
GROUP BY time
ORDER BY time DESC;
 time          |  last  |   daily_return
------------------------+--------+------------------
 2017-06-26 20:00:00+00 | CIN    | 40.1428571428571
 2017-06-25 20:00:00+00 | KC     |               38
 2017-06-24 20:00:00+00 | VOYA   | 14.2747252747253
 2017-06-23 20:00:00+00 | PAY    | 3.18506315211422
 2017-06-22 20:00:00+00 | YOVI   | 119.607843137255
 2017-06-21 20:00:00+00 | ION    | 7.97665369649805
 2017-06-20 20:00:00+00 | TES    | 5.25157232704403
 2017-06-19 20:00:00+00 | KNC    |           150000
 2017-06-18 20:00:00+00 | ZNY    | 22.5217391304348
 2017-06-17 20:00:00+00 | YOVI   | 22.0590746115759
 2017-06-16 20:00:00+00 | LTD    | 9.50207468879668
 2017-06-15 20:00:00+00 | AMIS   | 168758.782201405
 2017-06-14 20:00:00+00 | JANE   |                6
 2017-06-13 20:00:00+00 | YOVI   | 690.636254501801
 2017-06-12 20:00:00+00 | U      | 5.21452145214522
 2017-06-11 20:00:00+00 | JANE   |                6
 2017-06-10 20:00:00+00 | WGO    | 3.58744394618834
 2017-06-09 20:00:00+00 | BNT    |             5000
 2017-06-08 20:00:00+00 | XNC    | 52.7704485488127
 2017-06-07 20:00:00+00 | CBD    | 14.3243243243243
 2017-06-06 20:00:00+00 | CC     |               72
 2017-06-05 20:00:00+00 | BLAZR  | 7.38461538461538
 2017-06-04 20:00:00+00 | GREXIT | 13.0833333333333
 2017-06-03 20:00:00+00 | EPY    | 4.29880478087649
 2017-06-02 20:00:00+00 | YOVI   | 1257.67790262172
 2017-06-01 20:00:00+00 | FCN    | 8.57142857142857
 2017-05-31 20:00:00+00 | EPY    | 348.611111111111
 2017-05-30 20:00:00+00 | BST    | 14.1441860465116
 2017-05-29 20:00:00+00 | FCN    | 45.3086419753086
 2017-05-28 20:00:00+00 | NOO    |  56536.231884058

Here we identify the cryptocurrency with the highest total daily return, by day, going backwards in time. (To do that, we again use a window function to calculate daily returns, and then use the TimescaleDB last function to find the cryptocurrency with the highest return for that day.)

Our output for the last three months shows a numeric lead by AMIS (168x on 6/15), which appears as the cryptocurrency with the top increase for 15 distinct days. However, if we look more closely at AMIS’ rates, we realize that this high increase is also due to high fluctuation rates: AMIS tends to drop back to a closing price of zero after each increase.

Closing price of AMIS over the last year

Likewise, the cryptocurrency YOVI appears 3 times in our list of daily leads but has a similarly unreliable trend like AMIS:

Closing price of YOVI over the last year

While both trends are rather unstable, they show more promise than ETH’s down-sloping first year (2015):

Closing price of ETH in 2015

(Repeat Disclaimer: TimescaleDB does not endorse any of these cryptocurrencies and is not liable for investments that you make / losses you may incur.)

So money grows on… Merkle Trees?

Here we drew some conclusions from public cryptocurrency datasets, highlighting the power of both PostgreSQL and TimescaleDB. Yet we should remember that the cryptocurrency market will inevitably be different next month, week, even day.

But if you’d like to explore this dataset and draw your own conclusions, here are the instructions for downloading it (and installing TimescaleDB).

If you’d like to learn more about TimescaleDB, and how it makes PostgreSQL scalable for time-series data, we’d recommend this technical post.


If you have any follow up questions or comments, we welcome them via email or Twitter.

And if you’d like to learn more about TimescaleDB, please check out our GitHub (stars appreciated), and let us know how we can help.