RedHat, Elastic, MongoDB, GitHub, Pivotal, Greenplum, and more: Open-source software has finally come of age and is reshaping the entire software industry as we know it. Here’s why.
Like a relentless Pac-Man, software continues to gobble up everything in its path. Yet today the proprietary software industry is both predator and prey, as another software development, distribution, and business model has finally come of age. Long relegated to the hacker and academic communities, it is reshaping the software industry as we know it.
That model is open source. And while the model itself is nothing new, something very new is happening today. So far this year has seen several multi-billion dollar exits, ranging from $4 billion to $7.5 billion (edit: to $34 billion, including the October 28 announcement of IBM acquiring RedHat), with two in the last week alone. Considering we had only seen one multi-billion dollar exit (the 1999 RedHat IPO) prior to 2017, this trend is significant.
Given how open source is eating into the proprietary software market, this is something everyone in the industry, from developers to operators to investors, should be closely watching.
In this post we describe why open source is flourishing today. We start by reviewing the history of open source, how it has changed over the past several decades, and why it has finally matured. We then describe the strengths and challenges with this model, where it may and may not fit, and conclude with what this means for the future of the software industry.
2018: The biggest year for open source (so far)
Let’s review several facts from this year, which some have already pointed out is the biggest year for open source (so far).
In April, Pivotal Software had the largest IPO ever of an open-source company (at that time), resulting in a $3.9 billion market cap at the end of its first day (and a $4.7 billion market cap as of this article). In May, Salesforce acquired MuleSoft for $6.5 billion, the largest acquisition ever for an open-source company (and one that had IPO-ed just a year earlier).
Less than a month later, Microsoft announced its $7.5 billion acquisition of GitHub, a central repository for open-source projects, although not an open-source company itself. (This is especially remarkable considering that in 2002, a top Microsoft executive’s public stance held that “open source is an intellectual property destroyer. I can’t imagine something that could be worse than this for the software business and the intellectual-property business.” Yes, times have changed.)
The stock price of MongoDB, an open-source database company that IPO’ed in October 2017, has grown over 2x in a year, with a market cap over $3.5 billion. RedHat, the first company to build a successful business on top of open-source software, continues to thrive, with a market cap north of $20 billion and nearly $3 billion in revenue today.
Just last week, open-source public companies Cloudera and Hortonworks announced a blockbuster merger that will result in a combined company with a market cap over $5 billion and with over $700 million in annual revenue. Two days later, Elastic IPO’ed and almost doubled its market cap to nearly $5 billion on its first day (more on the Elastic story here). That was last Friday, October 5, 2018.
And there is still a deep bench of private open-source companies aiming to go public in the next several years that have already surpassed or are nearing $100 million in annual revenue, including Hashicorp, Confluent, and Databricks. And of course there are even more open-source startups at an earlier stage (including your humble authors).
(Disclaimer: Our company TimescaleDB is an open-source time-series database startup and shares investors with several of these companies, including Elastic, MongoDB, Hortonworks, Confluent, and Databricks.)
The success of these companies often comes at the expense of existing proprietary software vendors: e.g., Cloudera and Hortonworks replacing Teradata and other legacy data warehousing systems; Confluent replacing traditional messaging middleware from companies like TIBCO; Elastic replacing Splunk and other log analysis tools.
There are even more examples if we go further back in history: MySQL and PostgreSQL replacing Oracle and Microsoft SQL Server; RedHat/Linux replacing Microsoft Windows; Apache Webserver beating commercial web servers.
However they are succeeding, one thing is for certain: open-source companies today are thriving.
To understand why the open-source model is flourishing today, and how it is shaping the future of the software industry, let’s start by understanding where this whole movement came from.
Part 1: The early days of computing (1950s-1970s)
It turns out that open-source is not something new, but an idea as old as the computing industry.
In the early days (1950s-1960s), software and hardware were bundled together. Because most of the revenue was generated by hardware, source code for any software was made freely available. And because universities were often early adopters of this new technology, the academic culture of sharing knowledge extended to these early software programs.
This changed after two milestone events. In 1969, partly in response to an antitrust lawsuit by the US Department of Justice, IBM unbundled its software from hardware, and started to charge for software. And then, in the 1970s, a commission by the United States Congress determined that “computer programs…are proper subject matter of copyright” (source, chapter 1).
With these two events, we had the birth of the proprietary software industry, software licenses, and the now ever-present EULA. And for the hacker community that relished sharing non-proprietary software, it seemed that their era was ending.
That all started to change in the 1980s.
Part 2: The first open-source licenses (1980s-1990s)
Richard Stallman believed that software deserved to be free (free as in “free speech”, not “free beer”, as the old saying goes). To preserve that freedom, he first launched the recursively-acronymed GNU Project in 1983 (which stands for “GNU’s Not Unix”), the Free Software Foundation in 1985, and one of the first open-source licenses of our modern era, the GNU General Public License in 1989 (now commonly referred to as the “GPL”).
This was the world in which a twenty-one-year-old Finnish computer science student named Linus Torvalds found himself in 1991.
Linus Torvalds wanted to develop his own operating system and make it free. And he thought it would just be a hobby: “I’m doing a (free) operating system (just a hobby, won’t be big and professional like gnu)”. Because his new kernel heavily relied on GNU tools, he decided in 1992 to release it under the GPL.
(Historical side note: Linus often debated with Andy Tanenbaum, who wrote Minix and was the academic great-grandfather of my co-founder Prof. Mike Freedman. [Andy Tanenbaum -> Frans Kaashoek -> David Mazières -> Mike Freedman.] It’s a small world.)
Of course, this operating system was Linux, which over the following decades has blossomed into one of the most widely used pieces of software in the world.
The popularity of Linux also gave rise to “Linus’s Law”, which became a mantra for describing the value of open-source software for creating better code: “Given enough eyeballs, all bugs are shallow.”
Yet for some, GPL was still too restrictive thanks to a new idea it introduced called “Copyleft” (as opposed to standard copyright): a type of licensing that enabled free usage of software, but with restrictions on how it could be distributed to ensure that it would always be “free”.
For these software engineers, a new kind of open-source license was needed. So in 1998, the Open Source Initiative (OSI) was founded. Soon after, in 1999, the Apache Software Foundation was created (around the nascent Apache Web Server project), which developed another major open-source license still popular today: the Apache License.
From the beginning, the goal of the OSI was to promote non-proprietary software using a pragmatic approach that “distinguished it from the philosophically- and politically-focused label ‘free software.’” The new label, which they then coined, was “open source.”
And then, on August 11, 1999, a company that had built a business on top of Linux called RedHat went public, and earned the eighth-largest first-day gain in the history of Wall Street.
For many, RedHat was a revelation. The popularity of Linux, which largely took market share away from Microsoft Windows, showed how open source could eat into the proprietary software market; RedHat showed how one could sustain a business on top of that popularity. RedHat pioneered the original open source business model using a combination of subscription-based support, training, and integration services. It achieved a multi-billion dollar valuation on its first day of trading, and has grown massively since then: today worth over $20 billion with nearly $3 billion in revenue.
That got people’s attention. There might be a business in open source.
Part 3: The Modern Open-Source Era (2000s-today)
In the 2000s, the licenses improved, the open source projects proliferated, and the businesses became legitimate.
First the licenses got better: Apache License 2.0 (2004) (which provided better protection against patent infringement); GPLv3 (2007) (which closed loopholes around “tivoization” and patent-related agreements); AGPL (2007) (which closed the “application service provider”, or “cloud provider”, loophole).
In parallel came more exits: JBoss acquired by RedHat (over $350 million, 2006); XenSource acquired by Citrix ($500 million, 2007); Zimbra acquired by Yahoo! ($350 million, 2007); MySQL acquired by Sun ($1 billion, 2008), which led TechCrunch to exclaim, right in the article title, “Open Source Is A Legitimate Business Model”; SpringSource acquired by VMware ($420 million, 2009).
This era also saw the launch of many new open-source technologies: WordPress (2003), Firefox (2003), Nginx (2004), Hadoop (2006), Cassandra (2008), Redis (2009), MongoDB (2009), Elasticsearch (2010), Kafka (2011), Prometheus (2012), Docker (2013), CoreOS (2013, later acquired by RedHat for $250M), Spark (2014), Kubernetes (2014), and many others.
Those years also saw the founding of a new generation of open-source companies, including MongoDB (2007), Cloudera (2008), Hortonworks (2011), and Elastic (2012). These are also the first wave of open-source companies to go public post-RedHat. Elastic, as pointed out earlier, IPO’ed just last week (at time of publishing).
Which brings us to today.
Why open source is flourishing today
We still have one looming, unanswered question: why is open-source thriving today, especially when compared to proprietary software?
At first, the primary reason for adopting open-source was cost (it was free) and the potential for customizability (one could, at least in theory, change the source code as necessary). Then came Linus’s law stating that the open-source model led to higher quality code (“all bugs are shallow”).
Today there are many reasons to adopt open-source technology. It’s still free, which means you can test and deploy without having to get budgetary sign-off. But it’s more than freemium: open source means eliminating vendor lock-in and instead being able to rely on a community for support. If there is a company sponsoring the project, even if that company goes under, you can continue to use the technology. (As one example, once RethinkDB the company shut down, the open-source project joined the Linux Foundation, and is enjoying a second life.)
Open-source projects are cutting edge, and today many are also enterprise-ready, thanks in large part to their collaborative model. You can fix any bugs yourself. By contributing those fixes back to the main project, you can give back to and feel part of a community. Adopting open-source technology also helps with both hiring and personal development: engineers would rather learn a foundational technology than one vendor’s proprietary system.
There are also benefits to making your own software open-source. You get to provide a new technology and drive the state-of-the-art in a manner that encourages dialogue, collaboration, and feedback. You get to grow a community.
If you are an individual, your project (or your contributions to someone else’s open-source project) will get you visibility and help your long-term career. If you want, you’ll travel the world, deliver talks on your work, and meet like-minded people along the way.
If you are a company, open source means your outreach efforts can focus on educating developers rather than on selling to them, which will enable you to reach the end user directly, without needing to navigate complex org charts. Your sales motion can then focus on businesses that already know your technology and are probably even using it, i.e., an upsell instead of a cold sell. As you grow, your community grows, and you can benefit from the open-source economies of scale (e.g., tutorials, blog posts, connectors, etc.). In general, open source is a more fun and particularly more efficient go-to-market model than traditional proprietary software.
But developing open-source software is not without its challenges.
The challenges with the open-source model
As we can see from its history, there has always been a tension in open source: e.g., philosophical licenses focused on preserving “freedom” vs. permissive licenses taking a pragmatic approach.
But there are two larger tensions today: (1) how to balance openness with sustainability and (2) how to manage the transition to the cloud.
Challenge #1: Balancing openness with sustainability
In the world of proprietary software, the biggest challenge is often market apathy: not enough people finding your software useful.
Yet in the world of open source, the opposite is also a threat. You can become the victim of your own success.
As more developers use your technology, you may find yourself having to dedicate more time to maintenance and development, with very little pay. This leads some, like the developer of BoltDB (a project with over 9,000 GitHub stars), to throw in the towel and archive their project. Open-source developer burnout is real.
This is because all the “free labor” adds up. A 2001 study found that it would have cost $1 billion (in year 2000 dollars) just to develop the GNU/Linux code base alone. And that was a study from almost two decades ago. That number has only increased.
Some think that open-source foundations should fund open-source projects directly through stipends or fellowships. But these foundations are not money-making machines: e.g., the Apache Foundation barely made $500,000 in 2011. Perhaps these foundations can fund individual developers, but they are extremely unlikely to be able to support open-source projects at scale.
Another option is relying on large tech companies to support open-source projects directly. One successful example of this is how Google has supported the development of Kubernetes and sponsored the Cloud Native Computing Foundation (CNCF). Yet then the viability of the open-source project becomes subject to the whims (and the viability) of the larger company, and not the users. For example: What if Google corporate decides it no longer wants to contribute financially to the CNCF for strategic reasons? Will projects like Kubernetes and Prometheus (itself a project that had to be offloaded to the CNCF when its initial host company, SoundCloud, faced business problems) be able to thrive without large corporate sponsors to support them?
We need another option, something that allows open-source projects to become self-sustaining.
How does an open-source project become a self-sustaining independent business? At a high level, it typically involves some combination of selling support, offering some proprietary software on top of the open source (known as the “open-core” approach), and providing a managed cloud service. However, doing justice to this question requires its own blog post (which will be coming soon).
But given the success of RedHat, Cloudera, Hortonworks, MongoDB, Elastic, Hashicorp, Confluent, Databricks, etc. (again, all companies, some of which are public, that have far surpassed or are nearing $100M in revenue), it’s clear that making money in open source is very possible.
Challenge #2: Managing the transition to the cloud
It is no secret that more and more computing workloads are moving from on-premise to the fully-managed offerings in the cloud. The cloud is enticing for several reasons: it offloads operational responsibilities to someone else, it converts a large up-front CAPEX into smaller amounts of recurring OPEX, it allows for infrastructure elasticity that can mirror business needs, and it allows anyone to get started in minutes using just a credit card.
Yet while the cloud is a trend relevant to every software vendor, it is a particularly existential threat to the open-source model. Because open source is by its nature free and source-available, the public clouds (e.g., AWS, Azure, GCP) have been quite effective at distributing and monetizing open-source software without meaningfully contributing back to open-source projects.
So this is still unclear: as the public clouds become more powerful, how will their behavior affect the open-source model?
That said, despite the best efforts of the public cloud vendors to lock you into proprietary services, the desire to avoid that cloud lock-in will likely continue to drive users to open-source options. But time will prove that hypothesis either right or wrong.
As one example, the success of Elastic, in spite of AWS offering its own competing Elasticsearch service, is promising and shows signs that managing the transition to the cloud is feasible for an open-source business.
Not for everyone
But is open source the right model for all software? The answer, it turns out, is not that simple.
It’s no accident that most open-source applications so far have been software infrastructure, like operating systems and databases. Software infrastructure touches a wide audience, caters to developers and operations, and largely works behind the scenes, making it a strong contender for this model.
(So if you are working on a proprietary piece of software infrastructure, you should seriously consider making it open source.)
But there are also quite a few successful end-user applications that are open-source: e.g., Firefox for web browsing, WordPress for content management and blogging, GIMP (launched in 1996 by the same team that is now behind another open-source startup, CockroachDB) for image manipulation, Grafana and Superset for dashboarding.
We also have Android, an open-source project but with many proprietary components, which has become the most popular mobile platform (surpassing not just Apple iOS, but also beating out the various mobile efforts of Microsoft and Nokia). While the extent to which Android is open source today can be debated (e.g., it lacks the vibrant community often seen in other projects, and is still very much controlled by Google), much of its success can be traced to the fact that it was open-sourced to start. For example, being open source allowed Android to garner support from various companies in the mobile ecosystem, from OEMs to carriers, which quickly drove a large business community invested in its success.
But should all applications be open source (e.g., even mobile apps)? One could argue that, in general, open source is a great candidate for software designed for developers and operations, but not for software for consumers. (Software for businesses is likely somewhere in the middle.) But still, only time will tell.
In addition, given the success of some open-source business models (like open-core, mentioned above) that offer a combination of open source and proprietary software, it is clear that proprietary software will never completely go away.
But soon, proprietary software will find itself in the back seat.
The future of software
Today, open-source software is thriving for many reasons: it is free, cutting edge, often enterprise-ready, and customizable; it produces higher-quality code; it eliminates vendor lock-in; and it helps with hiring.
Open source is also flourishing because it has been simmering under the surface for years, slowly maturing and becoming more accessible.
But the heart of its success is that open source builds communities. And collaborative communities don’t just produce fewer bugs, but also better things.
Collaboration is how we got here in the first place: so many of the building blocks of computing, from programming languages to the Internet, developed because of knowledge sharing across the industry.
Open source is not without its challenges: for one, striking a balance between free software and a sustainable business is difficult. Managing the transition to the cloud, especially in light of the behavior of the public cloud vendors, is also a problem.
But one thing is clear: unless proprietary software businesses adapt, they will lose. Some, like Microsoft, have already recognized the shift. But there will be many for whom it will soon be game over.
For the rest of us, the future is open.
This post was built on the shoulders of giants. A big thank you to all of the following people who have shared their open-source wisdom and time with me and my co-founder Mike over the past few years: Harry Weller (RIP), Forest Baskett, Greg Papadopoulos, and the rest of the team at NEA; Peter Fenton, Chetan Puttagunta, Eric Vishria, and the rest of the team at Benchmark; Rob Bearden, Shaun Connolly, Herb Cunitz, Mitch Ferguson, Jeff Miller, and the rest of the Hortonworks diaspora; Gaurav Gupta from Elastic; Ion Stoica and Patrick Wendell from Databricks; Jay Kreps from Confluent; Spencer Kimball from CockroachDB; and so many, many more. We are honored to have such great peers in our industry.
About the authors: the team at Timescale are developers of TimescaleDB, the first open-source time-series database to scale for fast ingest and complex queries while natively supporting full SQL. Because time is a critical dimension along which data is measured, TimescaleDB enables developers and organizations to harness more of its power to analyze the past, understand the present, and predict the future. TimescaleDB is deployed in production all around the world in a variety of industries including Telecom, Oil & Gas, Utilities, Manufacturing, Logistics, Retail, Media & Entertainment, and more. Based in New York and Stockholm, TimescaleDB is backed by Benchmark Capital, New Enterprise Associates, Two Sigma Ventures, and other leading investors.