With all the focus on AWS re:Invent, I missed an announcement back in October about the proposed merger of Cloudera and Hortonworks. With the two big beasts of the Hadoop jungle set to merge, for me this is a sure sign of the cooling and commoditisation of the “big data” platform market.
The financial data for these companies makes interesting reading:
- Cloudera – Q4 2018 revenues were $103.5 million, with a loss of $45.7 million (source)
- Hortonworks – Q2 2018 revenues were $86.3 million, with a loss of $42.0 million (source)
Both companies are growing revenues at circa 40-50% per annum, and slowly reducing their losses.
There was a Hadoop ecosystem frenzy a few years ago, with a lot of money spent by big players such as banks. Interestingly, Cloudera and Hortonworks together have more than 120 customers spending over $1 million per annum.
Today’s market feels quite different. It has moved away from MapReduce and HBase towards Spark, Presto and the like on top of the Apache Hadoop project, and these vendors have supported and driven that trend as they move to become “data platforms” rather than just Hadoop distributions. But this trend is now itself being partially overtaken by much more easily and cheaply scaled cloud-based PaaS services, such as Spark-as-a-service offerings like AWS Glue. Cloud instances and serverless functions also continue to become more and more powerful. These options nibble away at the lower end of the “big data” use cases. Why have all the hassles of a cluster when a set of parallelised AWS Lambda functions (managed by AWS Step Functions) will do the job?
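To make the Lambda-plus-Step-Functions idea a little more concrete, here is a minimal sketch of what such a fan-out might look like, written as a Python script that builds an Amazon States Language definition with a Map state. The function ARN, the state names and the `$.chunks` input path are all placeholders of my own, not something from a real project, and the concurrency limit is an arbitrary choice.

```python
import json

# Hypothetical placeholder ARN; substitute the ARN of a real Lambda function.
WORKER_LAMBDA_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:process-chunk"


def build_fan_out_definition(worker_arn, max_concurrency=10):
    """Build a state machine definition that invokes one Lambda per input item.

    The input document is assumed to look like {"chunks": [...]}; each element
    of the list is passed to its own Lambda invocation.
    """
    return {
        "Comment": "Fan out over a list of chunks, one Lambda invocation per chunk",
        "StartAt": "ProcessChunks",
        "States": {
            "ProcessChunks": {
                "Type": "Map",
                "ItemsPath": "$.chunks",            # assumed input shape
                "MaxConcurrency": max_concurrency,  # cap on parallel invocations
                "ItemProcessor": {
                    "StartAt": "ProcessOneChunk",
                    "States": {
                        "ProcessOneChunk": {
                            "Type": "Task",
                            "Resource": worker_arn,
                            "End": True,
                        }
                    },
                },
                "End": True,
            }
        },
    }


definition = build_fan_out_definition(WORKER_LAMBDA_ARN)
definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

The resulting JSON is what you would hand to Step Functions when creating the state machine. The point is simply that there is no cluster to size, patch or keep warm: the parallelism is one number in a document.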
Machine learning workloads obviously benefit from, and often need, distributed processing models. However, when I can use Amazon SageMaker to train models and scale the training across multiple nodes on demand, I don’t really want the undifferentiated heavy lifting or costs of a self-managed cluster. I want Amazon to take care of that for me behind the scenes.
It’ll be interesting to see how the merger plays out, including how it affects MapR’s market strategy…