Hadoop: here today, gone tomorrow?

Content Copyright © 2019 Bloor. All Rights Reserved.
Also posted on: Bloor blogs

By the time you read this, MapR may be out of business, while Cloudera is in trouble. Since these two are the leading commercial supporters of Hadoop, this raises questions about its future. Apologists for the technology have suggested that MapR and Cloudera are losing out to open source Hadoop clusters deployed in the cloud. There is some truth in this, but I don’t think it is the whole story.

I think there are two problems. Firstly, the interesting thing about Hadoop – why it became popular in the first place – was that it enabled distributed parallelism through the use of MapReduce. But MapReduce has now been supplanted by Spark, which means that all that is left of the original Hadoop is the file system (HDFS: Hadoop Distributed File System), and there’s nothing special about HDFS. Secondly, implementing Hadoop is complex. For example, Cloudera’s new combined (with Hortonworks) platform ships with more than 30 different open source technologies. Now, you probably won’t use all of these, but compare implementing Hadoop along with, say, ten other products with deploying a single, complete database environment from another vendor. Who needs that sort of complexity?

Well: nerds and people who want their CVs to look good.

So, what’s going to happen? To begin with, there are different classes of Hadoop providers. There are pure plays such as MapR and Cloudera, but there are also a bunch of other vendors that have built solutions on top of Hadoop/HDFS. Some of these are limited to Hadoop, but many can run on a variety of platforms, of which Hadoop is just one. My guess is that suppliers in the former category, those currently limited to Hadoop, will move to support other underlying databases. HDFS will become just one choice amongst many, if it isn’t already.

So what about DIY Hadoop? The problems are a) the complexity and b) Spark. Yes, you can spin up a Hadoop cluster in the cloud, but why would you choose HDFS over, say, Cassandra, MongoDB, Redis or Aerospike? The truth is that there is no good reason.

The other thing that is going to happen is that a slew of other companies – and not just NoSQL vendors – will come out with migration offerings to encourage users to move away from Hadoop and onto their platforms. I know specifically of two companies that are planning just such campaigns, but I would be surprised if the total number of suppliers targeting this market isn’t closer to double figures. Not everybody will be convinced, but certainly some users will move off Hadoop because of these efforts. This will reduce the pool of Hadoop users still further and will, in turn, hit MapR and Cloudera revenues.

All I can see is a downward spiral for Hadoop. Fewer users and more constrained commercial support mean less, and slower, ongoing development, which means the technical gap between Hadoop and its competitors will only grow. If Hadoop were a patient, I would say that it is unwell, deteriorating and in danger of going into intensive care. Cloudera is going to have to do something special to keep it alive in anything more than the short term.

If you’re a Hadoop user and want to know more about your options, either contact us here at Bloor Research or post a question as a comment.