Pentaho, Big Data and more – the real Pentaho capability revealed

Content Copyright © 2012 Bloor. All Rights Reserved.

When Pentaho is mentioned, the conversation is nearly always dominated by the fact that it is an open source offering, and by statements about how cheap it is compared with much of the marketplace. Indeed, many people I have spoken to think it is so affordable that they have convinced themselves, and have tried to convince me, that it cannot be that capable. However, whenever I see other analysts write about Pentaho, they place it not in the cheap and cheerful group but in the enterprise solutions category, where it does well. So what is the truth about Pentaho?

Firstly, it is not just a report writer. It has reports, it has dashboards, it does OLAP and it does data mining, and what it offers in each of those spaces is not just a token offering: it does the things that the core of the market, the 80%, will need, and it offers them at a price based on its open source heritage. However, that is not all that is on offer. It also has an ETL capability, which can connect to a vast array of sources and feed them to an equally impressive array of destinations. It has to be remembered that Pentaho comes from the open source world, so the community has developed what is needed and has also made sure that what is on offer works, and works well.

So already you have to start to think about Pentaho quite differently from the common perception, and when you hear about its market strategy you have to take another step back and reconsider. The Pentaho strategy is based on three strands. Firstly, there is the enterprise-level BI capability. Secondly, there are the OEM offerings, where Pentaho is embedded in a wide range of applications that require robust, performant BI capability. This is a considerable market presence, and it again emphasises that Pentaho is robust, functional and performant. Thirdly, and this is the biggest surprise for those who underestimate Pentaho, there are its Big Data credentials.

What I (and I suspect the vast majority of us outside the Pentaho community) had not realised is that Pentaho was an early adopter in the Big Data trend, and that it has used its expertise to tackle the issues that stand in the way of more widespread adoption of Big Data. For those of us who are technical, but not uber geeks, so much about Big Data and its key components, like Hadoop, has a quaint Heath Robinson air about it. There are few tools that enable those who are not technically gifted and fluent in Java to set up, run and manage a Big Data environment. As a consequence, the skills that are required are hard to obtain and in short supply, which means they are expensive. Pentaho has tackled those technical barriers. It has taken its core technology, Pentaho Kettle, its ETL environment, and exposed it via a GUI. That gives a far wider audience the ability to take structured and unstructured data and load it into Hadoop (of whatever flavour), NoSQL environments (the likes of MongoDB, Cassandra and HBase) and analytics databases (such as Netezza, Teradata or Greenplum), and to schedule and run jobs, with the results fed to Pentaho or other BI front-end tools. All of which is pretty impressive, but it does not end there. The connectivity is native, work sent to a Hadoop cluster is distributed across the full cluster, and the GUI tools include a Visual MapReduce tool that eliminates the need to write MapReduce functions in Java. So Kettle can act as the data bus at the heart of your Big Data implementation, and will offer a real productivity enhancement at an affordable price.
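
To give a sense of the kind of logic a hand-written MapReduce job involves, here is a minimal sketch of the map, shuffle and reduce pattern as a word count. It is written in plain Python rather than Java, it uses neither the Hadoop API nor anything from Pentaho, and it runs in memory on a single machine. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are my own illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as the framework would between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases together (illustrative only, single process)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "Big Data tools"]))
```

In a real Hadoop job each of these phases is spread across the cluster and has to be packaged, deployed and tuned, which is where the scarce skills come in. A visual tool of the kind described above is aimed at letting you assemble the equivalent steps without writing this code by hand.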

So that, in a quick overview, is the truth behind Pentaho. It is a very capable enterprise BI environment that is robust and functional; it can tackle the vast majority of the things that most of us need a traditional BI tool to do and, just as importantly, it is also a great way into the world of Big Data. I have been very surprised by what I have learnt about Pentaho. I recognise that it is designed to focus on the key things that are required to provide all the elements of a BI suite, and I know there will be some who find that Pentaho does not offer what they need. For the vast majority of us, though, it does just what is needed and, as a consequence, you are not paying for things needed only by a small minority. I feel that it does more than enough to justify being included in any tool selection by the vast majority of the market, and it deserves the chance to be evaluated as a valid and valued enterprise solution.