Pentaho are about to launch version 6 of their Data Integration and Business Analytics Platform. This is the culmination of a journey that has embraced all of the major trends in business intelligence and predictive analytics to have featured over the last few years. Along the way from Pentaho 5 to 6 we have seen the integration of the worlds of big data and data warehousing, and the transformation of what at first looked like a techie amalgam of clever features into a manageable, secure, productive, polished and enterprise-ready product.
My initial interest, which dates back several years now, was in finding an alternative to the established players in the BI market, where you paid a high price to obtain the basic functionality required of an enterprise BI solution, and in seeing what a capable but somewhat unpolished open source product could offer. What I saw was a solid offering with, thanks to the open source model, a fantastic array of adapters for connecting to data throughout the enterprise; and because Pentaho bundled ETL into the product, here was a real end-to-end capability with great potential. That concept of a full analytics pipeline, capturing, blending and presenting data from across the enterprise, whether structured or unstructured, and making it available to the full range of BI capabilities, from reporting to dashboards and from OLAP to data mining and everything in between, has been one of the keys to the evolution of Pentaho. The open source model, whereby Pentaho develop the core framework and the community builds and shares the add-ons that cost a proprietary vendor so much to develop and maintain, has enabled Pentaho to outstrip the competition in innovation and breadth of coverage, whilst at the same time offering robustness and solidity. As Pentaho have developed, they have improved not only the functionality on offer but also the ease of operation, the security and the manageability: the things that distinguish a good-value, competitive niche player from a serious alternative for the enterprise-level user.
One of the areas in which Pentaho have been leading is the integration of Big Data. Big Data is about more than scale alone; it is about the velocity and variety, as well as the volume, of alternative data sources and types when compared to the stable, structured nature of traditional BI data sources. Pentaho have been among the leaders in offering the capability to tame those disparate data sources and bring them into play in the same managed environment. The Pentaho vision of governed data delivery lies at the very heart of what distinguishes an enterprise BI platform from an assembly of databases. Only when data is governed can the results of its analysis be trusted: you must be able to trust the lineage and timeliness of the data, and that it conforms to the master data, otherwise decisions may be compromised by poor data quality.
As I said, in the early days I was interested in Pentaho as an alternative to the established vendors, where you felt that the majority of the costs went towards supporting not the technology but the splendour of their head offices and the latest BMWs driven by their sales force. As the product has matured, the rough edges have been smoothed away, and the current offering blends innovation and leading-edge technology with the maturity and ease of use required to support large-scale enterprise deployment.
The rate of technical innovation continues, with Pentaho integrating with the Big Data ecosystem and keeping in lockstep with new releases of the major distributions. There may no longer be such headline-grabbing new features, but there are still new features of real value: for instance, the ability to blend and virtualise data sets on the fly, greater support for data blending at massive scale, and collaborative data discovery. The greatest improvements, however, are in the features required to break into the big corporate market, such as data lineage capabilities, enhanced security, enhanced systems monitoring and improved support for virtualised data sets.
To illustrate the extent to which Pentaho is now ready for use in the most demanding of enterprises, I was interested to read that they have recently won the contract to supply CERN, famous for its work on particle physics, with an environment to support its complex mission. I have previously worked at CERN and know that they are always looking for tools that offer capability but also value for money, so I was really impressed to learn that Pentaho had been selected by their Advanced Information Services group. To give an idea of the scale involved, this group provides the management information that underpins the activity of 15,000 demanding users, supporting over 650,000 report executions per year. The system being replaced is an amalgam of commercial and home-grown solutions, and I am sure that in selecting Pentaho they subjected it to a very rigorous process.
So, having followed Pentaho for a number of years, I feel that they have managed to stay ahead in terms of innovation while now also offering the polish expected of an enterprise-ready solution. They really do demand to be taken seriously, and should be evaluated by any enterprise, whatever its scale, that is looking at a leading BI platform.