Managing the data asset with Pentaho

Content Copyright © 2014 Bloor. All Rights Reserved.

We live in an age where data is one of our most valuable assets. With data we can achieve an understanding of the past and the present, and a well-informed prediction of the future. In a world that is getting ever more complex and in which the rate of change is accelerating, that window of enlightenment is a business essential. It really can be the difference between success and failure and, in some extreme circumstances, can decide between life and death. And yet our management of that asset is all too often haphazard.

To maximize our return from the data that we have spent a lot of money capturing and storing, we need to treat it with far more care than is often the case. If we leave data as we find it, its value will quickly degenerate. It needs to be nurtured, maintained, and wherever possible enhanced. However, once you start to talk about managed, controlled, and all the other words beloved within IT circles, the business sees cost and impediments to progress, and is turned off.

What is required is an environment where IT can provide the infrastructural things that it knows are required, but do so quickly and behind the scenes with minimal impact on the business, and where the results are then made available to the business to use at the point of business impact in a straightforward, non-technical environment. The business wants to exploit the data, to integrate and blend all of the data it can find, and to do all of those things without needing to worry about compatibility, data lineage, and things that, to them, should be taken care of and just happen.

This is the journey that Pentaho have embarked upon, turning that dream into a reality. As someone who works in this environment on a day-to-day basis, what is so refreshing is that although they know that this is the end game, they are not promising that they can get there in a single miraculous bound. This is not marketing vaporware. What Pentaho are starting on is a journey to put in place all of the elements required to achieve this. They have a roadmap that maps out the steps.

So what people are buying into is the capability to deliver managed datasets to the business. By that I mean that, using a parameter-driven front end tool, the business selects the data that it wants. That request then goes into a managed environment previously set up by the IT professionals, where they have sorted out the technical aspects of the data: they have profiled it, tagged it with all of the metadata required to associate it, and sorted out the keys that will allow data from disparate parts of the business to be brought together. So control is where it should be, freedom and responsiveness are where they should be, and the two domains are linked via automated processes that require no arcane technical understanding to initiate.
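To make the division of labour concrete, here is a minimal sketch of the idea in Python. This is not Pentaho's actual API: the catalogue, dataset names, tags, and join key below are all hypothetical, invented purely to illustrate how a parameter-driven request can be answered from an environment that IT has profiled, tagged, and keyed in advance.

```python
from dataclasses import dataclass, field

# Hypothetical catalogue entry: IT pre-registers each dataset with the
# metadata tags and join keys it has worked out in advance.
@dataclass
class ManagedDataset:
    name: str
    tags: set = field(default_factory=set)
    join_key: str = "customer_id"  # illustrative key for cross-business joins

class ManagedCatalogue:
    """IT-curated environment: profiled, tagged, keyed datasets."""
    def __init__(self):
        self._datasets = {}

    def register(self, ds: ManagedDataset):
        # IT side: done behind the scenes, invisible to the business.
        self._datasets[ds.name] = ds

    def request(self, tags):
        """Business side: select by tag alone, no technical detail needed."""
        wanted = set(tags)
        return [ds for ds in self._datasets.values() if wanted <= ds.tags]

catalogue = ManagedCatalogue()
catalogue.register(ManagedDataset("sales_2014", tags={"sales", "emea"}))
catalogue.register(ManagedDataset("churn_scores", tags={"sales", "predictive"}))

# The business states what it is after; the managed environment does the rest.
picked = catalogue.request(["sales"])
print([ds.name for ds in picked])
```

The point of the sketch is the interface: the business-facing call takes only business-level parameters, while all of the profiling, tagging, and keying lives on the IT side of the line.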

The data refinery at the back end allows Pentaho to bring to bear all of the expertise gained, and lessons learnt, through the many real-world implementations of data marts, data warehouses, and big data instances that their customers have built with Pentaho's assistance. Within that environment are the technical tools required to undertake the integrating, blending, and modelling that allow repositories to be built showing what data is where, where it comes from, where it goes to, and what rules are applied. Armed with that data, the IT professionals can translate the mass of available data into coherent data sets ready for analysis and present them in an orderly fashion. All of the elements can be tested by IT to ensure that they flow together smoothly; then the business points and clicks, states the ranges it is after, and the tool brings the elements together, using the managed data to ease the transitions into a coherent solution set. Much of the behind-the-scenes work can be scheduled to run unseen by the business, but will ensure that during the office day they have refreshed, governed and timely data sets available when they want them.
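The refinery pipeline described above can be sketched in a few lines of Python. Again, this is an illustration, not Pentaho's implementation: the step names (`profile`, `blend`, `nightly_refresh`), the field names, and the sample rows are all assumptions made for the example, showing how a scheduled run can clean data, blend it across sources on a shared key, and stamp the governed result.

```python
import datetime

# Hypothetical refinery steps: each is a transformation that IT has
# tested end-to-end; the business only ever sees the refreshed output.
def profile(rows):
    # Profiling pass: drop rows that fail a basic quality rule.
    return [r for r in rows if r.get("amount") is not None]

def blend(rows_a, rows_b, key):
    # Blend two sources on the shared key IT has sorted out in advance.
    index = {r[key]: r for r in rows_b}
    return [{**r, **index.get(r[key], {})} for r in rows_a]

def nightly_refresh(sales, regions):
    """Scheduled run: clean, blend, and timestamp a governed dataset."""
    cleaned = profile(sales)
    blended = blend(cleaned, regions, key="region_id")
    return {"refreshed_at": datetime.datetime.now().isoformat(),
            "rows": blended}

# Illustrative sample data only.
sales = [{"region_id": 1, "amount": 100},
         {"region_id": 2, "amount": None},   # fails profiling, dropped
         {"region_id": 2, "amount": 250}]
regions = [{"region_id": 1, "name": "EMEA"},
           {"region_id": 2, "name": "APAC"}]

dataset = nightly_refresh(sales, regions)
print(len(dataset["rows"]))  # 2 rows survive profiling
```

Run on a schedule overnight, a pipeline of this shape is what leaves the business with refreshed, governed data sets waiting for them during the office day.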

Further, the idea is that the valuable new insights generated by the refinement will not be held in disparate sandboxes, as seems to be the norm these days, but can be written back into the controlled environment, making them available to all.

I believe this to be a most worthy initiative. Coming from Pentaho, you also know that it is going to be delivered and it will work. So watch this space: they are heading in very much the right direction.