Content Copyright © 2018 Bloor. All Rights Reserved.
Both the IT industry and businesses undergoing digital transformation are convinced that machine learning and other forms of artificial intelligence can make a very significant contribution to future competitiveness. This is a good thing, because a good model fed with good data can indeed deliver actionable insight, improving decision-making and many of the activities we undertake. However, the days of building a model, training it, validating it against a sample of production data and then leaving it to run for weeks on end are over: models need to be refreshed frequently.

Unfortunately, to date the data side of advanced analytics has been highly manual and laborious, so it tends to be neglected while we forge on with developing interesting new models. But if our models drift away from accuracy, people will lose faith in them. Equally, if our models are so opaque that no one can really understand how they operate, the business will be reluctant to embrace them as a central plank of how it does business. All of which makes validation, and keeping that validation current and relevant, a hot topic.
To put this into context, external research highlights that few models are reviewed and updated daily, only about a third get a weekly refresh, and only just over half are refreshed even on a monthly cycle. This problem will grow, and it threatens the business's faith in the usefulness of the technology.
The new model management capability addresses this in three key areas:
- Getting models into production faster, whilst still ensuring they have been validated against overfitting to the training data. The tool aligns the last mile of data preparation and cleansing to the specifics of the algorithm in use. Instead of this being a code-driven activity, there is now a GUI through which users who cannot, or prefer not to, write code can adjust model parameters.
- Maximising accuracy whilst in production. Evaluation statistics, with reports and visualisations, are produced, and are highlighted when accuracy starts to slip from the desired outcomes. The tool also supports A/B testing, with the current champion model and challenger modifications run against one another and results delivered rapidly, helping models to be streamlined whilst in flight with far less overhead than manual comparisons.
- Collaboration and governing models at scale. As I said, one of the problems is explaining how a model works. In the past I used neural networks to undertake churn analysis for mobile phone companies. We could build highly accurate models that identified the groups containing those most likely to churn, and the results were proven to be right; but because we could not clearly explain how the results were arrived at, we added to the frustrations of the churn analysts rather than addressing them. Hitachi's machine learning model management addresses much of what was lacking when I did this. It provides clear data lineage, showing what data is being used and the steps it goes through prior to deployment in the model. The lineage also records the type of model and its parameters and coefficients. Being clearly documented, those data pipelines become shared assets, available to be used collaboratively across the enterprise. This is an enterprise-grade solution capable of working at scale, with the security and availability expected of an enterprise solution.
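To make the champion/challenger idea in the second point concrete, here is a minimal sketch of A/B testing two models against the same labelled scoring stream. This is an illustration of the general technique, not Hitachi's implementation; the function name `ab_test` and its parameters are hypothetical.

```python
import random

def ab_test(champion, challenger, stream, split=0.5, seed=0):
    # Hypothetical sketch: route each labelled record at random to the
    # champion or the challenger, then report each model's accuracy on
    # the records it actually scored.
    rng = random.Random(seed)
    hits = {"champion": 0, "challenger": 0}
    totals = {"champion": 0, "challenger": 0}
    for features, label in stream:
        name, model = (("champion", champion) if rng.random() < split
                       else ("challenger", challenger))
        totals[name] += 1
        hits[name] += int(model(features) == label)
    # Per-model accuracy, omitting a model that received no traffic.
    return {name: hits[name] / totals[name]
            for name in hits if totals[name]}
```

Comparing both accuracies over the same live stream is what lets a challenger be promoted (or discarded) "in flight", with far less overhead than a manual offline comparison.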
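The lineage described in the third point can be pictured as a simple structured record tying data sources, preparation steps and model parameters together. Again this is a generic sketch under my own assumptions, not the product's actual schema; `lineage_record` and its field names are hypothetical.

```python
from datetime import datetime, timezone

def lineage_record(sources, steps, model_type, params):
    # Hypothetical sketch: capture where the data came from, how it was
    # prepared, and which model (with which parameters) it feeds, as a
    # shareable document for collaboration and governance.
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "data_sources": list(sources),
        "preparation_steps": list(steps),
        "model": {"type": model_type, "parameters": dict(params)},
    }
```

Once a pipeline is documented like this, it becomes an asset other teams can inspect and reuse, which is precisely what makes models explainable and governable at scale.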
This is a very significant step forward in the growing maturity of machine learning solutions, and Hitachi is to be applauded for producing it. It is currently available, unsupported, via the Pentaho Marketplace. Later it is likely to be embedded into Pentaho Data Integration, at which point it will become generally available and fully supported, which is very exciting.