Content Copyright © 2006 Bloor. All Rights Reserved.
As I sit here writing this on my Apple Mac I have to report that, having looked at the data mining services in Microsoft SQL Server, I am very impressed! That is not always an easy thing to admit for those of us who try to avoid Microsoft. Currently data mining is very much the preserve of a relatively small group of either statistically trained modellers or experienced and enthusiastic IT Business Intelligence professionals. What Microsoft has done is to make data mining available on the desktop to everyone. Now models can be built by anyone who is a VB developer, and the way that the queries are run and presented will be familiar to anyone who uses Excel. But it must be emphasised that this is not a toy. This is fully-fledged corporate data mining: the algorithms are embedded into the kernel of the database, so that everything scales and operation is fast and reliable.
The data mining services have evolved to encompass nine algorithms, so that a complete framework now exists for building and deploying data mining models. The data mining elements are integrated within the BI stack of SQL Server. Because the models sit in the BI stack they can be deployed anywhere. That makes it possible to model against data that is still in the ETL pipeline and not yet resident in the database, which allows near real-time modelling to take place. I believe that this is one of the greatest growth areas for people to exploit in the coming years.
The data mining add-in to Office 2007 will offer a data mining tab in Excel, enabling anyone to undertake analysis; for instance, press a button and the tool will tell you, for a given target, what the influencers are. The tool will find interesting rows, and it will find exceptions. It can find categories of data and provide information on the characteristics of those groupings. All of this is done without the user having to be an expert in data mining or the subject area.
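To give a feel for what an "influencers for a given target" report involves, here is a minimal Python sketch. It is not the add-in's algorithm, which is not described here; it simply ranks each column value by lift, meaning how much more often the target outcome occurs when that value is present than it does overall. The function name `key_influencers` and the toy customer data are hypothetical.

```python
from collections import Counter


def key_influencers(rows, target, target_value):
    """Rank (column, value) pairs by lift for target == target_value.

    Lift is the outcome rate among rows holding that value divided by
    the overall outcome rate. This is an illustrative stand-in, not the
    method used by the Excel add-in.
    """
    base_rate = sum(1 for r in rows if r[target] == target_value) / len(rows)
    if base_rate == 0:
        return []  # the outcome never occurs, so nothing can influence it

    seen = Counter()  # how often each (column, value) appears
    hits = Counter()  # how often it appears together with the outcome
    for row in rows:
        for column, value in row.items():
            if column == target:
                continue
            seen[(column, value)] += 1
            if row[target] == target_value:
                hits[(column, value)] += 1

    scored = [
        (column, value, (hits[(column, value)] / count) / base_rate)
        for (column, value), count in seen.items()
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical toy data: which attributes go with a purchase?
customers = [
    {"region": "north", "owns_home": "yes", "bought": "yes"},
    {"region": "north", "owns_home": "no", "bought": "no"},
    {"region": "south", "owns_home": "yes", "bought": "yes"},
    {"region": "south", "owns_home": "no", "bought": "no"},
]
ranking = key_influencers(customers, "bought", "yes")
for column, value, lift in ranking:
    print(f"{column}={value}: lift {lift:.1f}")
```

In this toy data home ownership comes out on top, while region makes no difference to the outcome. A real tool would also need to handle numeric columns and small sample sizes, which this sketch ignores.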
There are a number of other features that will give the tool very widespread applicability; for instance, a "fill by example" function. Having found a pattern, you train the tool by filling in some values according to the discovered pattern, push a button, and the tool does the rest, filling in all of the rows according to your examples. There is also a scenario analysis feature, which shows how a value in a given column would have to be adjusted to produce a given outcome, so that 'what ifs' can be modelled by anyone.
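The scenario analysis idea can be illustrated with a small Python sketch. It assumes, purely for illustration, that the outcome depends on one column in a straight-line fashion: it fits a least-squares line to the existing rows and then inverts it to find the input that would produce the desired outcome. The add-in's actual model is not described here, and the names `fit_line` and `goal_seek` and the discount figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the input column has no variation to learn from")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def goal_seek(xs, ys, desired_y):
    """Return the x that the fitted line maps to desired_y."""
    slope, intercept = fit_line(xs, ys)
    if slope == 0:
        raise ValueError("the outcome does not depend on this column")
    return (desired_y - intercept) / slope


# Hypothetical data: discount offered (%) against units sold.
discount = [0, 5, 10, 15]
units = [100, 120, 140, 160]

# What discount would be needed to sell 180 units?
needed = goal_seek(discount, units, 180)
print(f"Discount needed: {needed:.1f}%")
```

The answer extrapolates beyond the observed data, so it is only as trustworthy as the straight-line assumption behind it.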
Over the last few years data mining has become a mainstream skill within most organisations, but such is the shortage of skilled individuals able to exploit the existing tools that salaries, and hence costs, have risen. Unfortunately, not all of those using data mining are that interested in using the tool to address pressing business problems with the sort of productivity that is now required. They still play with the data mining tools for days, perfecting models. These people had better watch out, and hone their skills towards showing others how to exploit the results, because simply building a model, with speed and precision, will now be within the reach of everyone who uses SQL Server and Office.