The future of data mining

Content Copyright © 2009 Bloor. All Rights Reserved.

IBM’s recent announcement that it is to acquire SPSS has implications not only for the completeness of IBM’s analytics offering but also for the future of data mining as a whole. It is the latter on which I will focus.

There are two aspects to this: the first concerns the choices available to users of data mining tools, and the second concerns in-database data mining. I will discuss the second issue first.

Traditionally, to perform data mining operations you extracted the data from the data warehouse or mart into the mining tool, where you processed it. The problem is that extracting all that data incurs a significant performance hit. Of course, you could reduce that impact by sampling the data, but then you lose accuracy. As a result, the recent trend has been to perform mining processes “in-database”. One of the first announcements we can expect once the acquisition is complete is that IBM will be implementing SPSS functionality in DB2.
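To make the trade-off concrete, here is a minimal sketch in Python using the standard library's sqlite3 module as a stand-in for a warehouse. The table name, column names and the functions are all hypothetical, and a simple average stands in for a real mining algorithm; nothing here reflects how SPSS, SAS or DB2 actually work. It only illustrates the three options: extract everything, extract a sample, or push the computation into the database.

```python
import random
import sqlite3


def build_warehouse(n_rows=10_000, seed=0):
    """Create a toy in-memory 'warehouse' table (hypothetical schema)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer_id INTEGER, spend REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        ((i, rng.uniform(0, 100)) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def mean_by_extraction(conn):
    """Traditional approach: move every row to the tool, then compute there."""
    rows = conn.execute("SELECT spend FROM sales").fetchall()
    return sum(r[0] for r in rows) / len(rows), len(rows)


def mean_by_sampling(conn, every=100):
    """Extract only every Nth customer: less data moved, approximate answer."""
    rows = conn.execute(
        "SELECT spend FROM sales WHERE customer_id % ? = 0", (every,)
    ).fetchall()
    return sum(r[0] for r in rows) / len(rows), len(rows)


def mean_in_database(conn):
    """In-database approach: the engine computes, only the result is moved."""
    (avg,) = conn.execute("SELECT AVG(spend) FROM sales").fetchone()
    return avg, 1


if __name__ == "__main__":
    conn = build_warehouse()
    for name, fn in [
        ("extraction", mean_by_extraction),
        ("sampling", mean_by_sampling),
        ("in-database", mean_in_database),
    ]:
        value, rows_moved = fn(conn)
        print(f"{name}: mean={value:.3f}, rows moved={rows_moved}")
```

Extraction and in-database processing agree on the answer, but the former moves every row while the latter moves one; sampling moves fewer rows at the cost of an approximate result.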

Now, both Oracle and Microsoft can process mining algorithms in-database (as indeed can IBM), but neither has a tool like SPSS, so this will give IBM a significant advantage over both. However, almost all of the rest of the warehousing market has no such capabilities, the exceptions being where SAS has been working with Teradata, Netezza and others to implement SAS capabilities in-database. Also notable (and to be commended for its foresight) is Netezza, which acquired NuTech Solutions last year, though this is more of a tool for building predictive applications than a data mining tool per se.

All of this means that most of the warehousing community is going to be utterly dependent on SAS. Warehousing vendors that do not persuade SAS to invest in implementing in-database capabilities are going to be at a significant disadvantage compared with those that do. This is good news for Tibco.

That brings me to the first issue, the choices available to users. Historically, there has been one major data mining vendor (SAS), one middling player (SPSS) and several tiddlers. The tiddlers used to include Angoss, Kxen and Insightful, amongst others, and there has never been any indication that any of these smaller players could break the industry stranglehold exerted by SAS and SPSS. However, Tibco acquired Insightful last year, following its earlier acquisition of Spotfire. And Tibco is not a tiddler. The company also has a leading event processing engine. Put these facts together with Tibco’s focus on the predictive enterprise and you have a company that has the potential to fill the gap left by IBM’s acquisition of SPSS. Whether it will do so, or even attempt to do so, is another matter, but the potential is there. Personally, I hope it does: however much I may like SAS, I would prefer it to have some real competition.