Dell Statistica

During the research for my latest report on data preparation I talked to a number of vendors in the business intelligence and analytics space that offer data preparation capabilities within their product suites. I eventually decided to exclude these suppliers from the report, partly because there are a lot of them, which would have made the report unwieldy; partly because a lot of the vendors are putting lipstick on a pig and calling it data preparation; and partly because I don’t think you buy a BI/analytics suite because it has data preparation. It might be nice to have but it’s an optional extra.

Anyway, during the course of this research I ran into Dell Statistica (which, incidentally, has a perfectly decent data preparation capability). Now, of course, I am familiar with Dell. I have even written papers about Boomi. But I had never heard of Statistica.

What is worse, I used to write a lot about data mining back in the 90s. Not just SPSS (before IBM acquired it) and SAS but vendors like Angoss as well. And I can remember when KXEN, for example, first came to market (the company was founded in 1998). But I had never heard of Statistica or StatSoft, the company behind the product, despite the fact that the company was founded in 1984.

My excuse is that StatSoft, which was started by a bunch of university professors and scientists, was highly techy, US-focused and not very good at marketing. Anyway, Dell is doing a rather better job of getting the word out and the product has been well placed recently in comparative reports produced by a number of Bloor Research competitors.

I haven’t space here to talk about all of Statistica’s capabilities – data mining, text mining, sentiment analysis – all the sorts of things one would expect. However, Dell released version 13.1 a couple of weeks ago and it is worth commenting on the latest features, which are quite significant for a point release. The first notable thing, given where I started from in this article, is that the product’s data preparation capabilities have been extended. Secondly, there are new network analytic capabilities. These are billed as enabling enhanced fraud detection but their application is actually broader than that, because the “graphical association maps” provided are suitable for exploring all sorts of complex inter-relationships that go beyond the fraud space. Note that this isn’t just a visualisation technique: Dell has embedded the OrientDB graph database so that you get the performance needed when exploring large-scale, complex relationships.
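
To make the idea concrete, here is a minimal sketch – nothing to do with Statistica’s own implementation or OrientDB’s API – of why a graph model suits this sort of exploration: entities and the attributes they share become nodes and edges, and linked clusters fall out of a simple traversal. The claims data and attribute names are invented purely for illustration, and networkx stands in for the embedded graph engine.

```python
# Illustrative only: entities and shared attributes as a graph, with
# connected components surfacing the kind of clusters a "graphical
# association map" would render visually.
import networkx as nx

g = nx.Graph()

# Hypothetical claims data: (claim, shared attribute)
links = [
    ("claim_1", "phone_555-0101"),
    ("claim_2", "phone_555-0101"),   # two claims sharing one phone number
    ("claim_2", "addr_12_high_st"),
    ("claim_3", "addr_12_high_st"),  # a third claim tied in via the address
    ("claim_4", "phone_555-0199"),   # unrelated claim
]
g.add_edges_from(links)

# Claims linked directly or indirectly through shared attributes end up
# in the same connected component.
for component in nx.connected_components(g):
    claims = sorted(n for n in component if n.startswith("claim"))
    if len(claims) > 1:
        print("possible fraud ring:", claims)
```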

The third thing that has been introduced is in-database analytics processing for additional databases. Previously this was limited to Microsoft SQL Server but has now been extended to Apache Hive (on Spark), MySQL, Oracle, and Teradata. One might say: about time. This is definitely an area where Dell is coming late to the party, as other vendors have been doing this since before this decade. Still, better late than never.
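
For readers unfamiliar with the idea, the sketch below shows what in-database processing buys you: the computation is pushed down to the database so that only the result, rather than the raw rows, crosses the wire. SQLite is used here purely as a convenient stand-in for the databases listed above, and the table and values are hypothetical.

```python
# Illustrative only: client-side aggregation versus pushing the same
# computation into the database.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (sensor_id TEXT, value REAL)")
con.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("s1", 10.0), ("s1", 12.0), ("s2", 7.5), ("s2", 8.0)],
)

# Client-side approach: fetch every row, then compute the mean locally.
rows = con.execute("SELECT value FROM readings WHERE sensor_id = 's1'").fetchall()
mean_client_side = sum(v for (v,) in rows) / len(rows)

# In-database approach: only the aggregate leaves the database.
(mean_in_db,) = con.execute(
    "SELECT AVG(value) FROM readings WHERE sensor_id = 's1'"
).fetchone()

print(mean_client_side, mean_in_db)  # same answer, very different data movement
```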

Finally, and perhaps most interesting, is the use of Dell Statistica in conjunction with Dell Boomi for Internet of Things (IoT) environments. When used together you can deploy “analytic” atoms in edge devices or gateways. I have written about edge processing in IoT environments before so I won’t reiterate its importance here. However, I should probably explain what an “atom” is. Briefly, an atom is a (Boomi) run-time engine that normally executes integration processes but in this case executes analytic processes. What this means is that, if you want to, you can have a consistent analytic environment across the whole IoT implementation, with Statistica running at both the edge and the centre.
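
As a purely illustrative sketch – it has nothing to do with how Boomi atoms are actually built – the pattern being enabled looks something like this: the scoring logic runs at the edge so that only interesting results are forwarded to the centre, rather than streaming every raw reading. The threshold and the forward_to_centre stub are assumptions made for the example.

```python
# Illustrative only: score sensor readings locally and forward anomalies.
from statistics import mean, stdev

def is_anomaly(window, threshold=3.0):
    """Flag the latest reading if it sits more than `threshold` standard
    deviations away from the mean of the preceding readings."""
    history, latest = window[:-1], window[-1]
    spread = stdev(history) or 1e-9
    return abs(latest - mean(history)) / spread > threshold

def forward_to_centre(reading):
    print("forwarding anomaly:", reading)   # stand-in for a real uplink

# Edge loop: evaluate a sliding window, send only what matters.
readings = [20.1, 20.3, 19.9, 20.0, 20.2, 35.7]
for i in range(5, len(readings) + 1):
    window = readings[i - 5:i]
    if is_anomaly(window):
        forward_to_centre(window[-1])
```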