Alpine Data: the first real data mining solution for the Big Data generation

Content Copyright © 2011 Bloor. All Rights Reserved.

The biggest barrier that I see to the widespread adoption of Big Data is the skills required to deliver the benefits that we all agree can be obtained. In the standard MIS layers of the BI tool suites (Reporting, Dashboards, and OLAP) we are seeing an increasing emphasis on what is being labelled Agile BI: tool sets that offer the same power as the traditional tools, but which cost less, are easier to use, are targeted at the business user rather than the IT professional, are far more visual in how they are controlled and in what they output, and which deliver a step-change in productivity. But in the area in which Big Data offers the biggest potential return, data mining (the application of statistical and mathematical modelling to identify patterns of significance), there has been no comparable change, until now. Alpine Miner is the first offering I have seen that clearly addresses the challenges of scale and affordability in exploiting Big Data.

I have, for a long time, been a big fan of KXEN as an alternative to SPSS or SAS for those businesses that do not have the skills required to make the most of the considerable power of the established market leaders. KXEN is easier to deploy, and its results are easier to understand if you are not a statistician, whilst it still delivers models of comparable statistical validity. But all of those technologies are, at present, going to struggle to cope economically with the scale of the data when it comes to Big Data. This is where Alpine Data Labs provide the first sight of a next generation of data mining solution, one which copes with the scale of Big Data, but is still affordable, and is designed to be used by people in the business world and not just statisticians.

Alpine Data Labs were spun off from Greenplum just prior to the EMC acquisition of Greenplum last year. Their primary product, Alpine Miner, is a data mining and analytics platform meant to leverage the processing capabilities of MPP databases like Greenplum and Oracle’s Exadata. Alpine is headquartered in San Mateo, California, with a sizeable development shop in Beijing. They have over 15 early adopter customers across the US and China, and over 500 evaluation downloads have already been taken, so there is a lot of interest, and the company is showing very solid growth based on quality opportunities.

This is a disruptive technology, aiming to put advanced analytics into the hands of people capable of using it to change the business landscape. The user interface is a drag and drop GUI, and the technology is designed to be cost effective, so there are few reasons at the business level not to invest. The promise of Big Data will only be achieved when the ROI is compelling. If the benefits can only be obtained by deploying very expensive technology, run by very expensive consultants, over extended time periods, business will not bother.

So let's see how Alpine Data Labs address these challenges. Firstly, we should note that Alpine Miner is very much a work in progress. The company has a clear and compelling vision of where it is heading, and has already established the fundamental building blocks. The first challenge is one of scalability. This they are confident they have addressed, and they are well on the way to handling any size of data on any platform. Next, they want to provide a platform that is well integrated, so that the modelling does not just provide insight; that insight must be readily actionable, so that it drives business improvement. Again, this is there already, but will probably get better and better over time. The third point they want to address is making data mining a participative technology, so that it is embedded into the business decision-making process and used naturally within the business to aid effective management. This participative model is coming in 2012 with Alpine Miner v3.0. Finally, they see some sort of SaaS offering down the road.

The key to much of what Alpine delivers is that they are embedding the computation into the data, and not moving data to the tool. Alpine Miner is an analytics engine that connects directly to Greenplum, PostgreSQL and Exadata, with offerings for Netezza and Hadoop on the roadmap. Alpine runs all of the transformations, calculations, and analytic processes directly within the database itself, thus eliminating the need to extract data from the database and send it off to another (smaller) analytic server for processing. On the client side, a PC or Mac controls things through a point and click GUI, with no arcane statistical notation to navigate. This “in-database” approach leverages the MPP capabilities of the appliance, and eliminates many of the constraints on scalability and integration seen with traditional data mining tools. The tool is designed to be used easily by BI analysts and should be a natural extension of their BI toolkits.
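
To make the contrast between moving data to the tool and embedding the computation in the data concrete, here is a minimal sketch in Python. It is not Alpine Miner's interface, which I have not seen at the code level; it uses an in-memory SQLite database purely as a stand-in for an MPP database, and the `purchases` table, its columns and the function names are all hypothetical. The first function pulls every row out to the client and computes there; the second sends the calculation to the database and brings back only the result.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (a stand-in for an MPP warehouse)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (customer_id INTEGER, amount REAL)")
    rows = [(1, 10.0), (1, 30.0), (2, 6.0), (2, 12.0), (2, 30.0)]
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    return conn


def mean_by_extraction(conn):
    """Traditional approach: ship every row to the client, then compute."""
    totals = {}
    for customer_id, amount in conn.execute(
        "SELECT customer_id, amount FROM purchases"
    ):
        running_sum, count = totals.get(customer_id, (0.0, 0))
        totals[customer_id] = (running_sum + amount, count + 1)
    return {cid: s / n for cid, (s, n) in totals.items()}


def mean_in_database(conn):
    """In-database approach: the engine aggregates; only results come back."""
    query = "SELECT customer_id, AVG(amount) FROM purchases GROUP BY customer_id"
    return dict(conn.execute(query).fetchall())


conn = build_demo_db()
extracted = mean_by_extraction(conn)
in_db = mean_in_database(conn)
print(extracted == in_db)
```

Both routes give the same answer, but the first transfers one row per purchase while the second transfers one row per customer. On a five-row table the difference is invisible; at Big Data scale, on a database that can run the aggregation in parallel, that difference is the whole argument.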

This model is capable of revolutionising how analytics are deployed. The slow, expensive model of the past is being replaced by a quick-to-deploy, affordable alternative with a rapid time to results, tailored to the needs of the business user. None of this changes what we do, just how we achieve it. We are still looking to understand the customer and what they value, and then to market to them those things that they will value the most, but the cost of doing that has been changing dramatically, making the ROI really compelling.

Alpine Miner is a really exciting offering, which makes the promise of Big Data analytics more of a reality, to a broader audience than has been true before. I suggest you track their progress.