Updating data preparation

Content Copyright © 2015 Bloor. All Rights Reserved.

It is galling to research and write papers on a specific topic, with publication due very shortly, only to have three more companies announce products in that space while the papers are still in the editorial phase. However, this is what has happened with respect to self-service data preparation platforms. We will soon be publishing both a generic white paper discussing this topic and a Market Update comparing the offerings from the leading vendors in this space. These include Paxata, Trifacta, Tamr, IBM, Informatica, ClearStory, Progress and Alteryx. However, the comparison doesn’t include Alation, Rocket Software, Advizor Solutions or Datawatch, all of which have released products in this space in the last few weeks, and all of which I have blogged about already or will do so shortly.

Fortunately, these are brand new products, so it would be too early to include them in any comparison at this stage anyway. Moreover, I know of several other vendors planning to enter this space that have not yet announced their products, so if I held off publication to accommodate the latest announcements I might find myself delaying it almost indefinitely. So, anyway, that rare thing: an update to something that hasn’t been published yet!

First off, by the end of this year, my guess is that we will have around twice as many products in this space as we had at the beginning of 2015. What does this tell us about the market?

The first thing we need to do is distinguish between the products and the vendors. There are three categories: start-ups that have built a specific platform for data preparation (Paxata, Trifacta, Tamr and Alation); data integration/quality vendors that have extended existing capabilities into this area (Informatica, Progress and IBM, with others to follow); and business intelligence/analytics vendors (IBM again, Rocket Software, Alteryx, ClearStory and Datawatch, with others to come). The notable absentees from this list are the database/warehousing vendors like Teradata, HP (Vertica), Pivotal and so on.

There are also, of course, different foci for the different vendors at present (for example, Trifacta has more of an emphasis on data scientists than other suppliers; Alation more on finding out what data is actually available and where), but I expect these to converge as the market matures, so they are only short-term differentiators.

This is clearly a space in which there is a lot of interest. Conversely, we will shortly have too many suppliers, if we don’t already. This is typical of emerging markets and, just as typically, we can foresee a period of consolidation. Moreover, it is fairly clear how this will come about. Companies in the business intelligence, analytics and data warehousing spaces that don’t have self-service data preparation will start to lose out to competitors that do. As a result they will face a build-or-buy decision. No doubt a number of these vendors already have products in development but, equally, there will be others that simply wait for the market to mature and then start to buy the start-ups. I wouldn’t be at all surprised if we see this start to happen during the course of this year.

Consolidation will mean that there are data preparation/business intelligence stacks available from the major vendors (this is already the case with IBM’s Watson Analytics) and, if you don’t want to be locked in that way, the predominant alternative will come from the data integration/quality vendors. I would guess that there might be one pure-play start-up left in the market (à la Denodo in the data federation/virtualisation space) or possibly none at all (as with the major complex event processing engines). How long this will take, however, is another matter.