Forthcoming research

The data management group at Bloor has been, and remains, particularly busy: we are about to publish a report on Enterprise Search (by Roger Whitehead) and another on Master Data Management (by Harriet Fryman), while separate reports into Business Intelligence and Corporate Performance Management are well under way (both by Gerry Brown). In addition, Bharat Mistry is just starting research into Enterprise (and ‘Basic’) Content Management. As for myself, I am dual tracking Event Stream Processing and Data Quality.

Event stream processing is what you have to do if you want genuinely real-time information as opposed to the namby-pamby “near real-time” or the even more pusillanimous “right time”. It is sometimes referred to as Complex Event Processing but this is misleading as the events being processed do not have to be complex. It is also often called stream-based processing but this is too likely to be confused with Streambase, one of the leading vendors in the market. Mind you, Progress has named its product in this space as Progress ESP so I’ll be careful about using acronyms.

The point about event stream processing is that you process data as it streams through your environment and before it is committed to the database, because putting the data into a database first (what with commit processes, logging, index updates and so on) takes too long. However, what I find really interesting about the space is the range of applications for which streaming is suited. There are no more than about half-a-dozen suppliers in this market and they rarely meet one another in competition, because there is currently space enough for each to target a different subset of the market. So, in this report I intend to focus as much on case studies and potential uses of event streaming as on the vendors and products.
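To make the "process it before you store it" idea concrete, here is a minimal sketch in Python. It is not how any particular vendor's product works: the `Tick` record, the `detect_spikes` function, the five-second window and the 2% threshold are all assumptions chosen purely for illustration. Each incoming event is compared against a moving average held in memory, so a reaction can be raised without waiting for a database write.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Tick:
    """A hypothetical price event; the field names are illustrative only."""
    symbol: str
    price: float
    timestamp: float  # seconds since some arbitrary start


def detect_spikes(
    ticks: Iterable[Tick],
    window_seconds: float = 5.0,  # assumed window length
    threshold: float = 0.02,      # assumed 2% deviation trigger
) -> Iterator[Tick]:
    """Yield each tick whose price deviates from the moving average of the
    ticks inside the time window by more than `threshold`.

    Everything happens in memory as the events arrive; nothing is written
    to a database before a decision is made.
    """
    window = deque()  # recent ticks, oldest first
    total = 0.0       # running sum of the prices in the window

    for tick in ticks:
        # Evict ticks that have fallen out of the time window.
        while window and tick.timestamp - window[0].timestamp > window_seconds:
            total -= window.popleft().price
        if not window:
            total = 0.0  # discard any floating-point residue

        # Compare the new tick with the average of what is still in the window.
        if window:
            average = total / len(window)
            if abs(tick.price - average) / average > threshold:
                yield tick

        window.append(tick)
        total += tick.price
```

Because `detect_spikes` is a generator, it can be driven by a message queue or a socket just as easily as by a list, and persisting the events can then happen afterwards, off the critical path.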

Data quality is another story: this, at last, seems to be a technology whose time has come. We at Bloor have been writing reports and preaching the data quality message for the best part of a decade, but it seems that it is only now that the message is finally getting through, thanks in part to compliance and governance issues but also due to an increased awareness of the need for better quality data to support master data management and business intelligence, in particular.

In terms of tools, the data quality market includes two types of product: those that do profiling and analysis, and those that do cleansing and matching. In the latter category, however, there are not only general-purpose tools but also products that have been specially designed for, say, name recognition or for complex data such as product names.
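As a rough illustration of the difference between the two types of product, the following Python sketch profiles a column and then matches two values after a simple cleansing step. The function names, the normalisation rules and the 0.85 similarity threshold are my own assumptions; commercial tools use far richer rules and reference data than this.

```python
from difflib import SequenceMatcher


def profile(values):
    """Profiling: summarise a column without changing it."""
    non_empty = [v for v in values if v is not None and v.strip()]
    return {
        "rows": len(values),
        "missing": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
    }


def normalise(name):
    """Cleansing: lower-case, drop simple punctuation, collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def is_match(a, b, threshold=0.85):
    """Matching: treat two values as the same entity if their cleansed
    forms are sufficiently similar (the threshold is an assumed value)."""
    ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return ratio >= threshold
```

Profiling tells you that the column holds two "distinct" spellings of the same company; cleansing and matching are what let you treat them as one.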

In addition, vendors are positioned differently in the market. Group 1, for example, although it is a leading supplier of name and address matching and cleansing solutions (particularly in the United States) does not really sell data quality. Rather, it sells customer communications management of which data quality is merely a part.

A second group of data quality providers is made up of those vendors that provide data quality as ancillary to another product. Microsoft and Oracle are both in this category, as is Sunopsis: you would not go to any of these companies for a data quality solution but you might use the data quality provided if you wanted that company’s ETL (extract, transform and load) tool. Indeed, Sunopsis specifically provides enough data quality to support its ETL functions, and its offering is not intended to go beyond that.

The two remaining groups of suppliers consist of companies that provide stand-alone product solutions and those that also provide a broader EIM (enterprise information management) platform, which typically includes both ETL and EII (enterprise information integration) as well as data quality (and metadata management). All of the EIM vendors (Business Objects, IBM, Informatica and SAS) also provide their data quality tools as stand-alone options to compete with the likes of Trillium, Silver Creek Systems and Datanomic.

Perhaps the most interesting question is who else will move into the EIM market. In my view, the EIM space will expand to include master data management (MDM) as well as the components identified above, in which case Oracle should be the first candidate to spring to mind. This would make sense, not just from a technical perspective (combining Oracle Warehouse Builder with the company’s master data management solutions) but also from a business viewpoint: with all of its recent acquisitions, its customer base must have major integration and migration issues, and introducing a generic (as opposed to Oracle-centric) EIM/MDM solution would seem a sensible direction in which to move.