Product data quality in the Information Supply Chain

Content Copyright © 2007 Bloor. All Rights Reserved.

Readers may recall that I wrote last year about Silver Creek Systems®. At that time I extolled its capabilities for product data quality (matching, cleansing and classification) through the use of semantically-based content profiling and attribute identification. This is particularly relevant when information comes into the organization in unstructured formats, as most conventional data quality tools have historically been limited to supporting structured information.

I went on to say that “if you have a complex matching problem that goes beyond conventional name and address matching (not necessarily for products) then you must talk to Silver Creek Systems”. Well, I have been getting an update from the company about what it has been doing over the last year or so. And it turns out to be quite a lot.

Before I discuss what Silver Creek has done with its software, however, it is worth reporting on some research that Ventana Research has conducted on Silver Creek's behalf. It found that "80% of companies are not confident about the quality of their product data" (which I don't find entirely surprising, considering the widespread use of Excel spreadsheets) and that "73% find it 'difficult' or 'impractical' to standardize product data". In other words, product data quality is a big problem, although it is only relatively recently that this has become obvious. Silver Creek was one of the first data quality companies, if not the first, to recognize this problem and to develop a specialized offering to address the issues involved.

The first point that I want to raise concerns what Silver Creek refers to as Information Supply Chains. By an 'Information Supply Chain' the company means the data equivalent of a physical supply chain, and the concept is important in the context of product data because it can be used to highlight the problems associated with such information. Just as a physical chain can have many participants, so too can an Information Supply Chain. Specifically, Information Supply Chains carry product (or other) information between systems, users and organizations; and just as a physical supply chain is intended to operate as efficiently as possible, so in an Information Supply Chain you want to enable efficient operations that range from eCommerce to product design, and from inventory to business intelligence and customer service. Also, just as in a physical supply chain, whenever the chain is broken the business is likely to suffer.

Now, the point about Information Supply Chains is that there are a number of touch points where data quality potentially becomes a major issue and can break the chain, particularly where data is coming from external sources. In particular (and this is what makes product data fundamentally different from other types of data) there are no universal standards for either the format or the content of that data. Different systems and users need to see the data in very different ways and in very different contexts, and each product category differs from every other product category, with its own schemas, validation and business rules, and vocabulary. Moreover, MDM (master data management) systems need the data in a different form from the inventory system; the website needs it differently from the ERP system; and every system needs it in a consistent, complete and correct form, which is almost never how it arrives from external systems!
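A toy example may make this concrete. The sketch below (in Python, with entirely hypothetical attribute names and output formats; it does not represent Silver Creek's software) shows how a single supplier description, once its attributes have been identified, still has to be rendered quite differently for the website and for the ERP system.

    # A toy illustration only: one parsed supplier description rendered in the
    # different forms that different consuming systems expect. All field names
    # and formats here are hypothetical, not Silver Creek's representation.

    parsed = {
        "category": "resistor",
        "resistance": "10K ohm",
        "power_rating": "1/4 W",
        "tolerance": "5%",
        "material": "carbon film",
    }

    def for_website(item: dict) -> str:
        """Customer-facing copy: readable and descriptive."""
        return (f"{item['material'].title()} Resistor, {item['resistance']}, "
                f"{item['power_rating']}, {item['tolerance']} tolerance")

    def for_erp(item: dict) -> str:
        """ERP load format: terse, pipe-delimited, fixed field order."""
        return "|".join([item["category"], item["resistance"],
                         item["power_rating"], item["tolerance"]]).upper()

    print(for_website(parsed))  # Carbon Film Resistor, 10K ohm, 1/4 W, 5% tolerance
    print(for_erp(parsed))      # RESISTOR|10K OHM|1/4 W|5%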

To summarize on Information Supply Chains, then: the data requires a lot of work, and the more of that work you have to do manually, the more expensive and unreliable the solution is going to be. This is where Silver Creek Systems comes in.

What Silver Creek has done is to introduce the concept of Data Service Applications that you can place at each of the touch points in the Information Supply Chain. These are web service-enabled data quality applications, each of which is based around one or more Data Lenses. I probably need to explain that a bit further.

Each Data Lens applies semantic rules to interpret, standardize, validate and apply a quality metric to any product (or other) data, regardless of the format or the domain to which that data belongs. Data Service Applications then take one or more Data Lenses and combine them with business rules to manage exceptions and to take appropriate (business) actions, ensuring that only complete, consistent and reliable data is passed on to downstream systems. For example, suppose that you are selling a particular type of widget for which you require five specific attributes. What do you do if a description of a widget comes in from a supplier with only three of those attributes? The short answer is that you define a business rule for handling the exception: fill in the gaps by scraping a website or other data source, flag the record for manual review, or apply some other sort of resolution. These rules are built into the Data Service Application, which then 'handles' the exception.
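To make the widget example concrete, here is a minimal sketch of that kind of exception rule. It is written in Python purely for illustration; the attribute names, the two-attribute threshold and the action names are all my own inventions, not the DataLens API.

    # A minimal sketch of an exception-handling business rule. Everything
    # here is hypothetical and for illustration only.

    REQUIRED_ATTRIBUTES = {"category", "material", "diameter", "length", "finish"}

    def handle_record(record: dict):
        """Pass complete records downstream; route incomplete ones to a rule."""
        missing = REQUIRED_ATTRIBUTES - record.keys()
        if not missing:
            return ("pass_downstream", record)
        if len(missing) <= 2:
            # Business rule: try to fill small gaps from another data source.
            return ("enrich_from_secondary_source", sorted(missing))
        # Business rule: too much is missing, so send it to a person.
        return ("flag_for_manual_review", sorted(missing))

    # A supplier description that yielded only three of the five attributes.
    incoming = {"category": "widget", "material": "steel", "diameter": "6 mm"}
    print(handle_record(incoming))  # ('enrich_from_secondary_source', ['finish', 'length'])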

Going back to the Information Supply Chain, the idea is that you embed Data Service Applications at each relevant point in the chain in order to ensure a smooth flow of accurate data. Note that these applications do not need to know anything about the source or target systems: they are simply invoked as web services, as required. Further, there is no actual programming involved, as the DataLens™ System has been built from the ground up to be used by business people; you create Data Service Applications simply by using graphical, drag-and-drop techniques.
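For illustration, a caller might invoke such a service along the following lines. This is a sketch only: the endpoint URL, payload and response shape are my assumptions, not a documented DataLens interface.

    # A sketch of how a caller might invoke a data service over HTTP. The
    # endpoint, payload and response format are assumed for illustration.
    import json
    import urllib.request

    def cleanse_product(description: str) -> dict:
        """POST a raw description to a (hypothetical) data service and return
        the standardized record that it sends back as JSON."""
        payload = json.dumps({"description": description}).encode("utf-8")
        request = urllib.request.Request(
            "http://dataservices.example.com/product-cleanse",  # hypothetical
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    # The caller needs no knowledge of the lenses or business rules behind
    # the service; it simply sends raw data and receives cleansed data.
    # record = cleanse_product("RES 10K OHM 1/4W 5% CARBON FILM")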

There are two really important points here. The first is that a Data Lens is better than traditional approaches when it comes to standardizing, cleansing and matching unstructured product data. For example, Silver Creek cites stories from its customer base such as an electronics company that improved its quote fulfillment rate by 20% in 3 months, worth "millions of dollars in additional quotations"; and a distributor that cut manual services costs by 75% while, of course, improving data quality. The real point, however, is that these companies couldn't have achieved anything close to these results using traditional approaches.

The second important point is this notion of a Data Service Application that can be built for the specific task in hand (Data Lenses can be reused where appropriate) and which can be initiated on demand. It is not so much that you couldn’t implement this sort of solution with other products but that I have not spoken to any other company that has such a clear vision of how these sorts of solutions should be created and deployed. This is particularly important because there are other companies trying to grab at Silver Creek’s coat-tails with respect to product data quality and it is this broader vision as much as its capabilities that will keep Silver Creek ahead of the chasing pack.