
Data Quality


Data quality is about ensuring that data is fit for purpose; that it is accurate, timely and complete enough relative to the use to which it is put. As a technology, data quality can be applied either after the fact (correcting errors already in the data) or in a preventative manner (stopping errors at the point of entry).

Data quality is often separated into two technologies, data cleansing and data profiling, where profiling is used to discover where errors exist and to monitor (typically via a dashboard) the current status of errors within a particular source. Here we define data quality as excluding data profiling. In addition, different elements of data quality, such as data matching (discovering duplicated records) and data enrichment (adding, say, geocoding or business data from the Internet), as well as data cleansing per se, are often treated separately even though they form part of a single product.

Data quality also provides one of the sets of functionality required for data governance and master data management (MDM). Some data quality products have specific capabilities to support, for example, data stewards and/or facilities such as issue tracking.

Data quality products provide tools to perform various automated or semi-automated tasks that ensure that data is as accurate, up-to-date and complete as you need it to be. This may, of course, be different for different types of data: you want your corporate financial figures to be absolutely accurate but a margin of error is probably acceptable when it comes to mailing lists.

Data quality provides a range of functions. A relevant tool might simply alert you that a postal code is invalid and leave you to fix it; or the software, perhaps integrated with a relevant ERP or CRM product, might prevent the entry of an invalid postal code altogether, prompting the user to re-enter that data. Some functions, such as adding a geocode to a location, can be completely automated, while others will always require manual intervention. For example, when identifying potentially duplicate records the software can do this for you, and calculate the probability of a match, but it will require a business user or data steward to actually approve the match.
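The score-then-approve workflow described above can be sketched in a few lines. This is a minimal illustration only, using Python's standard-library string similarity rather than the phonetic and rule-based matching a commercial product would apply; the record values and the 0.85 threshold are assumptions for the example.

```python
# Sketch: probabilistic duplicate matching. The software scores candidate
# pairs; a business user or data steward approves or rejects each match.
from difflib import SequenceMatcher

def match_probability(a: str, b: str) -> float:
    """Crude similarity score between two values, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_candidate_duplicates(records, threshold=0.85):
    """Return pairs scoring above the threshold, queued for steward review."""
    candidates = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = match_probability(records[i], records[j])
            if score >= threshold:
                candidates.append((records[i], records[j], round(score, 2)))
    return candidates

# Hypothetical customer records for illustration
records = ["John Smith", "Jon Smith", "Mary Jones"]
for a, b, score in find_candidate_duplicates(records):
    print(f"Possible duplicate: {a!r} ~ {b!r} (score {score})")
```

Note that the tool stops at flagging the pair with its probability; the final merge decision stays with a human, exactly because false positives are costly.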

Poor data quality can be very costly indeed and there have been numerous studies examining, and proving, this point. So the CFO should care. Conversely, good data quality ensures that your information about your customers is as complete and as accurate as it can be, which means, as we move more into a world of one-to-one marketing, that the CMO will also be interested in data quality. For companies that recognise that data is a corporate asset then data quality will be important for line of business managers and everybody up to the CEO level.

Further, data quality is of particular importance for compliance officers and those responsible for data governance (roles that overlap), as well as for CIOs. We discuss the relevance to compliance in the section on emerging trends, but for CIOs data quality is important in a number of technical environments, such as data migration, where poor data quality can adversely affect the success of the project and extend both its cost and duration. The same applies to data warehousing, where unsuccessful or delayed implementations have frequently been ascribed to poor data quality.

Data quality is what you might call a slow burner. It has been an issue since the mid-to-late 1990s, and there are still companies that either refuse to recognise that they have an issue with data quality or don't think it is worth the cost of fixing. This is gradually changing as people become better educated, but the uptake of data quality technology remains on a slow growth path and we don't expect that to change unless and until compliance requires it.

Traditionally, the adoption of data quality methods and tools has been a choice. The biggest emerging trend is that it is starting to become mandatory. Regulations such as Solvency II and MiFID II in Europe, and Dodd-Frank in the United States, are starting to mandate that data be accurate, in which case good data quality will no longer be a choice. It is our belief that other regulations (SOX II?) will increasingly focus on the accuracy of data in addition to the existing emphasis on process.

The data quality market is mature and there has been little change over the last several years. The last major acquisition was that of Datanomic by Oracle. One notable feature has been that a number of smaller companies, such as Uniserv and Ataccama, have emerged as credible suppliers from non-English speaking environments.

In general the market is split between those companies that just focus on data quality and those that also offer either ETL (extract, transform and load) or MDM (master data management) or both. Some of these “platforms” have been built from the ground up, such as that from SAS, while some others consist more of disparate bits that have been loosely bolted together. There is also a distinction between those that can provide specialist facilities for product matching (which is more complex than name and address matching), such as Oracle, and those that cannot.


  • Alex Solutions
  • Experian Data Quality
  • Global IDs
  • Informatica
  • Pitney Bowes
  • Trillium Software

These organisations are also known to offer solutions:

  • BDQ
  • Clavis
  • FICO (InfoGlide)
  • IBM
  • iWay
  • Microsoft
  • Oracle
  • Pervasive Software
  • SAP
  • SAS
  • Talend
  • Uniserv
  • X88

Profiling is not just about quality

Data profiling can be used for more things than just supporting data quality

Data Quality

By data quality we mean both data cleansing (matching, deduplication, error correction, enrichment et al) as well as data profiling and discovery.

Data profiling: the business case

This paper discusses the benefits of automating the discovery of data quality issues through the use of data profiling technology.

The business case for Data Quality

It should be clear from the preceding discussions that there is much to be said in favour of a platform-based approach to data quality.

Free Breakfast Seminar with Ataccama

Breakfast seminar with Philip Howard, Research Director of Data Management at Bloor Research. Strategic Business Value from Your Enterprise Data.

There’s identity resolution and then there’s identity resolution

Identity Insight from IBM is not a data quality product

Solvency II: data quality and governance

The Solvency II Directive for insurance and reinsurance companies in the EU comes into force on December 31st 2012.

Talend Data Management Platform

Talend is established as the leading open source provider for data integration and the introduction of a unified data management solution represents a major step forward for the company.

The Importance of Data Quality

Data quality is too important to your business to leave it to the chance of a tick-box.

Confessions of a serial DQ perpetrator

I admit to entering false information into web sites but is it my fault?

Pervasive Data Quality - Improving business processes with high quality data

In this paper we argue that a more progressive and pervasive approach to data quality initiatives is required.