Datanomic dn:Director


Date: 6th June, 2008
Format: InDetail


Here are some statistics, derived from a variety of sources:

  • 88% of all data integration projects either fail completely or significantly overrun their budgets (by an average of two thirds).
  • One in three organisations have delayed or cancelled new IT systems owing to poor data.
  • 75% of organisations have identified costs stemming from dirty data.

While the first of these statistics does not give any reasons for the failures it reports, we would contend that their major cause is poor data quality, as identified in the other two statistics (both of which originate from PricewaterhouseCoopers).

Fortunately, in recent years companies have become increasingly aware of the problems associated with poor quality data. This is not least because of increased governance requirements (both internal, to meet SLAs and the like, and external, through regulations such as Sarbanes-Oxley), but also because of the growing interest in applications such as master data management (including customer data integration, product information management, global supplier management and so on).

However, as far as Datanomic is concerned, it is not just a question of errors in your database. In particular, Datanomic does not limit itself to relational databases. This is just as well, because even more, and worse, surprises often lurk in legacy systems: these are frequently undocumented, their source code has often been lost, and the people who wrote the original programs retired a decade ago. End user computing resources (particularly Access databases and Excel spreadsheets) are a source of similar problems for similar reasons. So it isn't just a question of identifying the errors in your data; in many cases it is at least as important to understand your data in the first place. In our view, a large proportion of the failed projects quoted above failed because the developers devoted insufficient time and effort to understanding the data they were going to use.