Here are some statistics, derived from a variety of sources:
- 88% of all data integration projects either fail completely or
significantly overrun their budgets (by an average of two thirds).
- One in three organisations have delayed or cancelled new IT systems owing
to poor data.
- 75% of organisations have identified costs stemming from dirty data.
While the first of these statistics does not explain the reasons behind the failures it quotes, we would contend that the major cause of these failures is poor data quality, as identified in the other two quotes (both of which originate from PricewaterhouseCoopers).
Fortunately, in recent years companies have become increasingly aware of the problems associated with poor-quality data: not least because of increased governance requirements (both internal, to meet SLAs and the like, and external, through regulations such as Sarbanes-Oxley), but also because of the growing interest in applications such as master data management (including customer data integration, product information management, global supplier management and so on).
However, in so far as Datanomic is concerned, it is not just a question of
errors in your database. In particular, Datanomic does not limit itself to
relational databases. This is just as well, because even more and worse surprises often lurk in legacy systems, which are frequently undocumented, whose source code has been lost, and whose original programmers retired a decade ago. Further, end-user computing resources (particularly Access databases and Excel spreadsheets) are a source of similar problems, for similar reasons. So it is not just a question of identifying the errors in your data; in many cases it is at least as important to understand your data in the first place. In our view, a large proportion of the failed projects quoted above failed because the developers devoted insufficient time and effort to understanding the data they were going to use.