This is the second of four Market Updates covering, in order, data discovery, data profiling, data cleansing and matching, and data quality platforms.
For readers who have not seen the Market Update on data discovery, we need to explain the distinction between that discipline and data profiling. Data profiling does two things. First, it discovers relationships between data elements, whether they are in the same data source or spread across multiple, heterogeneous data sources. Second, it performs statistical analysis against individual columns (in a relational database), discovering such things as the number of null values, whether the data matches the expected datatype, and so on. Data discovery is concerned only with the first of these capabilities. So why make the distinction? There are two answers. The first is that there are data discovery tools that are not data profiling tools. The second is that data profiling is closely associated with data cleansing, whereas data discovery has utility in a number of other areas: for example, it is complementary to data modelling. For a full discussion of this topic, see the first Market Update in this series and its accompanying Spotlight Paper.
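To make the two capabilities concrete, the following is a minimal Python sketch, not a description of how any particular product works. The function names `profile_column` and `containment`, the sample tables, and the choice of value overlap as a stand-in for relationship discovery are all assumptions made for illustration. The first function corresponds to column-level statistical analysis; the second corresponds to the relationship discovery that profiling shares with data discovery.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]  # one record, with every value held as raw text


def _is_null(value: Optional[str]) -> bool:
    """Treat both missing values and empty strings as null (an assumption)."""
    return value is None or value == ""


def profile_column(rows: List[Row], column: str,
                   expected_type: Callable[[str], object]) -> Dict[str, int]:
    """Column statistics: row count, nulls, datatype mismatches, distinct values."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if not _is_null(v)]
    mismatches = 0
    for v in non_null:
        try:
            expected_type(v)  # e.g. int or float; a failure means a mismatch
        except (TypeError, ValueError):
            mismatches += 1
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "type_mismatches": mismatches,
        "distinct": len(set(non_null)),
    }


def containment(child_rows: List[Row], child_col: str,
                parent_rows: List[Row], parent_col: str) -> float:
    """Share of distinct child values that also occur in the parent column.

    A value close to 1.0 hints at a possible key relationship between
    the two columns, even when they live in different sources.
    """
    child = {r.get(child_col) for r in child_rows if not _is_null(r.get(child_col))}
    parent = {r.get(parent_col) for r in parent_rows if not _is_null(r.get(parent_col))}
    if not child:
        return 0.0
    return len(child & parent) / len(child)


# Hypothetical sample data standing in for two separate sources.
customers = [{"id": "1"}, {"id": "2"}, {"id": "3"}]
orders = [
    {"cust": "1", "amount": "9.50"},
    {"cust": "2", "amount": "n/a"},
    {"cust": "4", "amount": None},
    {"cust": "1", "amount": "3"},
]

if __name__ == "__main__":
    print(profile_column(orders, "amount", float))
    print(containment(orders, "cust", customers, "id"))
```

In this sketch a data discovery tool would offer only something like `containment`, whereas a data profiling tool would offer both functions, which is why the column statistics sit so naturally alongside data cleansing.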