When protecting sensitive information, it is important to know what data needs to be protected. This applies both to data that needs to be anonymised, pseudonymised or redacted for regulatory reasons and to data that is company confidential. The problem is that finding all of this data, prior to masking it or taking other protective measures, is neither as easy nor as comprehensive a process as it is sometimes made to sound. Data profiling, for example, is not as exhaustive in discovering sensitive data as the suppliers of such software often claim.
In this paper, we examine how to find occurrences of sensitive data and consider the different techniques that are currently available, as well as new methods that are starting to emerge. In particular, we argue that data profiling, on its own, is not usually sufficient for this task.