This paper discusses why data lakes need to be managed and the capabilities required to manage them. As it turns out, the list is quite extensive. This raises a problem: if you need six or eight different technologies to manage a data lake effectively, does it make sense to seek out that many best-of-breed solutions, or would you be better off starting with a platform-based approach and adopting best-of-breed products only where absolutely necessary? This paper also considers the hidden costs of managing data lakes, such as training, the cost of integrating tools with one another, and the other elements that make up total cost of ownership.
We have published a new companion paper (2018) which outlines a methodology for building a business case in support of implementing suitable data lake management software.