Waterline Data Catalog

Updated on September 26, 2017

Waterline was founded on the realisation that the rise of big data and data lakes, and the consequent increase in the volume and variety of available data, would, without proper management, quickly lead to an impenetrable, unusable morass of dark data, now colloquially known as a data swamp. As a result, its product, Waterline Data Catalog, is aimed at managing the enterprise data lake across both Hadoop and traditional data environments. It offers a complete solution for data discovery, cataloguing and compliance, on the data lake and on more traditional relational databases alike.

Waterline’s self-described mission is to help connect the right people to the right data. Waterline recognises that there is now too much data, in too great a variety, for humans to catalogue manually. As a result, it has created an approach called “data fingerprinting” that uses machine learning algorithms to automate the consistent tagging of data attributes with commonly used business terms. At the same time, Waterline recognises that the people in your organisation, particularly but not exclusively your data stewards, still know a great deal about your data. Consequently, Waterline seeks to extract that knowledge and formalise it so that it can be accessed freely across your organisation. This has led the company to augment its automated discovery technology, which helps populate the catalogue very quickly, with collaboration and crowdsourcing capabilities. For instance, users are encouraged to leave ratings and reviews on data sources, and data stewards in particular can annotate data sources to guide other users.
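To make the idea of tagging data by its content more concrete, here is a deliberately simple sketch. It is not Waterline's algorithm, which is proprietary and machine-learning based; it substitutes a basic heuristic in which each column is "fingerprinted" by the relative frequency of its value shapes (digits mapped to `9`, letters to `A`) and compared with reference fingerprints for known tags using cosine similarity. The function names, the shape encoding, the sample tags and the 0.8 threshold are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt

# Illustrative stand-in for "data fingerprinting"; NOT Waterline's actual algorithm.

def value_shape(value: str) -> str:
    """Reduce a value to its shape: digits become '9', letters become 'A'."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)

def fingerprint(values: list[str]) -> dict[str, float]:
    """Hypothetical fingerprint: relative frequency of each value shape in a column."""
    counts = Counter(value_shape(v) for v in values)
    total = sum(counts.values())
    return {shape: n / total for shape, n in counts.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def suggest_tags(values, reference, threshold=0.8):
    """Suggest every tag whose reference fingerprint is similar enough, best first."""
    fp = fingerprint(values)
    scores = {tag: cosine(fp, ref) for tag, ref in reference.items()}
    return sorted((t for t, s in scores.items() if s >= threshold),
                  key=lambda t: scores[t], reverse=True)

# Hypothetical reference fingerprints built from known-good sample values.
reference = {
    "US Phone Number": fingerprint(["555-123-4567", "555-987-6543"]),
    "Postal Code": fingerprint(["12345", "98765"]),
}

# Only the values are inspected; the column could be called anything, e.g. "col_7".
column = ["212-555-0100", "415-555-0199", "650-555-0123"]
print(suggest_tags(column, reference))  # ['US Phone Number']
```

The column name plays no part in the match, so a poorly named field can still receive a sensible suggestion. A production system would use far richer features than value shapes, which is where machine learning comes in.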

While Waterline uses a direct sales model, it also has a significant partner network. These partners fall into four categories: value-added resellers and systems integrators that resell the product both in the USA and internationally, namely Infosys, Wipro, Leidos and T-Systems; pure consulting partners, which include Deloitte, Capgemini and others; platform partners, such as Cloudera, Hortonworks, MapR and Amazon; and technology partners, such as Trifacta, Privitar, Syncsort and Paxata, amongst others.

Waterline has customers spanning many industries, including automotive, aerospace, banking, government, healthcare, insurance and life sciences. Some of its leading customers include McDonald’s, Hewlett Packard Enterprise, NASA, Airbus, Intel, Starbucks, Creditsafe, Kaiser Permanente, Nordea and Santander. Use cases for the product range from self-service analytics, through data governance and compliance, to data consolidation and rationalisation.

Waterline Data Catalog features data profiling, data discovery and global search based on Apache Solr, all of which leverage the compute power of your data lake (Apache Hadoop, with support for Apache Spark). The product can also connect directly to relational databases, via a plugin architecture, for cataloguing purposes. Lineage can be imported from other sources (for instance, Cloudera Navigator or Apache Atlas) or derived directly from your data (and corrected manually if needed). Note that this native use of Hadoop and Spark makes it possible for Waterline Data Catalog to scale to meet the needs of even the largest data lakes.
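How lineage might be "derived directly from your data" can be illustrated with one well-known heuristic: value containment. If almost every distinct value in column B also appears in column A, B may have been derived from A. This is a minimal sketch under that assumption only; Waterline's own derivation method is not described here, and the column names, sample values and 0.9 threshold are hypothetical.

```python
from itertools import permutations

# Illustrative heuristic only; not a description of Waterline's lineage derivation.

def containment(child, parent) -> float:
    """Fraction of the distinct values in `child` that also occur in `parent`."""
    child_set, parent_set = set(child), set(parent)
    return len(child_set & parent_set) / len(child_set) if child_set else 0.0

def infer_lineage(columns: dict, threshold: float = 0.9):
    """Propose (parent, child, score) edges where the child's values sit inside the parent's."""
    edges = []
    for parent, child in permutations(columns, 2):
        score = containment(columns[child], columns[parent])
        if score >= threshold:
            edges.append((parent, child, score))
    return sorted(edges, key=lambda edge: edge[2], reverse=True)

# Hypothetical column samples, keyed by fully qualified column name.
columns = {
    "raw.customers.id": [1, 2, 3, 4, 5],
    "mart.active_customers.cust_id": [2, 3, 5],
    "raw.orders.order_id": [101, 102, 103],
}
print(infer_lineage(columns))
# [('raw.customers.id', 'mart.active_customers.cust_id', 1.0)]
```

Two identical columns would produce edges in both directions, and value overlap alone cannot say which came first. This is one reason a derived lineage graph may need the manual correction mentioned above.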

The software also includes collaborative capabilities, such as crowdsourced ratings, reviews and annotations, and has sophisticated data matching capabilities that leverage machine learning to automatically suggest business terms, known as ‘tags’, for fields within your data sources. Notably, this is done by examining the data itself, rather than simply the field name. Users with appropriate authority can accept or reject these suggestions, and accepted tags are added to the field as a custom attribute. What’s more, thanks to the machine learning component, these suggestions become more accurate over time as the system ‘learns’ how your organisation interacts with your data. The tags themselves are stored within your system, each associated with a particular domain. Built-in tags live in a default domain, and additional prefabricated domains containing tags common to a particular area, such as ‘GDPR’ or ‘Retail’, are also available. Existing business glossaries or taxonomies can be imported, and additional terms and domains can be created manually if needed. Waterline also features automated tag-based data access control, with integration for Apache Ranger and Apache Sentry.
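The review-then-enforce workflow described above, where machine-suggested tags are accepted or rejected and accepted tags then drive access decisions, can be modelled in a few lines. This is a hypothetical sketch: the `CatalogField` class, the (domain, tag) pair representation, the `POLICY` mapping and the role names are all assumptions made for illustration. In a real deployment, enforcement would be delegated to Ranger or Sentry rather than evaluated by code like this.

```python
from dataclasses import dataclass, field

# Hypothetical model of the review-then-enforce workflow; not Waterline's API.
Tag = tuple[str, str]  # (domain, tag name), e.g. ("GDPR", "Email Address")

@dataclass
class CatalogField:
    name: str
    suggested: set[Tag] = field(default_factory=set)  # machine suggestions awaiting review
    accepted: set[Tag] = field(default_factory=set)   # tags a steward has approved

    def review(self, tag: Tag, accept: bool) -> None:
        """Accept or reject one suggestion; only accepted tags stay on the field."""
        self.suggested.discard(tag)
        if accept:
            self.accepted.add(tag)

# Hypothetical tag-based policy: roles allowed to read fields carrying each tag.
POLICY: dict[Tag, set[str]] = {
    ("GDPR", "Email Address"): {"privacy_officer"},
    ("Retail", "Store Code"): {"analyst", "privacy_officer"},
}

def can_read(role: str, fld: CatalogField) -> bool:
    """Allow access only if every accepted, policy-governed tag permits the role."""
    return all(role in POLICY[tag] for tag in fld.accepted if tag in POLICY)

contact = CatalogField("crm.contacts.c3", suggested={("GDPR", "Email Address")})
contact.review(("GDPR", "Email Address"), accept=True)
print(can_read("analyst", contact), can_read("privacy_officer", contact))  # False True
```

Because the policy is keyed on tags rather than on individual fields, tagging a newly discovered column is enough to bring it under an existing rule. That is the practical appeal of tag-based access control.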

Solutions

In addition to the support, training and other services provided directly by Waterline, the company also partners with several professional services and consulting firms to help its customers implement Waterline products and integrate them into their existing data environments. These firms currently include Infosys, T-Systems, Deloitte, Capgemini and Leidos.

Learn how Bloor Research can support your organization’s journey toward a smarter, more secure future.