MarkLogic Data Hub
Last updated: March 3, 2021
MarkLogic Data Hub, whose architecture is shown in Figure 1, is a cloud-native platform for data storage, integration, operationalisation and governance, although on-premises implementations are also possible. The product is built on top of MarkLogic Server, a multi-model database capable of handling graph, relational and document data. Cloud deployment is enabled in particular via MarkLogic Data Hub Service, the company’s fully managed, multi-cloud data hub SaaS solution. Moreover, the company supports a gradual transition – what it terms a “pathway to the cloud” – from on-premises deployment, to self-managed cloud deployment, to the fully managed Data Hub Service.
Customer Quotes
“Critical to our 2nd century digitization strategy is not just data integration but business integration – which, of course, they go hand in hand. That is what MarkLogic delivers.”
Boeing
“With MarkLogic, we were able to do in months what we failed to do in years with our previous approach.”
Credit Suisse
Figure 2 provides a more detailed view of how MarkLogic Data Hub works. Some of this warrants explanation, especially the data integration process, which in MarkLogic is model-driven. That is, instead of building one-to-one pipelines you build a source-to-model mapping and a model-to-target mapping. Imagine that you have 100 sources which might map to any of 10 targets. With a traditional point-to-point approach you need 1,000 pipelines; with a model-driven approach you need 110. The immediate benefits of this approach are obvious: far less development and maintenance, as well as logical separation between sources and targets. You can change a mapping, and test it, without interrupting existing processes, and there are check-in and version control features to support updated mappings.
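To make the arithmetic concrete, the minimal sketch below (plain Python, purely illustrative and not MarkLogic code) contrasts the number of artefacts each approach requires.

```python
# Illustrative only: compares the number of integration artefacts needed by
# point-to-point versus model-driven integration.

def point_to_point(sources: int, targets: int) -> int:
    # One bespoke pipeline for every source/target combination.
    return sources * targets

def model_driven(sources: int, targets: int) -> int:
    # One source-to-model mapping per source, plus one model-to-target
    # mapping per consumption model.
    return sources + targets

print(point_to_point(100, 10))  # 1000 pipelines
print(model_driven(100, 10))    # 110 mappings
```

Adding an eleventh target then costs a single new model-to-target mapping rather than a hundred new pipelines.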
However, there are also less obvious benefits. What you are actually creating during this process is an entity model, or series of entity models, which represents an abstraction layer between sources and targets, with semantic models that are persisted within the underlying database. This allows curation of that model, both in terms of data quality and, because the models are semantic, in terms of support for functions such as classification and metadata management (including data lineage). In effect, it offers much the same capabilities as a data catalogue, with the exception that you cannot automatically crawl data sources to create the catalogue. On the other hand, the underlying multi-model (graph) database allows you to construct a knowledge graph through which you can explore all the relationships that exist between different data elements. This is done through the Data Hub Central user interface, which provides a no/low-code environment targeted at domain experts rather than IT folk. In addition to allowing you to explore your data catalogue equivalent via a search interface, it is also extensible, allowing you to define plug-ins for, say, extra data quality functions that are not provided out of the box. Automation and machine learning are implemented in various places across the Data Hub.
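To illustrate what this abstraction layer looks like in practice, the sketch below shows a simplified entity model together with a single source-to-model mapping. It is loosely modelled on the kind of JSON descriptor MarkLogic persists for entity models, but the property names, mapping format and helper function are assumptions made for the purposes of illustration, not MarkLogic’s actual API.

```python
# Illustrative only: a simplified "Customer" entity model and one
# source-to-model mapping; field names are assumptions, not MarkLogic's API.

customer_model = {
    "info": {"title": "Customer", "version": "1.0.0"},
    "definitions": {
        "Customer": {
            "properties": {
                "customerId": {"datatype": "string"},
                "fullName": {"datatype": "string"},
                "email": {"datatype": "string"},
            },
            "required": ["customerId"],
        }
    },
}

# One mapping per source system, regardless of how many consumption models
# sit on the other side of the entity model.
crm_to_customer = {"customerId": "cust_no", "fullName": "name", "email": "email_addr"}

def map_to_entity(source_record: dict, mapping: dict) -> dict:
    """Project a raw source record onto the canonical Customer entity."""
    return {prop: source_record.get(field) for prop, field in mapping.items()}

print(map_to_entity(
    {"cust_no": "C-042", "name": "Ada Lovelace", "email": "ada@example.com"},
    crm_to_customer,
))
```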
From a data integration perspective, MarkLogic refers to consumption models rather than targets. This is because the environment is by no means tied to populating data warehouses and data lakes but also supports direct feeds into business intelligence tools, machine learning platforms and applications.
More generally, MarkLogic Data Hub acts as a hub for data management on top of MarkLogic Server. This means that it inherits the latter’s multi-model approach and exposes it through a unified data integration and management platform. It also offers a number of additional features, including data lineage tracking, fast data pipelines, and further governance capabilities. Most notably, it allows you to enforce policy rules directly on your queries at the code level, filtering the results to comply with the applied policy and thus embedding governance into the queries themselves. As a means of policy enforcement this is highly effective, because it operates at a deep level in your system and therefore cannot be ignored or easily circumvented.
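The sketch below illustrates the principle of embedding a policy check in the query path itself, so that non-compliant results are filtered out before they ever reach the caller. It is a generic, hypothetical example – the policy, predicate and query function are inventions for illustration and are not MarkLogic’s query API.

```python
# Hypothetical sketch of policy-aware querying; not the MarkLogic API.
from typing import Callable

Policy = Callable[[dict], bool]

def eu_residents_only(doc: dict) -> bool:
    # Example policy: only expose records for EU-resident customers.
    return doc.get("region") == "EU"

def query(documents: list, predicate: Callable[[dict], bool], policy: Policy) -> list:
    # The policy is evaluated inside the query alongside the caller's own
    # predicate, rather than being left to the application to enforce later.
    return [doc for doc in documents if predicate(doc) and policy(doc)]

docs = [
    {"customerId": "C-1", "region": "EU", "status": "active"},
    {"customerId": "C-2", "region": "US", "status": "active"},
]
# Only the EU record comes back, whatever the caller asked for.
print(query(docs, lambda d: d["status"] == "active", eu_residents_only))
```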
Finally, with respect to data privacy, GDPR-style permissions can be attached to models during the curation phases, and both redaction and anonymisation (data masking) are supported.
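By way of illustration, the following sketch shows what redaction and pseudonymisation of a customer record might look like in principle. Again, this is a generic example: the masking rules and helper functions are assumptions, not MarkLogic’s redaction API.

```python
# Hypothetical sketch of redaction (masking) and pseudonymisation rules;
# illustrative only, not MarkLogic's redaction API.
import hashlib

def redact_email(value: str) -> str:
    # Mask the local part but keep the domain, so the value still looks like
    # an email address without identifying the individual.
    _, _, domain = value.partition("@")
    return "***@" + domain if domain else "***"

def pseudonymise(value: str) -> str:
    # Replace an identifier with a stable pseudonymous token.
    return hashlib.sha256(value.encode()).hexdigest()[:12]

record = {"customerId": "C-042", "fullName": "Ada Lovelace", "email": "ada@example.com"}
masked = {
    "customerId": pseudonymise(record["customerId"]),
    "fullName": "REDACTED",
    "email": redact_email(record["email"]),
}
print(masked)
```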
As a concept, the idea of putting an abstraction layer between sources and targets is a strong one, and implementing it on top of a multi-model database that supports a graph model provides powerful support for other data management functions. It is also worth commenting on the company’s pricing model, which is consumption-based. Moreover, the Data Hub Service is serverless and separates storage from compute, with different compute functions (say, loading data as opposed to curation processes) also separated. This separation is taken into account for pricing purposes, so that you pay only for the resources you actually consume.
The Bottom Line
We are impressed with MarkLogic Data Hub. MarkLogic is less well known outside of its core markets, but its innovation and the agility of its solution mean that it should be considered a leading vendor.