Data Fabric with Progress MarkLogic

Update solution on March 12, 2024

Mutable Award: Gold 2024

Progress MarkLogic is a data management platform powered by a multi-model database that can natively store a wide variety of datatypes, such as documents (XML and JSON files), geospatial data, relational data, semantic data, images and more. It could be considered a “semantic data platform” or a document store. The product has represented its contents with a knowledge graph since as far back as 2014, well before the idea was popularised further by the data fabric movement. In a modern interpretation of data fabric, governance and query are centralised. In the related approach known as data mesh, both query and governance are decentralised. MarkLogic has a foot in both camps, as it decentralises governance while centralising management and querying of the data.

Customer Quotes

“We’ve derived operational benefits in terms of cost reduction, efficiencies and, in analytics we’ve come up with better reporting mechanisms, which helps in risk management.”
Dr Alice Claire Augustine, Taxonomy Data Management Lead, Amgen

“Two years ago they used to do time-consuming flight-by-flight analysis. Today they can compare thousands of tests quickly – this is part of the multi-flight analysis revolution.”
Laurent Peltier, Test Data Processing Expert, Flight and Integration Test Centre, Airbus

Most data projects have to tackle the task of getting data out of operational systems and into a separate store such as a data warehouse or data lake. This approach has a number of drawbacks. To begin with, it is usually carried out by developers who may not be intimately familiar with the business context of the data. To avoid duplication of common data like “product”, “asset” or “location”, some process of matching and merging records is needed, along with the consequent data transformations. Since most large companies have hundreds of different data sources (some have thousands), this process can be complex and expensive.
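To make the matching and merging step concrete, here is a minimal Python sketch of the general idea. It is not MarkLogic's method: the match key (a normalised name), the "first non-empty value wins" merge rule, and all record fields and values are assumptions chosen purely for illustration.

```python
from collections import defaultdict


def normalise(name: str) -> str:
    """Build a crude match key: lowercase, letters and digits only."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_and_merge(records):
    """Group records that share a match key, then merge each group.

    Merge rule (an assumption for this sketch): for every field,
    keep the first non-empty value seen across the group.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[normalise(rec["name"])].append(rec)

    merged = []
    for recs in groups.values():
        golden = {}
        for rec in recs:
            for field, value in rec.items():
                if value and not golden.get(field):
                    golden[field] = value
        # Record which source systems contributed to the merged record.
        golden["sources"] = [r["source"] for r in recs]
        merged.append(golden)
    return merged


# Hypothetical records from two source systems.
records = [
    {"source": "crm", "name": "Acme Ltd.", "city": "Leeds", "phone": ""},
    {"source": "erp", "name": "ACME LTD", "city": "", "phone": "0113 496 0000"},
    {"source": "crm", "name": "Globex", "city": "Paris", "phone": ""},
]

merged = match_and_merge(records)
```

Real matching is usually fuzzier than an exact key comparison (scoring across several fields, thresholds, manual review of borderline cases), which is part of why the process becomes complex across hundreds of sources.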

Fig 01 – MarkLogic-Powered Data Fabric

MarkLogic starts with a data fabric approach that has a knowledge graph, a store of metadata in a catalogue, and possibly some commonly used data. The data is represented by a model using business terms such as “customer”, “campaign” or “invoice”. MarkLogic manages both conceptual and logical data models, and can store conventional rows and columns of data as well as unstructured data like documents or images. The core platform provides connectivity, security, a data model, metadata, semantic analysis and business rules management. MarkLogic has connectors to sources like Salesforce or SAP, which are accessed either via APIs where available or raw SQL where necessary. There is then a process of curating data, and the product has a data hub that includes matching and merging (based on MarkLogic's own intellectual property rather than an OEM of a third-party matching product, as some other companies use). The product carries out semantic analysis and enrichment, and does reference data management, subject classification and annotation. Business rules management is also supported via an add-on product, Progress Corticon; the rules themselves may be specific to an industry or company. The product has specific support for querying complex data such as bi-temporal data, so it goes beyond basic SQL for structured data. There are elaborate security capabilities down to field level, including the dynamic redaction of fields.

MarkLogic has many elements of a modern data fabric architecture. In theory, the product can access data stored within it or pass queries out to source systems without needing to move data into the MarkLogic store, though in practice the latter turns out to be generally a lot more efficient. Some of its customers hold large amounts of data, up to the petabyte range.

The knowledge graph within MarkLogic has several navigation mechanisms tailored for both data engineers and data consumers. This includes an attractive visual diagramming tool that can be dynamically navigated. There is also a specialist way to visualise bi-temporal data, a datatype often neglected but which is important in certain industries like finance.
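For readers unfamiliar with the term, bi-temporal data tracks two independent time axes: when a fact was true in the real world (valid time) and when the database recorded it (system time). The Python sketch below illustrates the concept only. It does not use any MarkLogic API, and the `Version` class, the `as_of` function and the credit-rating example are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional

FOREVER = date.max  # stands in for an open-ended interval


@dataclass(frozen=True)
class Version:
    value: str
    valid_from: date   # when the fact became true in the real world
    valid_to: date     # exclusive end of real-world validity
    system_from: date  # when this version was recorded
    system_to: date    # exclusive; when this version was superseded


def as_of(versions: Iterable[Version], valid_on: date, known_on: date) -> Optional[str]:
    """Return the value that applied on `valid_on`, as the database knew it on `known_on`."""
    for v in versions:
        if (v.valid_from <= valid_on < v.valid_to
                and v.system_from <= known_on < v.system_to):
            return v.value
    return None


# Hypothetical history of a credit rating. On 10 June 2023 a correction is
# recorded: the rating had in fact dropped to "B" on 1 March 2023.
history = [
    Version("A", date(2023, 1, 1), FOREVER, date(2023, 1, 5), date(2023, 6, 10)),
    Version("A", date(2023, 1, 1), date(2023, 3, 1), date(2023, 6, 10), FOREVER),
    Version("B", date(2023, 3, 1), FOREVER, date(2023, 6, 10), FOREVER),
]

believed_in_may = as_of(history, date(2023, 4, 1), date(2023, 5, 1))   # before the correction
believed_in_july = as_of(history, date(2023, 4, 1), date(2023, 7, 1))  # after the correction
```

The same real-world date yields different answers depending on when the question is asked, which is what allows a regulated business such as a bank to reconstruct what it believed at the time a decision was made.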

The bottom line

MarkLogic has many capabilities of a modern data fabric, or indeed data mesh, architecture. It has a mature knowledge graph with several ways to visualise the data, and is adept at handling complex data types such as bi-temporal, documents and semantic data, as well as regular structured data. Having been deployed by hundreds of customers, it is a proven product that is worth consideration, especially for companies having to deal with complex datatypes and metadata.
