
Figure 1 – Stardog functional architecture
The functional architecture of Stardog is illustrated in Figure 1, though the underlying architecture is not shown there. That underlying architecture matters because, in the recently released version 6.0 alpha, the storage layer has been rewritten to run on RocksDB, and deployment is now based on containers and Kubernetes. RocksDB is a persistent key-value store written in C++. Its significance is that, compared with, say, Hadoop, RocksDB is much faster and has a much smaller footprint.
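To make the idea of a graph database sitting on a key-value store more concrete, the sketch below shows one common, generic technique: writing each triple under several key orderings (SPO, POS, OSP) in an ordered key space so that most query patterns become a single prefix scan. This is an assumption for illustration only and is not Stardog's actual on-disk format; a sorted Python list stands in for RocksDB, and all class names are hypothetical.

```python
from bisect import bisect_left

SEP = b"\x00"  # separator byte; assumes terms never contain NUL


class ToyOrderedKV:
    """Hypothetical stand-in for an ordered key-value store such as RocksDB."""

    def __init__(self):
        self._keys = []  # keys kept in sorted byte order
        self._data = {}

    def put(self, key: bytes, value: bytes = b"") -> None:
        if key not in self._data:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._data[key] = value

    def prefix_scan(self, prefix: bytes):
        """Yield keys starting with prefix, in sorted order (like an iterator seek)."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i]
            i += 1


class ToyTripleStore:
    # Each triple is written under three permutations of (subject, predicate, object).
    ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self):
        self.kv = ToyOrderedKV()

    def add(self, s: str, p: str, o: str) -> None:
        triple = (s, p, o)
        for name, order in self.ORDERS.items():
            parts = [name.encode()] + [triple[i].encode() for i in order]
            self.kv.put(SEP.join(parts) + SEP)

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None means 'any value'."""
        pattern = (s, p, o)
        n_bound = sum(t is not None for t in pattern)
        # Pick the first index whose leading terms cover every bound term.
        for name, order in self.ORDERS.items():
            prefix_terms = []
            for i in order:
                if pattern[i] is None:
                    break
                prefix_terms.append(pattern[i])
            if len(prefix_terms) == n_bound:
                break
        else:
            raise ValueError("no index serves this pattern")
        prefix = SEP.join([name.encode()] + [t.encode() for t in prefix_terms]) + SEP
        for key in self.kv.prefix_scan(prefix):
            terms = [t.decode() for t in key.split(SEP)[1:4]]
            triple = [None, None, None]
            for pos, i in enumerate(order):
                triple[i] = terms[pos]  # undo the permutation
            yield tuple(triple)


store = ToyTripleStore()
store.add("alice", "knows", "bob")
store.add("alice", "worksFor", "acme")
store.add("carol", "knows", "bob")
print(sorted(store.match(p="knows", o="bob")))
# [('alice', 'knows', 'bob'), ('carol', 'knows', 'bob')]
```

The design trade-off is the usual one for this technique: writes cost three puts per triple, in exchange for reads that never need a full scan when at least one term is bound.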
The company has also introduced Stardog Studio, which it describes as a Knowledge Graph IDE. For now, the product replaces the pre-existing web console, but it goes further: it includes contextually aware auto-completion and hints for editing graph queries, and that support is to be extended to mappings, rules, and graph data models in upcoming releases.
As far as Figure 1 is concerned, we have already mentioned the support for SPARQL and Gremlin, and the support for the GraphQL API is also noteworthy. However, the most important elements to explain are the virtual graph capability and the natural language processing pipelines (BITES), along with support for declarative models (which require no coding) that enable you to create your knowledge graph(s).
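As an illustration of what SPARQL support means in practice, the sketch below builds (but does not send) a standard SPARQL 1.1 Protocol query request using only the Python standard library. It assumes a Stardog server on localhost at port 5820, a database named "mydb" exposing its query endpoint at /mydb/query, and admin/admin credentials; all of these are placeholders, not details taken from this report.

```python
import base64
import urllib.parse
import urllib.request


def build_sparql_request(endpoint: str, query: str, user: str, password: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol POST request (form-encoded 'query' parameter)."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",  # ask for JSON result bindings
            "Authorization": f"Basic {token}",  # assumed HTTP Basic authentication
        },
        method="POST",
    )


# Assumed endpoint and credentials; adjust for a real deployment.
req = build_sparql_request(
    "http://localhost:5820/mydb/query",
    "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
    "admin",
    "admin",
)

# To run it against a live server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```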
It is important to appreciate that graphs can represent any sort of data, not just business data. In the context of federating data to create a knowledge graph, this means you can build a graph that represents metadata about the source environments. With structured data sources, you create a graph representing all the data sources you can address, and any particular query is then defined by the sub-graph of nodes (data sources) that it accesses. The details of each source that you might want to access are defined in what Stardog calls virtual graphs. To create these virtual graphs, you declaratively (with no code) map tabular data into the graph model used by a Stardog database, typically using R2RML (the W3C RDB to RDF Mapping Language).
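The point of a declarative mapping is that the mapping itself is data, and one generic engine applies it to any table. The sketch below illustrates that idea in the spirit of R2RML: a subject IRI template, a class, and predicate-to-column pairs. The dictionary format, the "customers" rows and the example.com IRIs are all hypothetical and far simpler than real R2RML, whose mappings are themselves RDF documents. It also materializes triples eagerly, which says nothing about how Stardog actually executes queries over virtual graphs.

```python
import re
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping for a hypothetical "customers" table.
CUSTOMER_MAPPING = {
    "subject_template": "http://example.com/customer/{id}",
    "class": "http://example.com/ns#Customer",
    "predicate_objects": {
        "http://example.com/ns#name": "name",
        "http://example.com/ns#country": "country",
    },
}


def expand_template(template: str, row: dict) -> str:
    # Replace each {column} with the percent-encoded column value, as R2RML does for IRI templates.
    return re.sub(r"\{(\w+)\}", lambda m: quote(str(row[m.group(1)]), safe=""), template)


def map_rows(rows, mapping):
    """Apply a declarative mapping to table rows, producing (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = expand_template(mapping["subject_template"], row)
        triples.append((subject, RDF_TYPE, mapping["class"]))
        for predicate, column in mapping["predicate_objects"].items():
            if row.get(column) is not None:  # like R2RML, emit no triple for NULL values
                triples.append((subject, predicate, row[column]))
    return triples


rows = [
    {"id": 1, "name": "Acme Ltd", "country": "UK"},
    {"id": 2, "name": "Globex", "country": None},
]
for triple in map_rows(rows, CUSTOMER_MAPPING):
    print(triple)
```

Adding a new table means writing a new mapping dictionary, not new code, which is the property the "no code" claim refers to.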
For unstructured data, Stardog uses BITES. This is an extensible but optional document storage system that provides configurable storage and processing for unifying unstructured data (including images, voice and so forth) with Stardog graphs. Note that “unstructured” here excludes semi-structured data held in, for example, XML or JSON documents, both of which can be handled via virtual graphs. Sources are processed to extract structured data from the unstructured base artefact, so that relevant details can be stored as graph data within the database. BITES lets users extract text from the source document and then use Stardog’s full-text search over both the extracted document data and the explicit graph data.
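The extract-then-index flow described above can be sketched as follows. Everything here is hypothetical: the entity "extractor" is a naive regular expression for runs of capitalised words, standing in for a real NLP pipeline; the ex:mentions predicate is invented; and a small inverted index stands in for Stardog's full-text search. What the sketch shows is the shape of the process: document text becomes both graph triples and searchable text, and one search spans document text and graph literals alike.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")
# Naive stand-in for entity extraction: two or more consecutive capitalised words.
ENTITY = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def extract_text(document: bytes) -> str:
    # A real pipeline would parse PDFs and office files, OCR images, transcribe audio, etc.
    return document.decode("utf-8")


def extract_triples(doc_iri: str, text: str):
    # Hypothetical predicate ex:mentions links the document node to each extracted name.
    return [(doc_iri, "ex:mentions", name) for name in ENTITY.findall(text)]


class FullTextIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # token -> IRIs whose text contains it

    def add(self, iri: str, text: str) -> None:
        for token in WORD.findall(text.lower()):
            self.postings[token].add(iri)

    def search(self, query: str) -> set:
        # AND semantics: return IRIs whose text contains every query token.
        sets = [self.postings.get(tok, set()) for tok in WORD.findall(query.lower())]
        return set.intersection(*sets) if sets else set()


graph = []  # explicit graph data plus triples extracted from documents
index = FullTextIndex()

# Explicit graph data: its literal is indexed too.
graph.append(("ex:acme", "rdfs:label", "Acme Holdings"))
index.add("ex:acme", "Acme Holdings")

# An unstructured document: extract text, derive triples, index the text.
doc = b"Board minutes: Jane Smith approved the merger with Acme Holdings."
text = extract_text(doc)
graph.extend(extract_triples("ex:doc1", text))
index.add("ex:doc1", text)

print(sorted(index.search("acme holdings")))  # ['ex:acme', 'ex:doc1']
```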