Yellowbrick emphasises scalability and affordability

Content Copyright © 2024 Bloor. All Rights Reserved.
Also posted on: Bloor blogs


Yellowbrick is a data infrastructure vendor that provides a fast, scalable and affordable columnar data warehouse database that is massively parallel and based on the open-source Postgres core. It was founded in 2014 and is headquartered in Mountain View, California. Yellowbrick can be deployed on-premises, in a private cloud, or in public clouds such as Azure and AWS. It integrates with open-source data lake technologies such as Apache Spark and Delta Lake, and supports leading business intelligence and analytics tools such as Tableau and SAS.

Yellowbrick has around fifty customers at the time of writing, including the US Navy, LexisNexis, BMW and Catalina Marketing. Its largest deployed customer has a 10-petabyte Yellowbrick database. The vendor does not have a specific industry focus, but its product is often deployed by customers who have security or privacy concerns about their data.

Yellowbrick emphasises scalability and affordability in its messaging. Several customers have migrated to it from Redshift, for example, which imposes scale limitations that make it cumbersome or expensive to grow beyond a certain size (the fact that both Redshift and Yellowbrick have a Postgres base makes data migration very easy).

In its engineering, Yellowbrick continues to work on simplifying the deployment of large-scale data warehouses in the cloud. Its support for Kubernetes containers means that it can safely execute arbitrary code inside the database, which can allow processor-intensive activities to be executed quickly. The vendor continues to improve the core database optimiser and expects in the future to provide greater support for geospatial data and unstructured data.

Yellowbrick can play a role within a data fabric architecture, though the vendor does not itself have a data catalog or a federated query capability. However, it can work alongside such products, and it has technology partnerships with Collibra and Denodo, for example.

Yellowbrick has a code library and tutorials to enable custom chat-style large language models (LLMs) to run on top of Yellowbrick. For example, its product documentation has been stored in a database and a chat application has been implemented that can run against it, pulling out the most relevant documentation articles and feeding them to ChatGPT as a real-time prompt, an example of so-called retrieval-augmented generation (RAG). This technique allows more accurate responses to be produced from ChatGPT than a regular query to the AI.
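
The retrieval step described above can be illustrated with a minimal Python sketch. This is not Yellowbrick's code library: the DOCS dictionary, the function names and the sample articles are all hypothetical, and a simple word-count cosine similarity stands in for whatever relevance ranking a real deployment would use (typically vector embeddings). The sketch only assembles the prompt; sending it to a language model is left out.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for documentation articles stored in a database table.
DOCS = {
    "loading": "Bulk load data into a table from files in object storage.",
    "backup": "Create a backup of a database and restore it later.",
    "users": "Create users and grant roles to control access to tables.",
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, docs, k=1):
    """Return the names of the k articles most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        docs,
        key=lambda name: cosine(q, Counter(tokenize(docs[name]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, docs, k=1):
    """Place the retrieved articles ahead of the question in one prompt."""
    context = "\n".join(docs[name] for name in retrieve(question, docs, k))
    return (
        "Answer using only this documentation:\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("How do I restore a backup?", DOCS))
```

The point of the design is that the model only sees the handful of articles judged relevant to the question, which is why the answers can be more accurate than those from an unassisted query.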

At present, while there is a lot of excitement about the potential of natural language queries via AI being used for business intelligence, this is an emerging field. In particular, it requires a high level of data quality in the underlying data sources to work effectively. Yellowbrick continues to be an interesting vendor to watch, with its emphasis on an affordable and highly scalable data warehouse that is suitable for very large and challenging analytics applications.