Monte Carlo – Don’t Gamble With Data Quality
Updated: April 12, 2024
Monte Carlo positions itself as a data observability platform, monitoring enterprise data quality. The company uses the term “data downtime”, an analogy to website downtime, to explain why customers need to carefully monitor the quality of data in their enterprise, often in real time. In one survey of 200 data professionals, companies reported an average of 67 data quality incidents per year, with well over half of respondents estimating that 25% or more of their company’s revenue was affected by poor data quality.
Monte Carlo’s software makes extensive use of machine learning to automate the production and monitoring of business data quality rules, which in the case of some older tools is a manual task. The software has connectors to popular data sources such as SAP, Oracle, MySQL, Snowflake and Databricks, amongst others, and reads their metadata and database logs to build a picture of the data flows in an enterprise. Additional connectors exist to data transformation and orchestration tools such as Airflow, dbt and Prefect, though not as yet to some of the larger data integration vendors like Informatica or Talend. In this way, the Monte Carlo software is able to, at least to a degree, generate a view of the lineage of data from source systems through to, say, a data warehouse or data lake.
Once installed, the software does extensive profiling of the data and generates data quality rules based on the characteristics and volumes of the data that it is tasked to check, including thresholds of values. It then builds extensive exception reporting and alerts to highlight when the data quality thresholds are breached, which may be due to a faulty data feed, invalid data, or an event such as a systems upgrade that has caused a data load to go missing.
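The idea of learning thresholds from observed data and alerting on breaches can be illustrated with a minimal sketch. This is not Monte Carlo’s implementation or API; it is a hypothetical example assuming a simple z-score rule learned from historical daily row counts.

```python
from statistics import mean, stdev

def learn_threshold(history, z=3.0):
    """Learn an expected range from historical daily row counts (hypothetical rule)."""
    m, s = mean(history), stdev(history)
    return (m - z * s, m + z * s)

def check_load(row_count, bounds):
    """Return an alert message if the new load falls outside the learned range."""
    low, high = bounds
    if row_count < low:
        return f"ALERT: {row_count} rows is below expected minimum {low:.0f}"
    if row_count > high:
        return f"ALERT: {row_count} rows is above expected maximum {high:.0f}"
    return None

# Normal daily loads hover around 10,000 rows; a missed feed delivers 0 rows.
history = [9800, 10100, 9950, 10200, 9900, 10050, 10000]
bounds = learn_threshold(history)
print(check_load(10020, bounds))  # within the learned range -> no alert
print(check_load(0, bounds))      # missing load -> alert raised
```

A real observability tool would of course profile many more signals (freshness, schema changes, null rates, distributions), but the learn-then-alert pattern is the same.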
Customer Quotes
“Monte Carlo alerts are high quality. We don’t get many false alarms, which really helps build a culture of urgency to event management and response.”
Adam Woods, Chief Technology Officer, Choozle
“Giving the power back to the domain owners and experts is one of the most important steps in achieving improved data observability.”
Martynas Matimaitis, Senior Data Engineer, Checkout.com
Monte Carlo is firmly in the data observability and monitoring market; it is a “read-only” tool in the sense that it consciously does not do deduplication or merge/matching, which are usually core functions of traditional data quality tools. Nor does it offer comprehensive customer name and address validation in the manner of tools like Loqate.
One particular use case for Monte Carlo is validating data used in training large language models. Many corporations have decided to spurn general-purpose generative AI services like ChatGPT and instead fine-tune a pre-built language model (LLaMA from Meta is one example, others being PaLM, Alpaca and Koala) with their own data. Retrieval-augmented generation (RAG) is a related technique that optimises the output of an LLM by supplying it with targeted information at query time rather than modifying the underlying model. For example, an engineering company might draw on its engineering documents, while a B2C company might ground a customer support chatbot in previous support conversations and customer account information. The data used in such situations is clearly subject to the same quality issues as any other corporate data, so Monte Carlo can be used to monitor it. The company has a partnership with Pinecone, one of the leading vector databases, a specialist type of database often used in RAG applications.
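The retrieval step of RAG can be sketched as follows. This is an illustration only, with hypothetical documents and a toy bag-of-words similarity standing in for a trained embedding model and a vector database such as Pinecone.

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy bag-of-words 'embedding'; real systems use a trained embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical support documents standing in for a vector database index.
documents = [
    "Refunds are processed within five business days of the request.",
    "The valve assembly tolerates pressures up to 250 bar.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(question, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# The retrieved context is prepended to the prompt; the model itself is unchanged.
question = "How long do refunds take?"
context = retrieve(question)[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
```

Because the answer quality depends entirely on the retrieved documents, monitoring the freshness and integrity of that corpus is exactly the kind of task a data observability tool addresses.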
The data observability label has been picked up by other vendors and has spawned a series of start-up competitors, while some larger incumbents have started to label their own technologies with a data observability tag. Hence Monte Carlo competes with other specialists like Soda, Bigeye and Anomalo in the data observability sector, and with broader providers of data quality software such as Ataccama and Informatica.
The bottom line
Monte Carlo has made rapid progress in defining the data observability segment of the broader data quality market, and has rapidly acquired an impressive customer list as well as prestigious investors. It should be seriously considered by customers who do not wish to gamble with their data quality.