Ataccama and AI

Every technology company is assessing how to react to the avalanche of interest in generative AI that was unleashed when OpenAI released its ChatGPT interface on an unsuspecting world on November 30th, 2022. Of course, some software companies have been using AI in different forms for years. For example, in the world of data quality, it has been common for several years to use machine learning to help with merging and matching records. However, generative AI has captured the public imagination, and every company has been exploring what uses it might make of this latest flavour of artificial intelligence.

Ataccama is a data management company that was founded in 2007 in the Czech Republic and is now headquartered in Toronto. Its software platform spans data quality, master data management and data governance. Since 2016 it has been using machine learning for data quality tasks, in particular data transformations, business term detection, anomaly detection, adaptive data quality thresholds and outlier detection. Recently the company has started to explore the use of generative AI, and in the next version of its product (general release in Q1 2024) it will unveil its use in generating descriptions for data assets, generating data quality rules, translating text to SQL (and SQL to text), suggesting rules automatically and providing a chat feature for product documentation. This last one seems a particularly natural use for generative AI. Instead of wading through a long product manual, you will be able to just ask questions in normal English, and the AI will search the product documentation and retrieve an answer for you.

It was interesting to discuss with the vendor what they see as the limitations of the technology. One is cost and scalability. For example, one Ataccama project for a large telco involved classifying 22,000 databases and a million tables. This was possible with their existing technology but would be impractical for generative AI due to the sheer cost and elapsed time of processing such a large structure.

The vendor has also set up a research and development effort to look at more radical potential uses of generative AI, and shared some experiences of this along with a prototype of what may come in the future. One concern about generative AI is its tendency to “hallucinate” and produce answers that are plausible but made up. Ataccama has found a way to address this. When a user asks a question, say about the metadata associated with a data asset, the AI's answer is not immediately returned to the user. Instead, the response is validated first, initially by syntax-checking the answer that the AI gave. If the answer does not work syntactically, this is detected, and the AI is sent the error message and asked to correct its answer. This process may take multiple feedback loops, all before the user is exposed to an answer. It is intriguing that simply asking the AI to “double check your answer” seems to actually improve the quality of responses. Another feature is the ability to do data transformation to some degree, for example, splitting columns where appropriate and removing columns that are not relevant to the particular query.
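The validate-and-retry loop described above can be illustrated with a minimal Python sketch. It is not Ataccama's implementation: it assumes the AI's answer is a SQL query, uses SQLite's `EXPLAIN` as a stand-in syntax check, and replaces the real model with a hypothetical stub (`make_stub_model`) that replays canned answers. The function names, the `attendees` table and the wording of the feedback message are all invented for illustration.

```python
import sqlite3
from typing import Callable, Optional


def check_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if SQLite can compile the statement, else the error text."""
    try:
        # EXPLAIN compiles the statement without running the query itself.
        conn.execute("EXPLAIN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)


def generate_validated_sql(
    question: str,
    generate: Callable[[str, Optional[str]], str],
    conn: sqlite3.Connection,
    max_rounds: int = 3,
) -> str:
    """Ask the model for SQL, feeding errors back until the query compiles."""
    feedback = None
    for _ in range(max_rounds):
        sql = generate(question, feedback)
        error = check_sql(conn, sql)
        if error is None:
            return sql  # only a validated answer reaches the user
        feedback = f"The query failed with: {error}. Please correct it."
    raise ValueError(f"no valid query after {max_rounds} attempts")


def make_stub_model(answers):
    """Hypothetical stand-in for an LLM call: replays canned answers in order."""
    calls = []

    def generate(question, feedback):
        calls.append(feedback)  # record the feedback the "model" received
        return answers[len(calls) - 1]

    return generate, calls


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendees (name TEXT, country TEXT)")

# The first canned answer has a typo; the second is the corrected query.
generate, calls = make_stub_model([
    "SELEC name FROM attendees",
    "SELECT name FROM attendees",
])
sql = generate_validated_sql("List all attendees", generate, conn)
print(sql)
```

In this sketch the user only ever sees the second, corrected query; the failed first attempt and its error message stay inside the loop.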

In a demonstration, an AI was asked to find all the non-US attendees for a conference. The interface first explained the steps it was going to take: detecting which tables might have conference data, then retrieving the conference attendee data and filtering it to remove attendees from the USA. The first answer it gave was mostly right but still included some users who had identified themselves not as coming from “USA” but with a variant label such as “USA-Cal” or “USA-NJ”. When this was pointed out to the AI via the natural language interface, it corrected itself without having to be explicitly told the nature of the problem. Simply saying “there are still some US attendees” was enough for it to remove the problem records from its answer.
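To make the problem in the demonstration concrete, here is a small Python sketch of why an exact match on “USA” lets some US attendees through, and what a corrected filter might look like. The attendee records and the `is_us` helper are invented for illustration; in the demonstration the AI inferred the fix from the user's feedback rather than from a hand-written rule like this one.

```python
# Hypothetical attendee records; only the country labels matter here.
attendees = [
    {"name": "Ana", "country": "Brazil"},
    {"name": "Bob", "country": "USA"},
    {"name": "Cho", "country": "USA-Cal"},
    {"name": "Dev", "country": "India"},
    {"name": "Eve", "country": "USA-NJ"},
]


def is_us(country: str) -> bool:
    """Treat 'USA' and state-qualified labels such as 'USA-NJ' as US."""
    label = country.strip().upper()
    return label == "USA" or label.startswith("USA-")


# First attempt: an exact match misses 'USA-Cal' and 'USA-NJ'.
naive_non_us = [a["name"] for a in attendees if a["country"] != "USA"]

# Corrected attempt: variant labels are recognised as US as well.
non_us = [a["name"] for a in attendees if not is_us(a["country"])]

print(naive_non_us)
print(non_us)
```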

The aim of the vendor is to ensure that generative AI is used to produce meaningful answers, and to provide guardrails through carefully engineered prompts and feedback loops. In this way, business users can get the productivity benefits of using an AI while avoiding, or at least minimising, the drawbacks that current AIs have through hallucination. It will be intriguing to see how these research and development efforts manifest themselves in the core Ataccama product in the future.