Natural language and SQL – A puzzle of complexity

There is no greater impediment to the advancement of knowledge than the ambiguity of words.

Thomas Reid, 18th century Scottish philosopher

It has long been a goal of computer scientists to allow humans to interrogate data in a form that does not require an understanding of programming. Approaches were proposed in the 1980s, and early tools based on language parsers were developed in the early 2000s. However, the popularity of generative AI, sparked by the public release of OpenAI’s ChatGPT in November 2022, has stimulated renewed interest in the subject. Modern generative AIs can write not only essays and poems but also program code in various popular languages, including the ability to formulate queries in structured query language (SQL), the de facto language used to access relational databases such as Oracle, MySQL and DB2. Software vendors have started to incorporate natural language interfaces based on ChatGPT and other AIs within their products, enabling customers to query the data within their applications. An end user can pose a question in normal English (or indeed in any of a whole range of other human languages that the AI understands) and get an answer back from a database.

So, problem solved? Not quite. Since writing SQL statements is a technical job, products have existed for years that try to hide the complexity of SQL syntax from end users. Back in the 1990s, products like Business Objects introduced a semantic layer (called a “universe” by this vendor) between the end user and the physical database. Instead of being shown a database schema with columns, rows and keys, a user would see a graphical display of logical groupings of data around useful topics like customer, product or asset. They could then quite easily query this data via a visual interface, dragging and dropping icons on a screen that represented “customer number”, “product code” and so on. To get this far, though, quite a lot of effort had to be made behind the scenes by data administrators, mapping often technical table and column names into more cuddly ones. For example, the opaque “KNA1” in SAP becomes the more intuitive “customer master” when presented to the user, the delightfully unintuitive “VBAP” is shown instead as “sales”, “LFA1” becomes “vendor master”, and so on. Other products took a similar approach, the idea being to hide the technical complexity of the physical database names and structures from the end user, so that the user could make sense of the data when formulating a query.
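To make the idea concrete, a simple database view is one way such a mapping can be implemented. The sketch below is illustrative only, using a few well-known SAP column names (KUNNR for customer number, NAME1 for name, ORT01 for city) rather than any particular vendor’s semantic layer:

    -- Illustrative sketch: expose the cryptic SAP customer master table KNA1
    -- under friendly names that an end user can recognise.
    CREATE VIEW customer_master AS
    SELECT kunnr AS customer_number,
           name1 AS customer_name,
           ort01 AS city
    FROM   kna1;

A business intelligence tool can then present “customer master” and “customer number” to the user while the physical names stay hidden.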

Now that generative AI has come along with a natural language interface, there is still the question of how an end user is to make sense of an underlying database structure. Even if the underlying tables and columns have friendly names like “customer id” and “sales”, it turns out not to be easy to translate natural-sounding user inquiries into unambiguous SQL. For example, “Show me sales for clothing last year” could mean total sales across all clothing categories, or sales for each clothing category; the SQL query could group by category or not. To take another example, “Which customers spent over $500?” could return customers who spent over $500 in total, or in a single transaction; the SQL query would need a filter at the transaction level or the customer level, depending on what was intended. “Which stores had high sales?” is reasonable English but is subjective: a numeric threshold, a rank, or a comparison to the average would need to be specified in the SQL query.
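To see how the customer question plays out in SQL, assume a simple, illustrative transactions table with customer_id and amount columns (the names are invented for this sketch). The two readings produce quite different queries:

    -- Reading 1: customers whose total spend exceeds $500
    SELECT customer_id
    FROM   transactions
    GROUP  BY customer_id
    HAVING SUM(amount) > 500;

    -- Reading 2: customers with at least one single transaction over $500
    SELECT DISTINCT customer_id
    FROM   transactions
    WHERE  amount > 500;

Both are perfectly valid SQL, and an AI will happily produce either; the question as posed in English simply does not say which one was meant.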

SQL requires a level of precision that (mostly) ensures the query will produce an answer corresponding to the intent, but, as mentioned, end users typically do not understand the implications of joining tables together, what foreign keys do, or the difference between an inner and an outer join in SQL. That is why business intelligence products have gone to the trouble of ensuring that end users see a friendly, visual semantic layer rather than a physical database structure. Building this layer requires a lot of work, and indeed it is an ongoing task for data administrators, who must adapt the semantic layer as new sources of data are needed or structural changes occur, such as a new product hierarchy or a new way of grouping sales regions after a reorganisation.
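The join distinction alone can silently change an answer. In the illustrative sketch below (table names assumed), counting orders per customer with an inner join drops customers who have never ordered, while a left outer join keeps them with a count of zero:

    -- Inner join: customers with no orders vanish from the result
    SELECT c.customer_id, COUNT(o.order_id) AS order_count
    FROM   customers c
    JOIN   orders o ON o.customer_id = c.customer_id
    GROUP  BY c.customer_id;

    -- Left outer join: customers with no orders appear with a count of zero
    SELECT c.customer_id, COUNT(o.order_id) AS order_count
    FROM   customers c
    LEFT JOIN orders o ON o.customer_id = c.customer_id
    GROUP  BY c.customer_id;

Either query answers a reasonable question; the point is that a user asking “how many orders did each customer place?” is unlikely to know that the choice of join decides whether inactive customers appear at all.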

An AI will do its best to translate a user query in natural language into a SQL statement, and indeed AIs are notoriously eager to please: they will almost always give you an answer. For simple queries this will mostly be fine, but then a really simple query is easy to write in SQL too. The trouble is that, in more complex cases, the answer may not be the one you were expecting, given the potential ambiguity of the natural language inquiry. A secondary, though arguably less important, issue is efficiency. Throwing lots of queries at a very large database can result in a slow response, a high cost (in the case of cloud databases that charge by usage), or both, and good programmers spend a lot of time learning how to formulate reasonably efficient SQL queries, even if database optimisers these days are much better at generating efficient access plans than they were some years ago.
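As a sketch of the efficiency point (again with an assumed schema), a correlated subquery that re-scans a large table for every customer row can often be rewritten as a single grouped join:

    -- Potentially expensive: the subquery may run once per customer
    SELECT c.customer_id,
           (SELECT SUM(t.amount)
            FROM   transactions t
            WHERE  t.customer_id = c.customer_id) AS total_spend
    FROM   customers c;

    -- Usually cheaper: one pass over transactions, then a single join
    SELECT c.customer_id, SUM(t.amount) AS total_spend
    FROM   customers c
    LEFT JOIN transactions t ON t.customer_id = c.customer_id
    GROUP  BY c.customer_id;

A good optimiser may rewrite the first form into the second automatically, but that is precisely the kind of judgement an end user typing a question in English cannot be expected to make.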

There are other issues that are peculiar to AIs. Generative AIs have a tendency to make up plausible answers that may not actually be correct, a phenomenon known as hallucination. In the case of a document, they may invent academic references that do not exist, something that can have serious consequences, as one lawyer found out when he used ChatGPT to write a legal brief for his clients. In the case of database queries, software developers have reported AIs making up table names. An AI is not a programmer or a database designer: it has no underlying understanding of SQL. A large language model AI like ChatGPT generates text by statistically predicting the next thing to write in response to a user prompt, drawing on its training data. Because some words are statistically likely to follow others in that training data, AIs often generate misinformation that is presented in a plausible way. In the case of coding, the AIs have been trained on huge numbers of programs, and they complete code in much the same way as they write an essay. This approach is frequently successful and will generate a fluent sentence or a computer program that executes correctly, but it may from time to time include material that the AI thinks looks reasonable but does not actually exist. The frequency of hallucination is the subject of some interest and debate, but academic studies suggest that hallucinations are far from rare – a rate of 29% in one study of ChatGPT-4.
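A hypothetical example of what a hallucinated query can look like (both the schema and the invented table are illustrative): the database contains customers and orders tables, but the AI confidently joins to a discounts table that was never there.

    -- There is no order_discounts table in this schema, but the SQL looks
    -- entirely plausible and will simply fail when executed.
    SELECT c.customer_name, SUM(d.discount_amount) AS total_discount
    FROM   customers c
    JOIN   order_discounts d ON d.customer_id = c.customer_id
    GROUP  BY c.customer_name;

The query is syntactically correct and stylistically convincing, which is exactly what makes this failure mode so easy to miss.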

A further issue is that generative AIs do not really explain how they arrived at an answer in the way that a conventional computer program can be traced and debugged. If I write a program and get an unexpected answer, I can interrogate the code and try to debug it. Some AIs will at least show you the SQL they used to execute a database query, and may even help you fix faulty SQL, but they do not have the understanding of the database structure that an experienced programmer or database administrator has.

The concern is that people are used to asking questions of computers and getting answers back that are correct, or at the very least consistent. We do not usually ask Excel to multiply two numbers together and then double-check the answer with a calculator, yet at the time of writing many leading AIs struggle to multiply two five-digit numbers. If we ask an AI to generate a database query about sales figures, how confident can we be that the question we expressed was unambiguous, and that the AI did not hallucinate in its answer? What if that query came from a bank employee checking whether a transaction was fraudulent, or from a government official using an AI to research whether someone was carrying out a bank transfer from a sanctioned bank, or looking for a pattern of financial activity that would flag someone as a potential terrorist?

Generative AI has great potential in a wide range of applications, from interpreting medical images to writing appealing press releases and generating program code. However, at the moment there are limitations and issues, as noted above, that mean considerable caution needs to be exercised when using these tools in situations where precision and accuracy are crucial. Over-reliance on a technology that has inherent limitations, especially if human supervision and checking is cursory or non-existent, will have serious consequences. In such cases, it is to be hoped that the lawyers involved in dealing with those consequences use AI more wisely than the lawyer mentioned earlier.