Generative AI – case studies and limitations

Generative artificial intelligence (AI) burst into the public consciousness in 2023 following the public release of ChatGPT, an AI with an excellent command of language that can generate articles, essays, product descriptions or poems. This technology, alongside rivals such as Google Bard and many others, shows great promise in many areas, and has been the subject of feverish press attention and a goldrush of venture capital into start-ups carrying even a vaguely AI-related label. Such AIs can interpret medical images as well as physicians, outperform fighter pilots in simulated dogfights, write program subroutines and spot fraudulent patterns in banking transactions. These abilities have generated a frenzy of speculation about their possible effect on the job market and labour productivity, and even about whether they pose a threat to humanity.

So far, less attention has been paid to the limitations of these AIs. We are used to trusting computers to respond reliably and consistently to instructions, tally our bank accounts and search for articles on subjects that interest us. When we ask Excel to calculate something, we assume that the answer is correct. To illustrate why generative AI needs to be treated with caution, try this little experiment. Multiply two random five-digit numbers in Excel or on a pocket calculator and note the answer. Now ask ChatGPT or a similar chatbot to multiply those same numbers together. At the time of writing, you have roughly a 10% chance of getting the right answer from the AI, and the longer the numbers, the worse your chances.
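If you would rather script the check than use Excel, a minimal sketch along these lines will do (Python is used purely for illustration; the prompt wording and the choice of number range are my own, not part of any formal benchmark). It picks two random five-digit numbers, prints a question you can paste into the chatbot, and computes the exact product to compare against the reply:

```python
import random

# Pick two random five-digit numbers (10,000 to 99,999 inclusive).
a = random.randint(10_000, 99_999)
b = random.randint(10_000, 99_999)

# The conventionally computed product is the reference answer.
reference = a * b

# Paste the question into ChatGPT (or a similar chatbot) and compare
# its reply with the reference value printed below.
print(f"Ask the AI: What is {a} multiplied by {b}?")
print(f"Correct answer: {reference:,}")
```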

Generative AI is very good at creating plausible content and has excellent language skills, but these strengths do not extend to areas such as arithmetic, as this little test shows. It is telling that while AIs have been passing various examinations such as the US bar exam, their performance on more mundane accounting exams has so far been relatively poor.

If you ask ChatGPT to generate an essay with detailed references, it will rapidly produce an impressively fluent piece, but can you rely on its sources? A lawyer found out the hard way that you cannot, when he lazily got the AI to write a legal submission to a court on behalf of his client. It looked fine, but when the judge tried to look up the cases cited in the submission, they turned out to be entirely fictitious. This is not a rare glitch: AI “hallucination” seems to be a feature rather than a bug. One set of tests found hallucination rates of 5% to 29% across various popular AIs, and this level of error has been confirmed by other sources.

Similarly, visual AIs such as DALL-E, Midjourney and Stable Diffusion can create really impressive images, but currently struggle with text, amongst other things. So, if you ask a visual AI for a company logo it may look great, but if you ask for the company name on the logo, you are likely to get random letters instead. The vendors are working on this, but it is another example of the nascent state of the technology.

Intellectual property violation is a major concern with generative AI, since current commercial large language models are trained partly on copyrighted material. Generative AI can only generate content based on the data it was trained on, and it may produce outputs that are similar or identical to existing material; the more obscure the subject domain and the less training data that exists, the more likely it is that an AI will generate something very close to, or even identical to, something it has been trained on. Apple restricts the use of AI-powered tools such as GitHub Copilot, seemingly due to concerns about potentially giving away trade secrets. It is not alone: Amazon, Verizon and Northrop Grumman are just a few of the other corporations that have taken similar measures.

Another issue related to the data that AIs are trained on is that of bias. If training data contains stereotypes or discrimination based on gender or race (and let’s face it, there is plenty of such material on the internet), then the generative AI may produce outputs that reflect those biases. This could have negative consequences for the users and recipients of the generated content, as well as for the reputation and credibility of the generative AI system.

Given that generative AI can produce very plausible content, both text and imagery, it seems inevitable that it will be used maliciously to produce “fake news”, for example during political campaigns. “Deep fake” videos have already been produced that are worryingly convincing and could be used to influence news or politics. A fake image of an attack on the Pentagon in May 2023 briefly caused stock market jitters when it was shared widely on social media.

The issues discussed above show that generative AI is a double-edged sword: its capabilities are remarkable in many ways, but those capabilities can and will be abused. Moreover, the limitations of current commercial AIs, such as hallucination, are quite troubling and may not be resolved quickly. People tend to trust computer-generated content, as we have all become used to computers producing things that we rely on, such as our bank statements. The latest AI tools are exciting and have great promise in many areas, but we need to check their output carefully. We also need to be sure we are using them in areas where they are genuinely useful and can help us, rather than in areas where their weaknesses show through.

If you want to explore this topic in depth, please download my recent eBook.