The use of artificial intelligence (AI) in medicine and life sciences has a long history. In 1971, the University of Pittsburgh developed a tool called INTERNIST-1, which used a database of disease profiles and algorithms to support clinical decision-making, including the diagnosis of multiple, interrelated conditions. It performed well and was later commercialised as a product called QMR. Further AI systems, such as DXPlain in 1986, allowed clinicians to input symptoms and receive potential diagnoses, with a 73% success rate in one trial, though it had limited take-up. As AI techniques matured and cheaper, faster processing power became available, more systems sprang to life, from expert systems to deep learning. By 2017, it was estimated that 40% of US hospitals were using such clinical decision support systems, with 62% in Canada, figures that have doubtless since risen. They are commonly used to flag drug-drug interactions, a major problem in medicine.
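To give a flavour of what a simple rules-based clinical decision support check involves, the sketch below looks up prescribed drug pairs in a small interaction table. The table, drug names and severity notes are invented for illustration only and are not clinical guidance.

```python
# Minimal sketch of a rules-based drug-drug interaction check, in the
# spirit of early clinical decision support systems. The interaction
# table is illustrative only, NOT real clinical guidance.

INTERACTION_RULES = {
    frozenset({"warfarin", "aspirin"}): "major: increased bleeding risk",
    frozenset({"simvastatin", "clarithromycin"}): "major: raised statin levels",
    frozenset({"lisinopril", "spironolactone"}): "moderate: risk of high potassium",
}

def check_interactions(prescription: list[str]) -> list[str]:
    """Return a warning for every known interacting pair in a prescription."""
    drugs = [d.lower() for d in prescription]
    warnings = []
    for i, first in enumerate(drugs):
        for second in drugs[i + 1:]:
            rule = INTERACTION_RULES.get(frozenset({first, second}))
            if rule:
                warnings.append(f"{first} + {second} -> {rule}")
    return warnings

if __name__ == "__main__":
    for warning in check_interactions(["Warfarin", "Aspirin", "Metformin"]):
        print(warning)
```

Real systems draw on curated databases of thousands of interactions, but the underlying pattern of explicit, auditable rules is much the same.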

More recently, there have been many further attempts to bring AI into healthcare, some more successful than others. IBM’s Watson for Oncology project was a $4 billion failure, abandoned in 2021. A deep learning system from Verily, Google’s life sciences arm, aimed at detecting diabetic eye disease from retinal scans, failed in a trial of 7,600 people in Thailand. A sepsis diagnosis model from Epic Systems Corporation, deployed at hundreds of US hospitals, was shown in 2024 to have been an epic failure: in a study of over 27,000 patients it correctly identified only a third of sepsis cases, while raising many false alarms. According to the study, published in the New England Journal of Medicine, the model was no more accurate than tossing a coin.

To be fair, alongside these high-profile failures there have been some considerable successes. AI has proved particularly good at analysing medical imaging. In one case, a system diagnosed breast cancer with 97% accuracy, better than trained human staff. A number of trials have shown AI outperforming radiologists, though the picture here is nuanced. AI systems can process large volumes of images quickly, a real advantage in overstretched radiology departments. Machine learning algorithms are consistent (unlike their cousin, generative AI, which is probabilistic in nature) and do not get tired. Radiologists seem to do better in complex cases, and a blend of human experience with AI input appears to be the best approach. There remain a number of barriers to AI deployment in hospital settings, from limited high-quality training data to inconsistent IT systems, and a study in 2020 found that only a third of radiologists in the USA were using AI.
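The consistency point can be made concrete with a toy sketch. Below, a classifier with fixed (pretend "trained") weights returns the same score every time it sees the same input, while sampling from a generative model's output distribution can return different results for the same prompt. All weights, features and logits are invented purely for illustration.

```python
import numpy as np

# Toy contrast: a trained discriminative model is deterministic at
# inference time, whereas sampling from a generative model's output
# distribution is probabilistic. All numbers here are invented.

WEIGHTS = np.array([0.8, -0.5, 1.2])   # pretend these came from training
BIAS = -0.3

def classify(features: np.ndarray) -> float:
    """Deterministic: identical input always yields an identical score."""
    return 1.0 / (1.0 + np.exp(-(features @ WEIGHTS + BIAS)))

def sample_token(logits: np.ndarray, rng: np.random.Generator) -> int:
    """Probabilistic: the same logits can yield different sampled tokens."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

features = np.array([0.4, 1.1, 0.7])
print([classify(features) for _ in range(3)])        # three identical scores

rng = np.random.default_rng()
logits = np.array([2.0, 1.5, 0.2])
print([sample_token(logits, rng) for _ in range(3)]) # may differ run to run
```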

Generative AI has a number of potential medical applications. A commercial system called Functional Mind, trained on medical texts and peer-reviewed documents, allows clinicians to conduct literature reviews against the latest medical research. Other possible use cases include producing tailored treatment plans and supporting doctors in general administrative tasks such as scheduling appointments. There are, however, a number of issues in deploying generative AI models compared with traditional machine learning models. Generative AI (or the large language models that underlie it) has a well-known predilection to “hallucinate”, producing fabricated or nonsensical, though usually plausible, answers. In some situations, this does not matter too much. If, for example, you use a generative AI tool to produce a new logo for a start-up company and it spits out something you don’t like, or something with nonsense text instead of the company name, you simply rerun it until you get a result you do like. In medicine, a system that hallucinates is about as desirable as a doctor who hallucinates. Hallucination has already proved highly problematic when lawyers have used generative AI, and it is easy to see that physicians will need to be very careful indeed in employing a creative but inherently unreliable technology in a patient setting.
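One simple safeguard, sketched below, is to verify every reference a model cites against a trusted index of known publications before it reaches a clinician, and to flag anything that cannot be matched. The index, identifiers and function names here are hypothetical and purely illustrative.

```python
# Hypothetical sketch: check LLM-cited references against a trusted index
# of known publications. The identifiers below are invented examples.

TRUSTED_INDEX = {
    "PMID:11111111": "Example trial of drug A in condition B (2021)",
    "PMID:22222222": "Example systematic review of imaging in condition C (2023)",
}

def filter_citations(cited_ids: list[str]) -> tuple[list[str], list[str]]:
    """Split model-cited IDs into verified references and unverifiable ones."""
    verified = [cid for cid in cited_ids if cid in TRUSTED_INDEX]
    suspect = [cid for cid in cited_ids if cid not in TRUSTED_INDEX]
    return verified, suspect

verified, suspect = filter_citations(["PMID:11111111", "PMID:99999999"])
print("verified:", [TRUSTED_INDEX[c] for c in verified])
print("needs human review (possible hallucination):", suspect)
```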

In the related field of drug discovery and development, AI has had a number of successes. DeepMind has developed a tool called AlphaFold, which uses neural networks to predict how proteins fold, a problem that had proved extremely hard to solve with conventional approaches, and the technology has been shown to be highly effective. In the world of clinical drug trials, datasets are often sparse, and generative AI can be used to generate synthetic data to fill the gaps, enabling better validation of predictive models. Several new drugs discovered with the aid of AI have already entered clinical trials. AI has also been used in other areas of the pharmaceutical industry, such as speeding up manufacturing and even regulatory compliance reporting, though in the latter case with human oversight and tracking.
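As a rough, hypothetical sketch of the synthetic-data idea, the code below fits a simple multivariate Gaussian to a small "observed" dataset and samples extra synthetic rows from it to augment the original. Real generative models for trial data are far more sophisticated, and every value here is invented.

```python
import numpy as np

# Rough sketch of synthetic data generation for a sparse dataset: fit a
# simple multivariate Gaussian to the observed rows, then sample new
# synthetic rows from it. All values are invented for illustration.

rng = np.random.default_rng(42)

# A small "observed" dataset: columns might be age, biomarker level, dose.
observed = np.array([
    [54.0, 1.2, 10.0],
    [61.0, 0.9, 20.0],
    [47.0, 1.5, 10.0],
    [58.0, 1.1, 20.0],
])

mean = observed.mean(axis=0)
cov = np.cov(observed, rowvar=False)

# Draw synthetic rows from the fitted distribution to augment the dataset.
synthetic = rng.multivariate_normal(mean, cov, size=20)
augmented = np.vstack([observed, synthetic])
print(augmented.shape)   # (24, 3)
```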

In summary, AI has had a long history in medicine and related fields. Different types of AI, from rules-based models through machine learning algorithms to various types of neural networks, have different applications. Although experiences have been mixed, with some high-profile failures, there are also enough well-documented successes to suggest that, with careful selection of the right types of AI applied to suitable use cases, AI can work and bring benefits. One key to success is matching the right type of AI to the appropriate type of problem, rather than getting caught up in the current AI hype. A further key is not just rigour in selecting the right use cases, but also in monitoring and documenting them as they are tested and deployed. This is something that doctors and scientists have had plenty of practice at over the years.