Ever since OpenAI’s ChatGPT burst onto the scene in November 2022, there has been a buzz of excitement about the possible use cases for generative AI. From medical imaging to writing marketing material, from generating illustrations to translating between languages, generative AI has created a popular wave of awareness that is rare for a new technology. ChatGPT gained 100 million users within two months of its release, an unprecedented adoption rate. Some use cases are much clearer than others, as issues with AI hallucinations in particular have caused many expensive, and often amusing, mishaps. One of the principal and most established use cases for generative AI has been coding for software development. Indeed, some people have heralded the end of the line for software developers, with Mark Zuckerberg of Meta predicting that many mid-level software engineers will soon be redundant. The media is full of articles examining the potential demise of programming as a career, or at least its disruption by AI. When reading such articles, always consider the vested interests of the claimant: a company selling AI tools, an investor in such a company, or a consultant implementing the tools all have a clear incentive to make claims that encourage people to spend money with them.
Certainly, AI can be used to deliver prototypes quickly, and the phenomenon of “vibe coding” has emerged, where non-developers use AI to write and deploy applications, though clearly using such code in production carries many risks, not least around security. Programmers have always sought ways to become more productive, ever since programming languages moved from machine code to assembler to high-level languages. More recently, integrated development environments such as Microsoft’s Visual Studio Code, along with a host of tools to speed up testing, debugging and deployment, have continued that trend. AI is another step on that path.
It is worth stepping back for a moment and considering what a software developer actually does all day: they do not just sit down at their computer and write code continuously. Various studies have shown that software developers spend 20–40% of their time actually coding. They also carry out code reviews, testing, debugging, writing documentation, and analysing the problems that they are tackling with their code; this is quite apart from project meetings, administration, training and research. AI can help with some of these things, but not all. Developers still have to understand the problem that their code is addressing, and they still need to test and debug the software. Debugging software written by someone else is much harder than debugging your own code, where you at least understand the intent and thought process, so it is plausible that AI considerably speeds up actual coding at the cost of slower debugging. It is possible to try to help an AI by providing coding standards in a prompt, or by referencing an external file containing such standards; the ability to do this depends on the particular LLM’s context window and its support for document handling.
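The idea of supplying coding standards alongside a prompt can be sketched as follows. This is a minimal illustration, not from any particular tool: the file name `coding_standards.md` and the chat-message structure are assumptions, chosen to match the system/user message format used by most current LLM chat APIs.

```python
# Sketch: prepending a team's coding standards to an LLM prompt.
# The standards file name and message layout are illustrative assumptions.
from pathlib import Path


def build_prompt(task: str, standards_path: str = "coding_standards.md") -> list[dict]:
    """Build a chat-style message list with coding standards as system context.

    If the standards file is missing (or too large for the model's context
    window), it is simply omitted; a real tool would need to truncate or
    summarise it instead.
    """
    path = Path(standards_path)
    standards = path.read_text() if path.exists() else ""
    system = "You are a coding assistant. Follow these coding standards:\n" + standards
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": task},
    ]


messages = build_prompt("Write a function that validates an email address.")
```

The resulting `messages` list would then be passed to whichever chat completion API is in use; the practical limit on how much of the standards document can be included is exactly the context-window constraint mentioned above.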
Remember that at least 60% of an enterprise’s IT budget goes on code maintenance and support; different surveys give different numbers, but most estimates are actually higher than this. This is a critical point: if most of the enterprise software budget goes on maintenance, then an initiative that improves software delivery times at the cost of increased maintenance may actually be a bad idea. Certainly, AI code is likely to carry an increased support burden: a survey reported on DevOps.com found that 67% of programmers spent more time debugging AI-generated code than human-written code, and 69% spent more time fixing security issues in it. That said, AI can itself be used in debugging, since it can read and analyse legacy code and suggest changes.
What is the evidence on software productivity, for the delivery of code? The largest software survey, the annual DORA report, found that AI usage in coding actually had a negative reported effect on productivity, with an increase in code instability and a slight reduction in software delivery rates. Of the respondents to this huge survey, 39% claimed to have little or no trust in the code being generated. This was widely regarded as counterintuitive, especially given the number of media articles predicting the end of programming as a career. In July 2025, another major study emerged that took a scientific approach to measuring the effect of generative AI on programming productivity.
A recent study by METR, an AI research and benchmarking organisation, had some surprising results. A group of experienced software developers were paid to carry out a series of coding tasks, each taking around two hours. For some tasks the developers were allowed to use whatever AI assistance they wanted, usually Cursor Pro with Claude Sonnet; the rest were coded the old-fashioned way, with tasks randomly assigned to each condition. When asked about their perception of their own productivity, the programmers thought that AI would speed them up by around 20%. In reality, the tasks using AI were completed 19% slower than those coded by hand. This is a fascinating result, and one that contradicts the prevailing anecdotal opinion that AI tools speed up coding significantly. It should be emphasised that this is just one study, though it is consistent with the massive DORA survey. The METR study did not consider issues like code stability or security, which were identified in the DORA report and elsewhere; studies from Stanford University and others have shown that AI-generated software is riddled with security flaws.
More and more software developers are adopting AI in their jobs, with an adoption rate of 76% according to one study, which suggests considerable momentum behind the AI coding bandwagon. However, the recent METR study and the large DORA survey show that the actual picture for productivity with AI is, at the very least, nuanced. Developers believe that they are being more productive, when the evidence of the METR study shows that they are actually less productive. Study after study has shown that AI-generated code can be insecure and is less stable than hand-written code. Are developers themselves being carried along on a wave of inflated expectations? No one doubts that generative AI can churn out code very quickly indeed, but is the price paid in terms of debugging effort, stability and security flaws as great as, or greater than, the productivity gain from instant code generation? If the productivity studies suggest that developers using AI are actually delivering usable code more slowly, and given that AI code is costlier to maintain, the implications are significant. One recent survey highlighted a divide between the developers using AI tools and the executives mandating their use: 75% of executives claimed that their AI rollout had been successful, but just 45% of practitioners shared that view. The 2024 Stack Overflow survey also showed a decline in faith in AI tools compared with the 2023 survey.
It must be remembered that this is quite a new field. Widespread use of AI for coding has only been around since 2023, so studies of it and formal academic research into it are only now starting to appear in any quantity. There is no doubt that many more such studies will appear, and no doubt that a huge amount of money is being poured into AI research to improve the quality of AI coding environments. Developers, too, are still getting to grips with the new AI tools, and over time they may get better at using them and become more productive. Consequently, the picture shown by research in mid-2025 may not be the same in years to come. Nonetheless, the picture painted by these current studies suggests that there are many issues still to be addressed if generative AI is to deliver on its promise of radically improving software productivity. More careful analysis is required to objectively understand the true situation, and it would be prudent to temper expectations of just how productive generative AI is in the world of software development at present. As ever, careful measurement and monitoring are advised. At this point, it seems that software developers may not need to retrain in other career fields, at least not just yet.