Coding on AI: The Short- vs Long-Term Impact on Developer Productivity
by Adam Tornhill, October 2024
What does it take for a software system to break down into an unmaintainable morass? AI-coding tools pushing the promise of rapid efficiency boosts got me thinking about short- vs long-term optimizations of developer productivity.
The study that piqued my interest showed that developers using Copilot completed 26.08% more tasks. As in previous Copilot studies, it was the less experienced developers who reaped the greatest benefits. What caught my interest, though, was the study's claim that no negative impact on code quality was observed. This stood out to me, given what we have found in CodeScene's research on AI-refactoring: an LLM frequently makes the code measurably worse.
Reading the actual paper (it's a great read, btw), it becomes clear that the research team never measured code quality. Instead they used 'build success rates' as a proxy for code quality. Yet, a successful build tells us nothing about code structure, maintainability, or clarity -- it only shows that the code, well, builds. Code quality, on the other hand, is about the longevity of the system: can the code be easily understood, maintained, and adapted over time?
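To make that distinction concrete, here's a contrived Python sketch of my own (not from the study): both functions compile, run, and pass the very same test, so any build-success metric rates them as equals. Yet only one of them communicates intent to the next developer who has to change it.

```python
# Two functionally equivalent implementations of a bulk-discount price.
# A build or test gate cannot tell them apart; a maintainer can.

def f(a, b):
    # Opaque version: cryptic names, magic numbers.
    return a * b * (0.9 if b >= 10 else 1.0)


# Readable version: named constants and intention-revealing names.
BULK_THRESHOLD = 10
BULK_DISCOUNT = 0.10

def total_price(unit_price: float, quantity: int) -> float:
    """Apply a bulk discount once the quantity reaches the threshold."""
    subtotal = unit_price * quantity
    if quantity >= BULK_THRESHOLD:
        subtotal *= 1 - BULK_DISCOUNT
    return subtotal


# The "build" passes for both -- same observable behavior:
assert f(5.0, 12) == total_price(5.0, 12)
assert f(5.0, 9) == total_price(5.0, 9)
```

A successful build verifies only the behavior asserted above; everything code quality is about (the names, the extracted constants, the documented intent) is invisible to it.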
Now, combine those insights with another fundamental property of maintainable code: familiarity. Last year, Professor Eric Klopfer conducted an interesting experiment. Klopfer split students into multiple groups: one used ChatGPT, while another had to search for solutions themselves. The ChatGPT group completed their tasks the fastest, indicating that AI indeed boosts productivity. However, there was a twist. Next, Klopfer had the students solve the same problem from memory. This time, the ChatGPT group performed worse. Much worse. To a psychologist, this is a fun story, yet completely unsurprising: to learn something, we need to engage actively with the subject under study. We need to internalize the content. Klopfer's experiment illustrates that task completion doesn't necessarily come with true learning or understanding.
So, what happens if lots of developers quickly complete tasks but never understand what the code actually does? At what point will the AI break down, fail to ship working code, and leave the organization with a massive codebase that no one understands?
The productivity gains of AI are real, but we do ourselves a disservice if we optimize for 'get things done' without understanding the _why_ behind the solutions. Navigating this AI-assisted coding frontier requires balancing our yearning for short-term 'productivity' with a strong focus on code quality and continuous learning.
About Adam Tornhill
Adam Tornhill is a programmer who combines degrees in engineering and psychology. He's the founder of CodeScene, where he designs tools for code analysis. Adam is also the author of Software Design X-Rays: Fix Technical Debt with Behavioral Code Analysis, the best-selling Your Code as a Crime Scene, Lisp for the Web, and Patterns in C, as well as a public speaker and software researcher. Adam's other interests include music, retro computing, and martial arts.