It’s well documented that software development efforts that incorporate generative AI include mistakes that are radically different than what any human programmer would ever make. And yet, most enterprise plans for remediating AI coding mistakes rely on simply inserting experienced human programmers in the loop. Cue train wreck. (No need to click if you are not an Adams Family fan.)

Experienced human programmers know intuitively the kinds of mistakes and shortcuts human programmers make. But they need to be trained to look for the kinds of mistakes that arise when software creates software.

This conversation was accelerated by comments by AWS CEO Matt Garman that he expects most developers will no longer be coding as early as 2026

Many vendors in the dev tools arena have argued that this can be solved by using AI apps to manage AI coding apps. Cue train wreck No. 2. Even financial giant Morgan Stanley is toying with using AI to manage AI

As a practical matter, the only safe and remotely viable approach is to train programming managers to understand the nature of generative AI coding errors. In fact, given that the nature of AI coding errors is so vastly different, it might be better to train new people to manage AI coding efforts — people who are not already steeped in finding human coding mistakes.

Part of the problem is human nature. People tend to magnify and misinterpret differences. If managers see an entity — be it human or AI — making mistakes those managers themselves would never do, they tend to assume the entity is inferior to the manager on coding issues. 

But consider that assumption in light of autonomous vehicles. Statistically, those vehicles are light years safer than human-operated cars. The automated systems are never tired, never drunk, never deliberately reckless. 

But automated vehicles are not perfect. And the kinds of mistakes they make — such as smashing full-speed into a truck stopped for traffic — prompt humans to argue, “I never would have done something so stupid. I don’t trust them.” (The Waymo parked car disaster is a must-see video.)

But just because automated vehicles make weird mistakes doesn’t mean they’re less safe than human drivers. But human nature can’t reconcile those differences.

It’s the same situation with managing coding. Generative AI coding models can be quite efficient, but when they go off the reservation, they go way off.

Insane alien programmers

Dev Nag, CEO of SaaS firm QueryPal, has been working with generative AI coding efforts and feels many enterprise IT executives are not prepared for how different the new technology is.

“It made tons of weird mistakes, like an alien from another planet,” Nag said. “The code misbehaves in a way that human developers don’t do. It’s like an alien intelligence that does not think like we do, and it goes in weird directions. AI will find a pathological way to game the system.”

Just ask Tom Taulli, who’s authored multiple AI programming books, including this year’s AI-Assisted Programming: Better Planning, Coding, Testing, and Deployment.

“For example, you can ask these LLMs [large language models] to create code and they sometimes make up a framework, or an imaginary library or module, to do what you want it to do,” Taulli said. (He explained that the LLMs were not actually creating a new framework as much as pretending to do so.)

That’s not something a human programmer would even consider doing, Taulli noted, “unless (the human coder) is insane, they are not going to make up, create out of thin air, an imaginary library or module.”

When that happens, it can be easy to detect — if someone looks for it. “If I try to pip install it, you can find that there’s nothing there. If it hallucinates, the IDE and compiler give you an error,” Taulli said.

The idea of turning over full coding of an application — including creative control of the executable — to a system that periodically hallucinates seems to me a dreadful approach.

A much better way to leverage the efficiency of generative AI coding is by using it as a tool to help programmers get more done. Taking humans out of the loop, as AWS’s Garman suggested might happen, would be suicidal.

What if a generative AI coding tool lets its mind wander and creates some back doors so it can later do fixes without having to bother a human — back doors that attackers could also use? 

Enterprises tend to be quite effective at testing apps — especially homegrown apps — for functionality, to make sure the app does what it is supposed to. Where app testing tends to fall apart is when checking on whether it can do anything that it should not do. That would be a penetration testing mentality. 

But in a generative AI coding reality, that pen testing approach has to become the default. It also needs to be managed by supervisors well schooled in the wacky world of generative AI mistakes.

Enterprise IT is certainly looking at a more efficient coding future, with programmers assuming more strategic roles where they focus more on what the apps should do and why and devote less time to laboriously coding every line. 

But that efficiency and those strategic gains will come at a hefty price: paying for better and differently-trained humans to make sure AI-generated code stays on track.