I’ve watched a lot of people using “AI” coding assistants, and noted how often they wave through large batches of code changes the model proposes, sometimes actually saying it out loud: “Looks good to me”.
After nearly 3 years of experimenting with LLMs to generate and modify code, I know beyond any shadow of a doubt that you need to thoroughly check and understand every line of code they produce. Or there may be trouble ahead. (But while there’s music… etc)
But should I be surprised that so many developers are happily waving through code in such a lackadaisical way? Is this anything new, really?
I’ve watched developers check in code they hadn’t even run. Heck, code that didn’t even compile.
I’ve watched developers copy and paste armfuls of code from sites like Stack Overflow, and not even pause to read it, let alone try to understand it or even – gasp – try to improve it.
I’ve watched developers comment out or delete tests because they were failing. I’ve watched teams take testing out of their build pipeline to get broken software into production.
We’ve been living in an age of “LGTM” for a very long time.
What’s different now is the sheer amount of code being waved through into releases, and just how easy “AI” coding assistants make it for the driver to fall asleep at the wheel.
And when we put our coding assistant into “agent” mode – or, as I call it, “firehose mode” – that’s when things can very quickly run away from us. Dozens or hundreds of changes, perhaps even happening simultaneously as parallel agents busy themselves with multiple tasks.
Even if there were no issues in any of those changes – and the odds of that are extremely remote – when code’s being churned out faster than we can understand it, the result is a rapidly growing mountain of comprehension debt.
When the time comes – or should I say, when the times come – that the coding assistant gets stuck in a “doom loop” and we have to fix problems ourselves, that debt has to be repaid with interest.
Agents have no “intelligence”. They’re old-fashioned computer programs that call LLMs when they need sophisticated pattern recognition and token prediction. LLMs don’t follow instructions or rules. Use them for just a few minutes and you’ll see them crashing through their own guardrails, doing things we’ve explicitly told them not to do, and forgetting to do things we insist that they should.
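To make that concrete, here’s roughly what an agent’s control loop amounts to, as a schematic sketch: an ordinary loop that sends text to a model and executes whatever comes back. The call_llm and run_tool helpers below are placeholders I’ve made up, not any particular vendor’s API.

```python
# Schematic sketch of an agent loop, using made-up placeholder helpers rather
# than any real vendor API. Note there's no reasoning in here: just a loop,
# some bookkeeping, and calls out to a model for token prediction.
def run_agent(task: str, call_llm, run_tool, max_steps: int = 50) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(history)            # pattern recognition happens here
        history.append({"role": "assistant", "content": reply.text})
        if reply.tool_call is None:          # the model says it's finished
            return reply.text
        result = run_tool(reply.tool_call)   # edit a file, run a command, etc.
        history.append({"role": "tool", "content": result})
    return "Step limit reached"              # where the "doom loop" gets cut off
```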
The intelligence in this set-up is us. We’re the ones who can follow rules and instructions. We’re the ones who understand. We’re the ones who reason and plan. And we’re the ones who learn.
In 2025, and probably for many years to come, we are the agents. We’re the only ones qualified for the job.
My advice – based on the best available evidence and a lot of experience using these tools over the past 3 years – remains the same when you’re working on code that matters.
I recommend working one failing test at a time, one refactoring at a time, one bug at a time, and so on.
I recommend thoroughly testing after every step, and carefully reviewing the small amount of code that’s changed.
I recommend committing changes when the tests go green, and being ready to revert when they go red.
I recommend a fresh context, specific to the next step. I recommend relying on deterministic sources of truth – the code as it is (not the model’s summary of it), the actual test results, linter reports, mutation testing scores, etc.
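As an illustration, here’s a minimal sketch of that loop in Python. It treats the exit codes of the test runner and the linter as the source of truth, commits on green, and reverts on red. The tool names (pytest, ruff) and the checkpoint helper are assumptions for the sake of the example; substitute whatever your project actually uses.

```python
# Minimal sketch of "commit on green, revert on red", driven by deterministic
# outputs (exit codes) rather than the model's account of what it changed.
# Assumes a git repo with pytest and ruff on the PATH; adapt to your own stack.
import subprocess

def run(cmd: list[str]) -> bool:
    """Run a command and report success purely from its exit code."""
    return subprocess.run(cmd).returncode == 0

def checkpoint(message: str) -> None:
    tests_green = run(["pytest", "-q"])       # the actual test results...
    lint_clean = run(["ruff", "check", "."])  # ...and the actual linter report
    if tests_green and lint_clean:
        subprocess.run(["git", "add", "-A"], check=True)
        subprocess.run(["git", "commit", "-m", message], check=True)
    else:
        # Red: discard the step (tracked files only here) and try again
        # with a fresh, specific prompt.
        subprocess.run(["git", "checkout", "--", "."], check=True)

if __name__ == "__main__":
    checkpoint("One small step: make the failing test pass")
```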
I strongly advise against letting LLMs mark their own homework, and against relying on their version of reality.
And forget “firehose mode” for code that matters. Keep it on a very tight leash.