I’ve watched a lot of people using “AI” coding assistants, and noted how often they wave through large batches of code changes the model proposes, sometimes actually saying it out loud: “Looks good to me”.
After nearly 3 years of experimenting with LLMs to generate and modify code, I know beyond any shadow of a doubt that you need to thoroughly check and understand every line of code they produce. Or there may be trouble ahead. (But while there’s music… etc)
But should I be surprised that so many developers are happily waving through code in such a lackadaisical way? Is this anything new, really?
I’ve watched developers check in code they hadn’t even run. Heck, code that didn’t even compile.
I’ve watched developers copy and paste armfuls of code from sites like Stack Overflow, and not even pause to read it, let alone try to understand it or even – gasp – try to improve it.
I’ve watched developers comment out or delete tests because they were failing. I’ve watched teams take testing out of their build pipeline to get broken software into production.
We’ve been living in an age of “LGTM” for a very long time.
What’s different now is the sheer amount of code being waved through into releases, and just how easy “AI” coding assistants make it for the driver to fall asleep at the wheel.
And when we put our coding assistant into “agent” mode – or, as I call it, “firehose mode” – that’s when things can very quickly run away from us. Dozens or hundreds of changes, perhaps even happening simultaneously as parallel agents busy themselves with multiple tasks.
Even if there were no issues in any of those changes – and the odds of that are extremely remote – when code’s being churned out faster than we can understand it, the result is a rapidly growing mountain of comprehension debt.
When the time comes – or should I say, when the times come – that the coding assistant gets stuck in a “doom loop” and we have to fix problems ourselves, that debt has to be repaid with interest.
Agents have no “intelligence”. They’re old-fashioned computer programs that call LLMs when they need sophisticated pattern recognition and token prediction. LLMs don’t follow instructions or rules. Use them for just a few minutes and you’ll see them crashing through their own guardrails, doing things we’ve explicitly told them not to do, and forgetting to do things we insist that they should.
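To make that concrete, here’s roughly what an agent’s control loop amounts to, as a schematic sketch: an ordinary loop that sends text to a model and executes whatever comes back. The call_llm and run_tool helpers below are placeholders I’ve made up, not any particular vendor’s API.

```python
# Schematic sketch of an agent loop, using made-up placeholder helpers rather
# than any real vendor API. Note there's no reasoning in here: just a loop,
# some bookkeeping, and calls out to a model for token prediction.
def run_agent(task: str, call_llm, run_tool, max_steps: int = 50) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(history)            # pattern recognition happens here
        history.append({"role": "assistant", "content": reply.text})
        if reply.tool_call is None:          # the model says it's finished
            return reply.text
        result = run_tool(reply.tool_call)   # edit a file, run a command, etc.
        history.append({"role": "tool", "content": result})
    return "Step limit reached"              # where the "doom loop" gets cut off
```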
The intelligence in this set-up is us. We’re the ones who can follow rules and instructions. We’re the ones who understand. We’re the ones who reason and plan. And we’re the ones who learn.
In 2025, and probably for many years to come, we are the agents. We’re the only ones qualified for the job.
My advice – based on the best available evidence and a lot of experience using these tools over the past 3 years – remains the same when you’re working on code that matters.
I recommend working one failing test at a time, one refactoring at a time, one bug at a time, and so on.
I recommend thoroughly testing after every step, and carefully reviewing the small amount of code that’s changed.
I recommend committing changes when the tests go green, and being ready to revert when they go red.
I recommend a fresh context, specific to the next step. I recommend relying on deterministic sources of truth – the code as it is (not the model’s summary of it), the actual test results, linter reports, mutation testing scores, etc.
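As an illustration, here’s a minimal sketch of that loop in Python. It treats the exit codes of the test runner and the linter as the source of truth, commits on green, and reverts on red. The tool names (pytest, ruff) and the checkpoint helper are assumptions for the sake of the example; substitute whatever your project actually uses.

```python
# Minimal sketch of "commit on green, revert on red", driven by deterministic
# outputs (exit codes) rather than the model's account of what it changed.
# Assumes a git repo with pytest and ruff on the PATH; adapt to your own stack.
import subprocess

def run(cmd: list[str]) -> bool:
    """Run a command and report success purely from its exit code."""
    return subprocess.run(cmd).returncode == 0

def checkpoint(message: str) -> None:
    tests_green = run(["pytest", "-q"])       # the actual test results...
    lint_clean = run(["ruff", "check", "."])  # ...and the actual linter report
    if tests_green and lint_clean:
        subprocess.run(["git", "add", "-A"], check=True)
        subprocess.run(["git", "commit", "-m", message], check=True)
    else:
        # Red: discard the step (tracked files only here) and try again
        # with a fresh, specific prompt.
        subprocess.run(["git", "checkout", "--", "."], check=True)

if __name__ == "__main__":
    checkpoint("One small step: make the failing test pass")
```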
I strongly advise against letting LLMs mark their own homework, and against relying on their version of reality.
And forget “firehose mode” for code that matters. Keep it on a very tight leash.