Will You Finally Address Your Development Bottlenecks In 2026?

I’ve spent the best part of 3 decades telling teams that to minimise the bottleneck of testing changes to their code, they’ll need to build testing right into their innermost workflow, and write fast-running automated regression tests.

“No, we don’t have time for that, Jason.”

I’ve spent the best part of 3 decades telling teams that to minimise rework due to misunderstandings about requirements, they’ll need to describe requirements in a testable way as part of a close and ongoing collaboration with our customers.

“No, we don’t have time for that, Jason.”

I’ve spent the best part of 3 decades telling teams that to minimise the bottleneck of code reviews, they’ll need to build review into the coding workflow itself, and automate the majority of their code quality checks.

“No, we don’t have time for that, Jason.”

I’ve spent the best part of 3 decades telling teams that to minimise merge conflicts and broken builds, and to minimise software delivery lead times, they’ll need to integrate their changes more often and automatically build and test the software each time to make it ready for automated deployment.

“No, we don’t have time for that, Jason.”

I’ve spent the best part of 3 decades telling teams that to minimise the “blast radius” of changes, they’ll need to cleanly separate concerns in their designs to reduce coupling and increase cohesion.

“No, we don’t have time for that, Jason.”

I’ve spent the best part of 3 decades telling teams that to minimise the cost and the risk of changing code, they’ll need to continuously refactor their code to keep its intent clear, and keep it simple, modular and low in duplication.

“We definitely don’t have time for that, Jason!”

“AI” coding assistants don’t solve any of these problems. They AMPLIFY THEM.

More code, with more problems, hitting these bottlenecks at accelerated speed turns the code-generating firehose into a load test for your development process.

For most teams, the outcome is less reliable software that costs more to change and is delivered later.

Those teams are being easily outperformed by teams who test, review, refactor and integrate continuously, and who build shared understanding of requirements using examples – with and without “AI”.

Will you make time for them in 2026? Drop me a line if you think it’s about time your team addressed these bottlenecks.

Or was productivity never the point?

Rely On AI And Get Left Behind

LLM-based code generation comes with multiple potential gotchas for a business.

More code hitting process bottlenecks like testing, review and integration makes most teams slower.

More problems being generated by text prediction engines that simply don’t understand what the requirements or the code means makes releases less stable.

These are effects we’re seeing today. Most dev teams are shipping less reliable software, and shipping it later thanks to “AI”. Good work, everyone!

But I suspect the biggest risk is creeping up on us without us necessarily realising.

The more teams offload their thinking to these tools, the less they understand the code your business relies on. And the less they understand it, the less able they are to fix problems that “AI” can’t fix. Outages last longer, and fixes are less sticky.

Mean Time To Failure goes down, and Mean Time To Recovery gets longer. More fires, taking longer to put out. Just ask the folks at AWS!

“AI”-assisted programming’s a bit like pedal-assist on electric bicycles. It makes progress feel easy, but we might not realise the impact that’s having on our coding “muscles” – our ability to comprehend and reason about code.

That is, until we run out of juice and have to pedal unaided. That’s when it becomes obvious just how much of our Code Fu we’ve lost as we’ve come to rely on that assistance more and more. Increasingly, I hear developers say “I’ve hit my token limit, so I’m blocked.”

Working in the low-gravity environment of “AI” coding, we may not notice the decline in our own abilities. And the more they decline, the less able we are to notice – until we find ourselves back in 1g and suddenly are unable to walk.

I’m seeing teams where most of the developers now have a hard time working on their code unaided by “AI”. It’s taking them much longer to wrap their heads around it, and they’re missing more and more problems.

Shipping code faster than we can understand it creates comprehension debt – extra time needed to build that understanding when the tool can’t do what we need it to do (and the need for that is very probably never going away).

Relying on LLMs to write the code for us erodes our ability to understand code, full stop. It’s a double-whammy.

There’s little doubting now that the devs who are most effective using “AI” are the ones who can work just fine without it.

Some say “Use AI or get left behind”. The reality is more probably “Rely on AI and get left behind”. Developers need to maintain their cognitive edge, which means keeping their hand in with regular unaided working.

There’s every chance that, in the future, these developers will be most in-demand.

If developers don’t keep that hand in, then this is going to add a lot of risk for businesses – not least because the vendors will have them over a barrel. Price plans are heavily subsidised today. That can’t last, and if you’re not able to walk away, then you have no negotiating position.

That more businesses don’t see becoming completely reliant on a third-party as a strategic risk is baffling, especially in such a volatile market as “AI”. It could all implode at any moment.

But the real risk is that, one day, your core product or system will break, and there’ll be nobody capable of fixing it.

Well, nobody you could afford.

100% Autonomous “Agentic” Coding Is A Fool’s Errand

Though I’ve seen very little evidence of it being attempted on production systems with real users (because risk), my socials are flooded with posts about people’s attempts to crack fully-autonomous, completely-unattended software creation and evolution using “agents” at scale.

Demonstrations by Cursor and Anthropic of large-scale development done – they claim – almost entirely by agents working in parallel have proven that the current state of the art produces software that doesn’t work. Perhaps, to those businesses, that’s just a minor detail. In the real world, we kind of prefer it when it does.

I’ve attempted experiments myself to see if I can get to a set-up good enough that I can hit “Play” and walk away to leave the agents to it while I go to the proverbial pub.

That seems to be the end goal here – the pot of gold at the end of the rainbow. Whoever makes that work will surely, at the very least, make a name for themselves, and probably a few coins.

I’ve seen many people – some who understand this technology far better than me – attempt the same thing. Curiously, they don’t seem to have nailed it either, but are convinced that somebody else must have.

It’s that FOMO, I suspect, that continues to drive people to try, despite repeated failures.

But, as of writing, I’ve seen no concrete evidence that anybody has done it successfully on any appreciable scale. (And no, a GitHub repo you claim was 100% agent-generated, “Trust me bro”, doesn’t qualify, I’m afraid.)

The rules of my closed-loop experiments are quite simple: I can take as much time as I like setting things up for Claude Code in read-only planning mode, but once the wheels of code generation are set in motion, we’re like an improv troupe – everything it suggests, the answer is automatically “yes”. I just let it play out.

Progress is measured with pre-baked automated acceptance tests driving deployed software, which act as a rough proxy for “value created”, and help to avoid confirmation bias and the kind of “LGTM” assessments of progress that plague accounts of “agentic” achievements right now. It’s very much an “either it did, or it didn’t” final quality bar.
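As a sketch of that kind of quality bar – with an invented endpoint, port and payload standing in for the deployed software, purely for illustration – the test either passes or it doesn’t:

```python
import http.server
import json
import threading
import urllib.request

# A stand-in "deployed" service. In a real run this would be the actual
# deployment; here a trivial local server plays the part so the sketch
# is self-contained. The route and payload are invented.
class App(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"total": 15.0}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the output quiet

server = http.server.HTTPServer(("127.0.0.1", 0), App)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The acceptance test: either the software does this, or it doesn't.
url = f"http://127.0.0.1:{server.server_port}/orders/42/total"
with urllib.request.urlopen(url) as resp:
    result = json.load(resp)

assert result["total"] == 15.0
print("PASS")
server.shutdown()
```

The binary verdict is the point: no room for “looks good to me”.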

I can’t intervene until either Claude says it’s done, or progress stalls. I can’t correct anything. I can’t edit any of the generated files. I have to simply sit back, watch and wait.

So far, no matter how I dice it and slice it, no set-up has produced 100% autonomous completion, or anything close.

No doubting, the tools are improving – using LLMs in smarter ways. But there’s only so much we can do with context management, workflow, agent coordination, quality gates and version control before we reach the limits of reliability that are possible when LLMs are involved. I suspect some of us are almost at that plateau already.

Agents – with those faulty narrators at their core – will always get stuck in “doom loops” where the problem falls outside their training data distribution, or the constraints we try to impose on them conflict.

Round and round little Ralph Wiggum will go, throwing the dice again and again in the hope of getting 13, or any prime number greater than 5 that’s also divisible by 3.

Out-of-distribution problems will always be a feature of generative transformers. It’s an unfixable problem. The best solution OpenAI have managed to come up with is having the model look at the probabilities and, if there’s no clear winner for next token, reply “I don’t know”. That’s not good news for full autonomy.
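That mitigation can be sketched in a few lines – the probability tables and the margin below are invented for illustration, not how any real model works internally:

```python
# Sketch of "abstain when there's no clear winner": compare the top two
# next-token probabilities and refuse to answer if the gap is small.
# The margin value and the example distributions are assumptions.
def next_token(probs: dict[str, float], margin: float = 0.2) -> str:
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    (best, p1), (_, p2) = ranked[0], ranked[1]
    return best if p1 - p2 >= margin else "I don't know"

print(next_token({"Paris": 0.92, "Lyon": 0.05, "Rome": 0.03}))  # → Paris
print(next_token({"Paris": 0.35, "Lyon": 0.33, "Rome": 0.32}))  # → I don't know
```

An agent that abstains is an agent that stops – which is exactly why this honesty is incompatible with leaving it unattended.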

And, no, a swarm of Ralphs won’t solve the problem, either. It just creates another major problem – coordination. No matter how many lanes your motorway has, ultimately every change has to go through the same garden gate of integration at the end.

A bunch of agents checking in on top of each other will almost certainly break the build, and once the build’s broken, everybody’s blocked, and your beeper is summoning you back from the proverbial pub to unblock them.

One amusing irony of all these attempts to fully define 100% autonomous “agentic” workflows is that it’s turning many advocates into software process engineers.

Just taking quality gates as the example, a completely automated code quality check will require us to precisely and completely describe exactly what we mean by “quality”, and in some form that can be directly interpreted against, for example, the code’s abstract syntax tree.
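Even the easy cases show how crude these automated definitions are. Here’s a minimal sketch of a quality gate interpreted against Python’s abstract syntax tree – the “long function” rule and its threshold are invented proxies, nothing like a real standard:

```python
import ast

# A crude, automatable proxy for "quality": flag functions with too many
# statements. The threshold is an illustrative assumption -- it shows how
# little of what we mean by "quality" an AST rule actually captures.
MAX_STATEMENTS = 10

def long_functions(source: str) -> list[str]:
    """Return names of functions whose bodies exceed MAX_STATEMENTS."""
    tree = ast.parse(source)
    offenders = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Count every statement nested in the function (minus itself).
            count = sum(isinstance(n, ast.stmt) for n in ast.walk(node)) - 1
            if count > MAX_STATEMENTS:
                offenders.append(node.name)
    return offenders

sample = (
    "def tidy():\n"
    "    return 1\n"
    "def sprawling():\n"
    + "".join(f"    x{i} = {i}\n" for i in range(12))
    + "    return x0\n"
)

print(long_functions(sample))  # → ['sprawling']
```

Counting statements is trivial. Deciding whether those statements belong together is the part no AST rule has ever captured.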

I know Feature Envy when I see it, but describing it precisely in those terms is a whole other story. Computing has a long history of teaching us that there are many things we thought we understood that, when we try to explain them to the computer, it turns out we don’t.

Software architecture and design is replete with woolly concepts – what exactly is a “responsibility”, for example? How could we instruct a computer to recognise when a function or a class has more than one reason to change? (Answers on a postcard, please.)

Fully autonomous code inspections are really, really, really (really) hard.

90% automated? Definitely do-able. But skill, nuance and judgement will likely always be required for the inevitable edge cases.

Having worked quite extensively in software process engineering earlier in my career, I know from experience that it’s largely a futile effort.

We naively believed that if we just described the processes well enough – the workflows, the inputs, the outputs, the roles and the rules – then we could shove a badger in a bowtie into any of those roles and the process would work. No skill or judgement was required.

You can probably imagine why this appealed to the people signing the salary cheques.

It didn’t work, of course. Not just because it’s way, way harder to describe software development processes to that level of precision, but also because – you guessed it – teams never actually did it the way the guidance told them to. They painted outside the lines, and we just couldn’t stop them.

In 2026, some of us are making the same mistakes all over again, only now the well-dressed badger’s being paid by the token.

We might get 80% of the way and think we’re one-fifth away from full autonomy, but the long and chequered history of AI research is littered with the discarded bones of approaches that got us “most of the way”. Close, but no cigar.

It turns out that last few percent is almost always exponentially harder to achieve, as it represents the fundamental limits of the technology. On the graph of progress vs. cost, 100% is typically an asymptote. We need to recognise a wall when we see one and back away to where the costs make sense.

Attempting to achieve better outcomes using agents with more autonomy seems like a reasonable pursuit, as long as we’re actually getting those better outcomes – shorter lead times, more reliable releases, more satisfied customers.

Folks I know being successful with an “agentic” approach have stepped back from searching for the end of that rainbow, and have focused on what can be achieved while staying very much in the loop.

They let the firehose run in short, controlled bursts and check the results thoroughly – using a combination of automated checks and their expert judgement – after every one. And for a host of reasons, that’s probably why they’re getting better results.

It’s highly likely there’s no end to the “agentic” rainbow. Perhaps we should start looking for some gold where we actually are, using tools we’ve actually got?

Best Way To Keep A Secret In Software Development? Document It

Once upon a time, in a magical land far, far away – just north of London Bridge – I wore the shiny cape and pointy hat of a senior software architect.

A big part of my job was to document architectures, and the key decisions that led to them.

Many diagrams. Many words. Many tables. Many, many documents.

Structure, dynamics, patterns, goals, principles, problem domains, business processes. You name it, I documented it. Much long time writing them, much even longer time keeping them up-to-date. Descriptions of things that aren’t the things themselves, it turns out, go stale fast.

It only takes a couple of gigs to become suspicious that maybe, perhaps, nobody actually reads these documents. So I started keeping the receipts. I’d monitor file access to see when a document was last pulled from our DMS or shared server, or when Wiki pages were last viewed.

My suspicions were confirmed. Teams weren’t looking at the documents much, or – usually – at all. Those fiends!

In their defence, as the saying goes:

Documentation’s useful until you need it

(I looked up the source of this anonymous quote, which I’ve used often. Turns out it’s me. LOL. Why didn’t I remember? I probably wrote it in a document.)

A lot of architecture documentation is out-of-date to the point where it becomes misleading. Code typically evolves faster than one person’s ability to keep an accurate record of the changes.

A fair amount was never in-date in the first place because it describes what the architect wanted the developers to do, and not what they actually did.

The upshot is that architecture documentation is rarely an accurate guide to reality. It’s either stale history, or creation myths.

Recent research found that familiarity with a code base is a far better predictor of a developer’s comprehension of the architecture than access to documentation, and the style of the documentation didn’t seem to make any significant difference.

When I think back to my times as developer rather than architect, that rings true. I’d often skim the documentation – if I looked at it at all – then go look at the code. Because the code is the code. It’s the truth of the architecture.

For architecture documentation to be more useful in aiding comprehension, it has to be tightly-coupled to the reality it describes. Changes to the architecture need to be reflected very soon after they’re made – ideally, pretty much immediately.

I experimented with round-trip modeling of architecture quite deeply, reverse-engineering code on every successful build, producing an updated description for every working version.

But that, too, produced web documentation that I know for a fact almost never got looked at. Least of all by me, as one of the developers.

Reverse-engineering code tends to produce static models – models of structure. I do not find purely structural descriptions very helpful in understanding what code does. It only describes what code is.

If a library has documentation generated from JavaDoc, I’ll tend to ignore that and look for examples of how the library can be used.

When I want to understand a modular design, I’ll look for examples of how key components in the architecture interact in specific usage scenarios – how does the architecture achieve user outcomes?

Quick war story: I joined a team in an investment bank who’d been working in isolation on different COM components. The Friday before our launch date, we still hadn’t seen how these very well-specified components – we all got handed detailed Word documents for our parts – would work together to satisfy key use cases.

Long-story-short, turns out they didn’t. Not a single end-to-end use case scenario.

Every part of the system was described in detail. But the system as a whole didn’t work, and we couldn’t see that until it was too late.

This is why I start with usage scenarios. You can keep your documentation – I’m gonna set breakpoints, run the damn thing, and click some buttons so I can step down through the call stack and build a picture of the real architecture, as it is now, in functional slices.

And I would have damned well designed it that way in the first place, starting with user outcomes and working backwards to an architecture that achieves them.

So, for aiding my comprehension of an architecture, a dynamic model driven by usage examples is preferable to structural models of static elements and their compile-time relationships.

One neat way of capturing usage examples is by writing them as tests. This can serve a dual purpose, both documenting scenarios as well as enabling us to see explicitly if the software satisfies them.

Tests as specifications – specification by example – is a cornerstone of test-driven approaches to software design. And it can also be a cornerstone of test-driven approaches to software comprehension.
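A minimal sketch of what a test-as-specification looks like – the Order class here is a hypothetical stand-in, not a real API:

```python
# Specification by example: the test names a user outcome and shows
# exactly how the component is used to achieve it. Everything below is
# an invented illustration of the style, not production code.

class Order:
    def __init__(self):
        self.lines = []

    def add(self, sku: str, qty: int, unit_price: float):
        self.lines.append((sku, qty, unit_price))

    def total(self) -> float:
        return sum(qty * price for _, qty, price in self.lines)


def test_customer_pays_the_sum_of_their_order_lines():
    # Scenario: a customer buys two items; the total is the sum of lines.
    order = Order()
    order.add("TEA-001", qty=2, unit_price=3.50)
    order.add("MUG-007", qty=1, unit_price=8.00)
    assert order.total() == 15.00


test_customer_pays_the_sum_of_their_order_lines()
print("scenario passes")
```

The test name and body together document the scenario, and the assertion tells us – continuously – whether the software still satisfies it.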

When tests describe user outcomes, we can pull on that thread to visualise how the architecture fulfils them. That so few tools exist to do that automatically – generate diagrams and descriptions of key functions, key components, key interactions as the code is running (a sort of visual equivalent of your IDE’s debugger) – is puzzling. Other fields of design and engineering have had them for decades.

As for understanding the architecture’s history, again I’ll defer to what actually happened over what someone says happened. The version history of a code base can reveal all sorts of interesting things about its evolution.

Of course, the problem with software paleontology is that we’re often working with an incomplete fossil record. Some teams will make many design decisions in each commit, and then document them with a helpful message like “Changed some stuff, bro”.

A more fine-grained version history, where each diff solves one problem and every commit explains what that problem was, provides a more complete picture of not just what was done to the design, but why it was done.

e.g., “Refactor: moved Checkout.confirm() to Order to eliminate Feature Envy”

Couple this with a per-commit history of outcomes – test run results, linter issues, etc – and we can get a detailed picture of the architecture’s evolution, as well as the development process itself. You can tell a lot about an animal’s lifestyle from what it excretes. Or, as Jeff Goldblum put it in Jurassic Park, “That is one big pile of shit”.
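The mining itself can be mundane, given a disciplined message convention like the refactoring example above. A sketch – the categories and log entries are invented for illustration:

```python
import re
from collections import Counter

# Hypothetical commit log following a "<Category>: <what and why>"
# convention. The messages and categories are illustrative assumptions.
LOG = [
    "Refactor: moved Checkout.confirm() to Order to eliminate Feature Envy",
    "Fix: guard against empty basket in Checkout.confirm()",
    "Refactor: extracted PricingPolicy to isolate discount rules",
    "Feature: added gift-wrap option to Order",
]

def design_activity(messages: list[str]) -> dict[str, int]:
    """Tally commits by category to sketch how the design has evolved."""
    counts = Counter()
    for msg in messages:
        m = re.match(r"(\w+):", msg)
        counts[m.group(1) if m else "Uncategorised"] += 1
    return dict(counts)

print(design_activity(LOG))  # → {'Refactor': 2, 'Fix': 1, 'Feature': 1}
```

Even this crude tally tells you something about an architecture’s lifestyle – how much of the work is evolution versus firefighting.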

Naturally, there will be times when the reason for a design decision isn’t clear from the fossil record. Just as with code comments, in those cases we probably need to add an explanation to that record.

This is my approach to architecture decision records – we don’t record every decision, just the ones that need explaining. And these, too, need to be tightly-coupled to the code affected. So if I’m looking at a piece of code and going “Buh?”, there’s an easy way to find out what happened.

Right now, some of you are thinking “But I can just get Claude to do that”. The problem with LLM-generated summaries is that they’re unreliable narrators – just like us.

I’ve been encouraging developers to keep the receipts for “AI”-generated documentation. It’s pretty clear that humans aren’t reading it most of the time, and when we do, it’s at “LGTM-speed”. We’re not seeing the brown M&Ms in the bowl.

Load unreliable summaries into the context alongside actual code, and models can’t tell the difference. Their own summaries are lying to them. Humans, at least, are capable of approaching documentation skeptically.

Whenever possible, contexts – just like architecture descriptions intended for human eyes – should be grounded in contemporary reality and validated history.

And talking of “AI” – because, it seems, we always are these days – one application of machine learning I haven’t yet seen attempted in software development is mining the rich stream of IDE, source-file and code-level events we give off as we work, to recognise workflows and intents.

A friend of mine, for his PhD in Machine Vision, trained a model to recognise what people were doing – or attempting to do – in kitchens from videos of them doing it.

Such pattern recognition of developer activity might also be useful to classify probable intent, and even predict certain likely outcomes as we work. At the very least, more accurate and meaningful commit messages could be automatically generated. No more “Changed some stuff, bro”.

At the end of the (very long) day, comprehension is the goal here. Not documentation. If a document is not comprehended – and if nobody’s reading it, then that’s a given – then it’s of no help.

And if it’s going to be read, it needs to be helpful in aiding comprehension. Like, duh!

Personally, what I would find useful is not better documentation, but better visualisation – visualisation of static structure and dynamic behaviour, of components and their interactions, and of the software’s evolution.

And visualisation at multiple scales from individual functions and modules all the way up to systems of systems, and the business processes and problem domains they interact with.

And when we want the team to actually read the documentation, we need to take it to them. Diagrams should be prominently displayed (I’ve spent a lot of time updating domain models to be printed on A0 paper and hung on the wall), and explanations talked through, not just filed away. Ideally, architecture should be a shared team activity – e.g., with teaming sessions, with code reviews, with pair programming – and an active, ongoing, highly-visible collaboration.

Active participation in architecture is essential to better comprehension. Doing > seeing > reading or hearing.

That also goes for legacy architecture – active engagement (clicking buttons, stepping through call stacks in the debugger, writing tests for existing code) tends to build deeper understanding faster than reading documentation.

The fastest way to understand code is to write it. The second fastest is to debug it.

This is especially critical in highly-iterative, continuous approaches to software and system architecture, where decisions are being made in real-time many times a day. Without a comprehensible, bang-up-to-date picture of the architecture, we’re basing our decisions on shaky foundations.

Like an LLM generating code by matching patterns in its own summaries, we risk coming untethered from reality.

Which would be an apt description of most architecture documents, including mine.

Why LLMs Will Always Need An Expert In The Loop

The gargantuan valuations of “AI” companies – companies making or using LLMs – are built on things that investors believe the technology is going to do in the future.

That future, they’ve promised us many times, has been “just around the corner” since 2022. But the reality, when we actually observe it, has remained far less impressive.

We can keep falling for the “Yes, but this time it’s different” claims every time a new model comes out, or we can look ahead at the theoretical limits of the technology.

Large Language Models are massive, hyperdimensional, probabilistic pattern-matching and text prediction systems (yep, they’re basically predictive texting). What theoretical discipline could meaningfully predict the limits of their performance as they scale towards infinity?

There’s a branch of physics called Statistical Mechanics that enables researchers to describe the behaviour of complex, probabilistic systems at scale. SM uses probability to predict macroscopic thermodynamic properties of systems – like entropy, pressure and temperature – based on the microscopic properties of atoms and molecules.

Tools from statistical mechanics are increasingly used to understand deep neural networks at scale by treating their millions or billions of parameters as interacting degrees of freedom in a high-dimensional system.

Concepts such as energy landscapes, phase transitions, and collective behavior help explain why models can exhibit sudden changes in capability, remarkable robustness – and the lack thereof – or predictable scaling trends as size and data grow.

Researchers have been using statistical mechanics to explore questions like how much more accurate LLMs are likely to get with greater scale.

For folks claiming that “hallucinations” will soon be a solved problem, it’s not good news.

In their paper, “The wall confronting large language models”, Peter V. Coveney and Sauro Succi showed that producing a model with an order of magnitude better accuracy – e.g., wrong 3% of the time instead of 30% – would require 10²⁰ times more data and compute to train.

We’re talking Dyson Spheres (plural) to train a model of that size on similar timescales to the frontier models of today.

And it would require a further 10²⁰ times to go from 3% to 0.3%.
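Taking those headline numbers at face value, the implied scaling exponent is tiny. A quick back-of-envelope check:

```python
import math

# Back-of-envelope arithmetic, taking the paper's numbers at face value:
# a 10x error reduction (30% wrong -> 3% wrong) costing 10**20 times the
# data and compute implies error ~ resources**(-alpha).
error_gain = 10.0
resource_gain = 1e20

alpha = math.log10(error_gain) / math.log10(resource_gain)
print(alpha)  # → 0.05

# At that exponent, doubling your training resources shaves only ~3.4%
# off the error rate.
improvement = 1 - 2 ** -alpha
print(round(improvement, 3))  # → 0.034
```

An exponent of 0.05 is why “just add more GPUs” runs into a wall: the next meaningful step in accuracy costs astronomically more than the last.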

To scale an LLM to be as reliable as a modern compiler – as some people wrongheadedly claim they are – you’d probably need to be a Type V civilisation. And why would anybody burn the multiverse to create something undergrads can probably make?

Any kind of text-predicting AI that performs with that level of accuracy won’t be a generative transformer. It’ll be technology that doesn’t exist yet.

“Ah, but Jason, models are improving in benchmarks all the time.”

Yeah, about that. What researchers and users are finding when they measure it objectively is that, while benchmark scores do indeed improve, real-world performance is a very different story.

The newer scores can be explained by the fact that many benchmarks contain questions, prompts, or examples that overlap with the model’s training data. Combined with overfitting – finetuning models to benchmark data – this creates the illusion of solving problems when it’s actually statistical recall. Models are being taught to the test.

Summary: thoroughly testing & reviewing LLM output is here to stay.

Another question statistical mechanics sheds light on is how much better deep neural networks will get at handling long-range, “deep” patterns.

In his 2019 paper, “Mutual Information Scaling and Expressive Power of Sequence Models”, Huitao Shen links long-range dependencies in recurrent neural networks to mutual information scaling, a concept also studied in statistical mechanics. Shen found that exponential decay of mutual information with distance implies difficulty in capturing very long dependencies.

And in his paper “Theoretical Limitations of Self-Attention in Neural Sequence Models”, Michael Hahn explains how attention at long ranges will always be a problem for generative transformers at any scale.

Cliffs Notes version: LLMs will always be driving in thick fog. This explains why they’re so bad at modular design.

The practical upshot of this is that LLMs don’t now, and very probably never will, “see” the bigger picture. For the foreseeable future, high-level concerns around modular design, dependencies and architecture will remain human concerns.

Any AI that can effectively handle long-range patterns will be built on a technology that doesn’t exist yet.

So, there are two very likely futures we can work with (or, more accurately, around). We should not expect LLMs to get significantly more reliable, and we should not expect them to make sound decisions about modular design, dependencies and high-level architecture on non-trivial systems.

The full skinny: a software developer will be required behind the wheel for any “AI” or “agentic” workflow that’s built around LLMs.

It’s physics.

An Ode To “It Can’t Be Done”

Yesterday, on its 25th birthday, I blogged about the Manifesto for Agile Software Development.

Whenever I mention Agile (as she’s known to her friends and to her enemies), inevitably there’ll be at least one comment along the lines of “It can’t be done” or “Nobody’s ever done it”.

They may qualify that by citing organisational or technical obstacles to applying the values and principles laid out in the document, like entrenched management or unskilled developers or a risk-averse, command-and-control culture.

For sure, these are all obstacles I’ve faced many times. Fair comment.

Having said that, these are also obstacles that I’ve overcome in 90% of instances.

It’s possible to manage upwards, winning the support needed for – at the very least – your customer to actively engage with an agile development process.

You don’t need to change the whole organisation to create an island of self-organisation and rapid feedback. This is why I focus on teams and individuals, and don’t get involved with organisation-level “agile transformations”. From what I’ve seen in the last 25 years, They. Just. Don’t. Work. Not on that scale. But that’s a whole other post.

As for developer skills, well, there’s a reason I ended up – out of necessity – becoming a trainer and mentor in skills like usage-driven analysis & design, TDD, refactoring and continuous integration. To some extent, I’ve had to train most of the teams I’ve worked with going back to the late 1990s.

My response when friends complain that the team they’re stuck with “don’t know what they’re doing” is to ask “And what are you going to do about that?”

Command-and-control cultures are often a product of lack of trust. And in many cases, that lack of trust has been earned by past performance. Many disappointments, many broken promises. So organisations micromanage, and if anything’s guaranteed to end an experiment in software agility, it’s that.

To earn trust, teams need to deliver rapidly, reliably and sustainably. To gain the autonomy needed to deliver rapidly, reliably and sustainably, teams need to be trusted. Catch-22.

Here’s the thing, though. Nothing’s stopping you from having a conversation about that. There’s no law that says you can’t spell out the impasse to the managers you’ll be needing that trust from.

There’s a window of opportunity here for someone coming in to the organisation with fresh eyes to break the cycle. And that’s been my specialty for most of my career.

This is where it can help to have some courage, as well as a healthy disrespect for hierarchical authority.

Does it ruffle feathers? You betcha! You’ll struggle to find a project manager who has a good word to say about me.

Did we actually deliver? Almost always, yes.

Often begrudgingly, organisations have to acknowledge that the medicine is working after the team’s delivered and delivered and delivered, and users are making positive noises about the quality of it.

The fact that it’s hardly ever your code that wakes the head of engineering up at 2 am definitely sends a message.

The bargain with the Devil, of course, is that you have to deliver, and deliver, and deliver. Results are a much easier sell than promises. You really need to know what you’re doing.

While the rest of the engineering department may still be stuck in micromanagement hell, my experience has been that a reliable track record – suitably trumpeted to make sure the right people notice (dev teams need good PR) – will usually encourage management to back off.

Not always, of course. Sometimes it really is just about status and control, and I’ve used more ruthless tactics in those situations.

To borrow from comedian Stewart Lee, I can do office politics. I just choose not to.

I’d much rather have an honest conversation, but in leadership roles, my job is to remove the obstacles standing in the way of the team (and the customer, and sometimes the customer is the obstacle, so some tough love may be required).

If the choice is between creating value, and preserving the status quo, you can probably guess which door I normally go through.

Some teams are working in industries that are heavily regulated, where development processes are highly prescribed. There are a lot of boxes that need to be ticked before software can be released.

Here’s a little secret that I’ll let you have for free: you can fit a more iterative process inside a less iterative process without the high-ups and the box-tickers realising. As John Daniels and Alan Cameron Wills once said to me when we hired their company to do some specialised custom work for my client, “Yeah, we can make it look like that.”

The manual demands “Big Design Up-Front”? We stick a UML reverse-engineering step into our build pipeline. And for once, they get an architecture model that accurately reflects the actual design! We don’t need to mention that we already wrote the code.

(I would normally do something like that anyway, because an up-to-date-ish bigger picture helps us steer the architecture more effectively.)
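As a sketch of what that pipeline step might look like – assuming a Python codebase and pyreverse, the diagram generator that ships with pylint; your stack will have its own equivalent, and the project name and paths here are purely illustrative:

```shell
# Hypothetical build-pipeline step: reverse-engineer UML class and
# package diagrams from the source tree on every build, so the
# "Big Design Up-Front" artefact is really a big design, always up to date.
pip install pylint                           # pyreverse ships with pylint
pyreverse -o png -p MyProject src/mypackage/
# Emits classes_MyProject.png and packages_MyProject.png, ready to
# attach to whatever document the process manual demands.
```

The point of the design choice: because the diagrams are generated from the code on every build, they can never drift out of date the way hand-drawn models do.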

More generally, when teams I’ve worked with have faced obstacles to rapid, reliable and sustainable development, we’ve almost always found ways to remove them or go around them or dig a tunnel under them.

9 times out of 10, the solution is very simply not to ask for permission. Just f**king do it.

Do we need the boss’s approval to write the tests first? No. If we’re delivering, what business is that of theirs?

Do we need C-suite buy-in to check in changes in smaller batches? Again. Who’s gonna know?

It’s a happy accident that some of the changes we could make that have the highest leverage – the biggest impact for the smallest investment – are also the least visible outside the team. We don’t need to get budget to start writing unit tests. We don’t need the CTO’s signature to run a linter.

It’s usually only when the change you want to make will require people outside the team to change that buy-in is really required. This is where cohesive cross-functional teams shine best. The less we need from other teams or from management, the more autonomous we can choose to be.

And becoming agile under the radar reduces the risk of attracting the attention of organisational antibodies. We can be delivering very visible results, and the mechanisms involved can be largely invisible.

I’ve worked in teams where we’ve done everything from setting up our own token-ring network because the client wouldn’t allow us on theirs (obviously, don’t do this if you work for MI5), to inventing a team member so we could use their PC as a build server after they refused to give us a machine.

If it’s in the business’s best interest, we can usually find a way. Just takes a bit of creative thinking, a pinch of courage, and a little dash of charm. “Wah, wah, wah, we want a build machine!” is less persuasive than you think. At the very least, be ready to explain why it’s in their interest. A lot of the time, “It can’t be done” really means “I couldn’t persuade them”.

I’d guesstimate that more than half the successes I’ve been involved with doing software development in an agile way have been despite the hand we were dealt.

And never underestimate the bargaining power of a united team. Those 10% of times when greater agility stayed firmly, stubbornly out of reach were when I was the lone voice in the wilderness. In those situations, I cut my losses. And I’ve grown much better at recognising the signs before I engage.

By 2000, I was usually in a position to change the makeup of my team, too. Extreme Programmers of a feather and so on.

The poker metaphor is very appropriate here, since the real benefit of agile software development is minimising risk. We don’t let uncertainty pile up into big batches and big releases, and managers usually find that their picture of actual progress is far more accurate, enabling them to make more informed decisions grounded in the reality of user feedback about working software.

So not only can it be done, it has been done, and successfully, many times. I’ve seen it, and I’ve lived it.

You might as well tell me that my house “can’t be built”.

Psst. Did you know that, as well as delivering high-quality training and coaching in agile development skills like TDD, refactoring and CI, I’m also available as a fractional principal engineer? I’ve worked with clients on an ongoing basis, helping dev teams to turn “It can’t be done” into “We did it!” Let me have those difficult conversations with management and absorb the heat coming from the project office.

The Agile Manifesto at 25 – The Most Talked-About Unread Document In Software

Feb 11th 2001 – exactly 25 years ago – was the first day of a 3-day meeting at the Snowbird ski resort in Utah that gave birth to the Manifesto for Agile Software Development.

Like all important documents, most folks who think they know what it says don’t appear to have actually read it.

There’s no mention of “sprints”, or “story points”, or “user stories”, or “build pipelines” or “Kanban boards” or any of the other details of specific Agile methodologies. The Manifesto isn’t a “how-to” guide. It’s a “why-to” guide.

It lays out a set of values and principles for teams who wish to be more responsive to changing user and business needs, through continuous learning and adaptation in a highly people-centric – and especially customer-centric – process.

On the front page of the manifesto’s website, it states:

“We are uncovering better ways of developing software by doing it and helping others do it.

Through this work we have come to value:

– Individuals and interactions over processes and tools

– Working software over comprehensive documentation

– Customer collaboration over contract negotiation

– Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more.”

The values and principles of Agile Software Development describe what it is like to be agile (with a small ‘a’). We do not “do” agile. We do not “adopt” agile. We do not “implement” agile.

We become agile.

The manifesto itself is, of course, a laudable but compromised attempt to bring multiple iterative and “lightweight” development methodologies – different “brands” – under one banner.

One of those brands came to dominate in the minds of software developers.

Another came to dominate in the minds of managers and executives, and ultimately came to dominate economically. This was a mistake.

“Agile Software Development” is doing software development in an agile way. It’s not a management thing.

Agile teams are self-organising, because putting decision-making power where (and more importantly, when) decisions need to be made – and therefore understood – is one of the keys to unlocking greater agility. A chain of command almost always becomes a serious bottleneck.

The other key message that got lost in translation is the need for strong technical chops. Teams who can’t deliver rapidly, reliably and sustainably get micromanaged, and that ends their experiment in agility.

Without development teams with strong technical discipline who are empowered to make decisions, Agile Software Development becomes little more than Agility Theatre. And that’s the reality for the majority of teams today – command-and-control waterfall development wearing an Agile Halloween costume, while the technical capability withers from lack of investment.

In this respect, Agile Software Development’s reputation – like the reputations of so many ideas in software development – is built on the experiences of teams that weren’t applying the values and principles. Most developers and managers have never even seen software agility in practice.

I recommend reading the manifesto again (or for the first time). Perhaps do it as a team. Maybe talk about how the values and principles could help – or already are helping – and identify where your organisation might just be play-acting.

“TDD Slows Me Down”. Good.

The illusion of productivity in software development can be very seductive.

At the start of my career, I felt most productive when I was able to write code uninterrupted for hours. I even thought that increasing my typing speed would make me a more productive developer.

It took a while, but the penny eventually dropped that creating code and creating value aren’t the same thing.

And when I’m creating code faster than I can truly, deeply understand it and validate it – what I call “LGTM-speed” – I’m really just creating more problems for later.

That’s the thing about iterative processes: it’s easy to forget that what’s downstream of present you is upstream of future you. The details you missed, the boo-boos you made while you were being “super-productive” will come right back around, eating up more and more of your “productive” time in later iterations.

In training and coaching, I encourage developers to try a style of programming called “test && commit || revert” (TCR). When they run their tests, if all the tests pass their changes are automatically committed to version control. If any tests fail (or the code won’t compile), then their changes are automatically reverted and they have to try again.

TCR trains them to take smaller steps. Instead of making 10 changes and then running the tests, they make one or two, greatly reducing the risk of losing their work.
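In shell terms, the whole workflow fits on one line. This is a minimal sketch, assuming a git repository and a hypothetical `./run_tests.sh` test command – substitute whatever builds and tests your code:

```shell
#!/bin/sh
# TCR: "test && commit || revert" in its simplest form.
# If the tests pass, the change is committed to version control;
# if they fail, the working copy is reset to the last green commit.
./run_tests.sh && git commit -am "TCR: green" || git reset --hard
```

Run it after every small change; the `git reset --hard` is exactly what makes big, risky batches of uncommitted work so unattractive.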

When they work in this way, they report that it feels really slow. But they underestimate just how much time they used to spend debugging or reworking code to get it working. If we time the exercise, it usually doesn’t take longer. It often actually takes less time to solve the problem.

It’s a bit like typing fast. When I try to type faster – because I’m very busy and important, people to see, places to be – I tend to find myself backspacing much more often to fix errors. I’m actually faster when I type slower.

We see a similar effect when developers slow down their workflow. Magically – even though it feels slower – we often deliver sooner.

So, yes, TDD slows you down – with its small steps, continuous testing, continuous code review and continuous refactoring. And that helps you to get there sooner.

Think of it as brakes. Sure, we can go much faster without brakes. But not for long.

Tech Leaders: Low-Performing Teams Are A Gift, Not A Curse

It’s an age-old story. You’re parachuted in to lead a software development organisation that’s experiencing – let’s be diplomatic – challenges.

Releases are far apart and full of drama. Lead times are long, and unless it’s super, super necessary – and they’re prepared to dig deep into their pockets and wait – the business has just stopped asking for changes. It saves time. They know the answer will probably be “too expensive”.

The only respite comes with code freezes and release embargoes – never on a Friday, never over the holidays. But is a break really a break when you know what you’ll be coming back to?

Morale is low, the best people keep leaving, and they can almost smell the chaos on you when you’re trying to hire them.

The finger-pointing is never-ending. And soon enough, those fingers will be pointing at you. Honeymoon periods don’t last long, and the clock is ticking to effect noticeable positive change. They lit the fuse the moment you said “yes”.

I know, right! Nightmare!

Except it isn’t. For a leader new in the chair, it’s actually a golden ticket.

The mistake most tech leaders make is to go strategic – bring in the big guns, make the big changes. It’s a big song and dance number, often with the words “Agile” and/or “transformation” above the door.

This creates highly-visible activity at management and process level. But not meaningful change. Agility Theatre is first and foremost still theatre. You’re focusing on trying to speed up the outer feedback loops of software and systems development – at high cost – without recognising where the real leverage is.

Where I work as a teacher and advisor is at the level of small, almost invisible changes to the way software developers work day-to-day, hour-to-hour, minute-to-minute.

The biggest leverage in a system of feedback loops within feedback loops is in the innermost loops.

Think about it in programming terms: you have nested loops in a function that’s really slow, and you need to speed it up. Which loop do you focus your attention on – the outer loop, or the inner loop? Which one would optimising give you the most leverage for the least change?

I worked with a FinTech company in London who called me in because they’d been going round in circles trying to stabilise a release. Every time they deployed, the system exploded on contact with users – costing a lot of money in fines and compensation, and doing much damage to their reputation.

“Agile coaches” had been engaging with management and, to a lesser extent, teams to try to “fix the process”.

I took one look, saw their total reliance on manual testing, and recognised the “doom loop” immediately. Testing the complete system took a big old QA team several weeks, during which time the developers were busy fixing the bugs reported from the last test cycle, while – of course – introducing all-new bugs (and reintroducing a few old favourites).

They’d been stuck in this loop for 9 months, at a cost running into millions of pounds – plus whatever they’d been paying for the “Agile coaching” that had made no discernible difference. You can’t fix this problem with “stand-ups”, “cadences” and “epics”.

I took the developers into a meeting room and taught them to write NUnit tests. We prioritised the most unstable features, and the problem was gone within 6 months, never to return.

And something else happened – something quite magical. Not only did the software become much more reliable, but release cycles got shorter. Better software, sooner.

That’s quite the notch on the belt for a new CTO or head of engineering, dontcha think?

After decades of helping organisations build software development capability, I know from wide experience that taking a team from bad to okay is way easier than taking them from okay to good, which is way easier than from good to excellent.

Having helped teams climb that ladder all the way to the top, I can attest that the first rung is a real gift. Your CEO or investors probably can’t distinguish excellent from good, but the difference between bad and okay is very noticeable to businesses.

Unit testing is table stakes in that journey, but the difference it makes is often huge. Low-visibility, but very high-leverage.

Here’s the 101: the inner loops of software development – coding, testing, reviewing, refactoring, merging – are where the highest-leverage changes can be made at surprisingly low cost.

90% of the time, nobody but the developers themselves need get involved. No extra budget’s required (you might even save money). No management permission need be sought. No buy-in from other parts of the organisation is necessary.

High-leverage changes that can slip under the radar, but that can also have quite profound impact.

And results are much easier to sell than promises. Imagine, when your honeymoon period’s up, being able to point to a graph showing how lead times have shrunk, and how fires in production have become rare.

I’ve seen that buy tech leaders the higher trust and autonomy they need to make the strategic changes that the infamous organisational antibodies would otherwise have attacked. Results are the immunisation.

Now for the really good news. In all those cases I’ve been involved in – many over the years – tech leaders only did one thing.

I am not an “Agile coach” or a “transformation lead”. I do not sit in meetings with management. I do not address strategic matters or matters of high-level process.

I work directly with developers, helping them to make those high-leverage, mostly invisible changes to their innermost workflows. Your CEO will likely never meet me, or even hear my name.

The first rule of Code Craft is that we do not talk about Code Craft.

Right now, you may be thinking, “But Jason, we’ve got AI. Developers can produce 2x the code in the same time.”

Do you think 2x the code hitting that FinTech company’s testing bottleneck would have made things better? Or would that just have been more bugs hitting QA in bigger batches?

Attaching a code-generating firehose to your dev process is very likely to overwhelm bottlenecks like testing, code review and merging – making the system as a whole slower and leakier. I call it “LGTM-speed”.

Tightening up those inner loops is even more essential when we’re using “AI” code generators. If we don’t, lead times get longer, releases get less stable, and the cost of change goes up. Worse software, later.

Good luck presenting that graph to the C-suite!

If you need help tightening up the inner feedback loops in your software development processes – with or without “AI” – that’s right up my street.

Visit https://codemanship.co.uk/ for details of training and coaching in the technical practices that enable rapid, reliable and sustainable evolution of software to meet changing needs.

Am I Anti-AI? No. I’m Anti-Harm.

If you’ve been following my blog posts and social media noisings about generative “AI” (see, there’s those quotation marks again) – and especially about LLM-assisted software development – you might be forgiven for thinking that I’m against it.

For sure, I take a skeptical, evidence-based view. I’ve followed the data, learned the underlying mechanics, and in many instances tested claims myself to try and establish what’s real and what works. (And for it to work, it has to be real.)

This has led me to a far less sensational position than many you’ll see out there on Teh Internets. And, from the perspective of people who’ve drunk the Kool-Aid – and now insist that we all drink it – that can look like an anti-AI take.

For the sake of clarity, I’m not “anti-AI”.

I’m anti:

  • Shipping code at LGTM-speed
  • Replacing user research & feedback with hallucinations
  • Abdicating thinking
  • BDUF in .md files
  • Burning the planet for – in most cases – lower productivity & worse reliability
  • Stealing IP on an industrial scale
  • Economy-distorting investment fraud
  • Gross invasions of privacy
  • “AI psychosis”
  • Technofeudalism & technofascism (Let’s be honest – the folks controlling the hyper-scale models are not big fans of democracy)

A future is possible where development teams use small, specialised language models, trained on ethically-sourced data using a tiny fraction of the compute and energy, and running on consumer hardware under our control. Nobody else sees our data.

These models, and the data they’re trained on, can be a common good – prepared by communities with the express intent of licensing them for ethically-trained-and-operated “AI”.

Base models can be fine-tuned for our purposes using our data (e.g., our code).

And they won’t be trained to try to fool us into believing they think, feel or care about us, or to drive engagement.

Devs will use them when it adds value, not just “because AI”.

And they’ll use them in highly iterative processes where they’re 100% engaged – continuously testing, inspecting, refactoring and integrating code in small batches, and never letting things slide into LGTM-speed.

So, yes, I’m actually “pro-AI” – I use it every day.

I just have a different vision of the future to what the hyper-scalers and their backers would prefer – democratic, environmentally-friendly, economically viable, genuinely useful, and ethically and socially responsible.

Thank you for your attention to this matter.

</rant>